Clostridium difficile, a Gram-positive, anaerobic, spore-forming bacterium, is an emergent pathogen and the most common cause of nosocomial diarrhea. Although transmission of C. difficile is mediated by contamination of the gut by spores, the regulatory cascade controlling spore formation remains poorly characterized. During Bacillus subtilis sporulation, a cascade of four sigma factors, σ(F) and σ(G) in the forespore and σ(E) and σ(K) in the mother cell, governs compartment-specific gene expression. In this work, we combined genome-wide transcriptional analyses and promoter mapping to define the C. difficile σ(F), σ(E), σ(G) and σ(K) regulons. We identified about 225 genes under the control of these sigma factors: 25 in the σ(F) regulon, 97 σ(E)-dependent genes, 50 σ(G)-governed genes and 56 genes under σ(K) control. A significant fraction of the genes in each regulon is of unknown function, but new candidates for spore coat proteins could be proposed: they are synthesized under σ(E) or σ(K) control and were detected in a previously published spore proteome. SpoIIID of C. difficile also plays a pivotal role in the mother cell line of expression, repressing the transcription of many members of the σ(E) regulon and activating sigK expression. Global analysis of developmental gene expression under the control of these sigma factors revealed deviations from the B. subtilis model regarding the communication between mother cell and forespore in C. difficile. We showed that the expression of the σ(E) regulon in the mother cell was not strictly under the control of σ(F), despite the fact that the forespore product SpoIIR was required for the processing of pro-σ(E). In addition, the σ(K) regulon was not controlled by σ(G) in C. difficile, in agreement with the lack of pro-σ(K) processing. This work is a key step toward new insights into the diversity and evolution of the sporulation process among Firmicutes.
The GNTR family of transcription factors (TFs) is a large group of proteins present in diverse bacteria and regulating various biological processes. Here we use a comparative genomics approach to reconstruct regulons and identify binding motifs of regulators from three subfamilies of the GNTR family: FADR, HUTC, and YTRA. Using these data, we attempt to predict DNA-protein contacts by analyzing correlations between binding motifs in DNA and amino acid sequences of TFs. We identify pairs of positions with high correlation between amino acids and nucleotides for the FADR, HUTC, and YTRA subfamilies and show that most predicted DNA-protein interactions are quite similar in all subfamilies and conform well to the experimentally identified contacts formed by FadR from E. coli and AraR from B. subtilis. The most frequent predicted contacts in the analyzed subfamilies are Arg-G, Asn-A, and Asp-C. We also analyze the divergon structure and preferred site positions relative to regulated genes in the FADR and HUTC subfamilies. A single site in a divergon usually regulates both operons and lies approximately in the middle of the intergenic area. Double sites are either involved in the co-operative regulation of both operons, in which case they lie in the center of the intergenic area, or each site in the pair independently regulates its own operon and tends to be near it. We also identify additional candidate TF-binding boxes near palindromic binding sites of TFs from the FADR, HUTC, and YTRA subfamilies, which may play a role in the binding of additional TF subunits.
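The correlation analysis between protein and DNA positions described above can be illustrated with a small sketch. The abstract does not say which correlation measure was used; mutual information between an alignment column of the TF and a position of its binding motif is one common choice and is assumed here. The function name and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(residues, bases):
    """Mutual information (in bits) between paired symbols.

    residues[i] is the amino acid at one alignment column of TF i;
    bases[i] is the nucleotide at one motif position of the site bound
    by the same TF. Assumption: one (residue, base) pair per regulator.
    """
    if len(residues) != len(bases):
        raise ValueError("inputs must have equal length")
    n = len(residues)
    joint = Counter(zip(residues, bases))
    aa_counts = Counter(residues)
    nt_counts = Counter(bases)
    mi = 0.0
    for (aa, nt), count in joint.items():
        p_joint = count / n
        # p(aa, nt) / (p(aa) * p(nt)), written with raw counts
        ratio = count * n / (aa_counts[aa] * nt_counts[nt])
        mi += p_joint * log2(ratio)
    return mi


# Hypothetical toy data: Arg always pairs with G, Asn always with A.
correlated = mutual_information("RRRRNNNN", "GGGGAAAA")
# Hypothetical toy data: residue and base vary independently.
independent = mutual_information("RRNN", "GAGA")
print(correlated, independent)
```

Position pairs with high mutual information are candidates for direct contacts; with real data, small sample sizes and phylogenetic relatedness of the regulators inflate the score and would need a correction that this sketch omits.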
Holometabolous insects are predominantly motionless during metamorphosis, when no active feeding is observed and the body is enclosed in a hardened cuticle. These physiological properties, as well as the ongoing processes, resemble embryogenesis, since the organs and systems of the imago are formed at the pupal stage. Therefore, recapitulation of the embryonic expression program during metamorphosis could be hypothesized. To assess this hypothesis at the transcriptome level, we have performed a comprehensive analysis of the developmental datasets available in the public domain. Indeed, for most datasets, pupal gene expression resembles the embryonic rather than the larval pattern, interrupting gradual changes in the transcriptome. Moreover, changes in the transcriptome profile during the pupa-to-imago transition are positively correlated with those at the embryo-to-larva transition, suggesting that similar expression programs are activated. Gene sets that change their expression level during the larval stage and revert to an embryonic-like state during metamorphosis are enriched in genes associated with metabolism and development.
The pangenome is the collection of all groups of orthologous genes (OGGs) from a set of genomes. We apply the pangenome analysis to propose a definition of prokaryotic species based on identification of lineage-specific gene sets. While being similar to the classical biological definition based on allele flow, it does not rely on DNA similarity levels and does not require analysis of homologous recombination. Hence this definition is relatively objective and independent of arbitrary thresholds. A systematic analysis of 110 accepted species with the largest numbers of sequenced strains yields results largely consistent with the existing nomenclature. However, it has revealed that abundant marine cyanobacteria
should be divided into two species. As a control we have confirmed the paraphyletic origin of
(with embedded, monophyletic
) and
(with
). We also demonstrate that by our definition and in accordance with recent studies
and
spp. are one species.
Nickel (Ni) and cobalt (Co) are trace elements required for a variety of biological processes. Ni is directly coordinated by proteins, whereas Co is mainly used as a component of vitamin B12. Although a number of Ni and Co-dependent enzymes have been characterized, systematic evolutionary analyses of utilization of these metals are limited.
We carried out comparative genomic analyses to examine the occurrence and evolutionary dynamics of the use of Ni and Co at the level of (i) transport systems and (ii) metalloproteomes. Our data show that both metals are widely used in bacteria and archaea. Cbi/NikMNQO is the most common prokaryotic Ni/Co transporter, while Ni-dependent urease and Ni-Fe hydrogenase, and B12-dependent methionine synthase (MetH), ribonucleotide reductase and methylmalonyl-CoA mutase are the most widespread metalloproteins for Ni and Co, respectively. The occurrence of other metalloenzymes showed a mosaic distribution, and a new B12-dependent protein family was predicted. Deltaproteobacteria and Methanosarcina generally have larger Ni- and Co-dependent proteomes. On the other hand, utilization of these two metals is limited in eukaryotes, and very few of these organisms utilize both of them. The Ni-utilizing eukaryotes are mostly fungi (except Saccharomycotina) and plants, whereas most B12-utilizing organisms are animals. The NiCoT transporter family is the most widespread eukaryotic Ni transporter, and eukaryotic urease and MetH are the most common Ni- and B12-dependent enzymes, respectively. Finally, investigation of environmental and other conditions and of the identity of organisms that show dependence on Ni or Co revealed that host-associated organisms (particularly obligate intracellular parasites and endosymbionts) tend to lose Ni/Co utilization.
Our data provide information on the evolutionary dynamics of Ni and Co utilization and highlight widespread use of these metals in the three domains of life, yet only a limited number of user proteins.
Multiple sequencing of genomes belonging to a bacterial species allows one to analyze and compare statistics and dynamics of the gene complements of species, their pan-genomes. Here, we analyzed multiple genomes of Escherichia coli, Shigella spp., and Salmonella enterica. We demonstrate that the distribution of the number of genomes harboring a gene is well approximated by a sum of two power functions, describing frequent genes (present in many strains) and rare genes (present in few strains). The virtual absence of Shigella-specific genes not present in E. coli genomes confirms previous observations that Shigella is not an independent genus. While the pan-genome size is increasing with each new strain, the number of genes present in a fixed fraction of strains stabilizes quickly. For instance, slightly fewer than 4,000 genes are present in at least half of any group of E. coli genomes. Comparison of S. enterica and E. coli pan-genomes revealed the existence of a common periphery, that is, genes present in some but not all strains of both species. Analysis of phylogenetic trees demonstrates that rare genes from the periphery likely evolve under horizontal transfer, whereas frequent periphery genes may have been inherited from the periphery genome of the common ancestor.
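The two quantities discussed above, the distribution of the number of genomes harboring a gene and the set of genes present in a fixed fraction of strains, can be computed from a presence/absence table. The following is a minimal sketch, assuming that orthologous gene families have already been assigned; the function names, strain labels, and gene labels are hypothetical, and fitting the sum of two power functions is not shown.

```python
from collections import Counter


def gene_occurrences(genomes):
    """Count, for each gene family, the number of genomes containing it."""
    occurrences = Counter()
    for genes in genomes.values():
        occurrences.update(set(genes))  # count each family once per genome
    return occurrences


def frequency_spectrum(genomes):
    """Map k -> number of gene families found in exactly k genomes."""
    return Counter(gene_occurrences(genomes).values())


def genes_in_fraction(genomes, fraction):
    """Gene families present in at least `fraction` of the genomes."""
    threshold = fraction * len(genomes)
    return {g for g, k in gene_occurrences(genomes).items() if k >= threshold}


# Hypothetical toy data: four strains and their gene families.
genomes = {
    "strain1": {"a", "b", "c"},
    "strain2": {"a", "b", "d"},
    "strain3": {"a", "e"},
    "strain4": {"a", "b"},
}
spectrum = frequency_spectrum(genomes)
pan_genome_size = len(gene_occurrences(genomes))
half = genes_in_fraction(genomes, 0.5)
print(dict(spectrum), pan_genome_size, sorted(half))
```

In this toy example the pan-genome grows with every strain that brings a unique family, while the set of families present in at least half of the strains stays at two, which mirrors the qualitative behavior described in the abstract.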
Over the past decade, genome-wide assays for chromatin interactions in single cells have enabled the study of individual nuclei at unprecedented resolution and throughput. Current chromosome conformation capture techniques survey contacts for up to tens of thousands of individual cells, improving our understanding of genome function in 3D. However, these methods recover a small fraction of all contacts in single cells, requiring specialised processing of sparse interactome data. In this review, we highlight recent advances in methods for the interpretation of single-cell genomic contacts. After discussing the strengths and limitations of these methods, we outline frontiers for future development in this rapidly moving field.
Transcriptome sequencing data have become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in DNA sequencing technologies, such data are lacking for many groups of living organisms, in particular for many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales, a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects.
Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences.
454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences was generated that represents orthologs of known plant genes as well as potential new genes.
A typical eukaryotic gene comprises alternating exons and introns, which are retained in and spliced out of the mature mRNA, respectively. Although the length of introns may vary substantially among organisms, a large fraction of genes contains short introns in many species. Notably, some Ciliates (Paramecium and Nyctotherus) possess only ultra-short introns, around 25 bp long. In Paramecium, ultra-short introns with length divisible by three (3n) are under strong evolutionary pressure and have a high frequency of in-frame stop codons, which, in the case of intron retention, cause premature termination of mRNA translation and consequent degradation of the mis-spliced mRNA by the nonsense-mediated decay mechanism. Here, we analyzed introns in five genera of Ciliates: Paramecium, Tetrahymena, Ichthyophthirius, Oxytricha, and Stylonychia. Introns can be classified into two length classes in Tetrahymena and Ichthyophthirius (with means of 48 bp and 69 bp, and 55 bp and 64 bp, respectively), but, surprisingly, comprise three distinct length classes in Oxytricha and Stylonychia (with means of 33-35 bp, 47-51 bp, and 78-80 bp). In most ranges of intron lengths, 3n introns are underrepresented and have a high frequency of in-frame stop codons in all studied species. Introns of Paramecium, Tetrahymena, and Ichthyophthirius are preferentially located at the 5' and 3' ends of genes, whereas introns of Oxytricha and Stylonychia are strongly skewed towards the 5' end. Analysis of evolutionary conservation shows that, in each studied genome, a significant fraction of intron positions is conserved between orthologs, but intron lengths are not correlated between species.
In summary, our study provides a detailed characterization of introns in several genera of Ciliates and highlights some of their distinctive properties, which, together, indicate that splicing spellchecking is a universal and evolutionarily conserved process in the biogenesis of short introns in various representatives of Ciliates.
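The two intron properties examined above, length divisible by three and the presence of an in-frame stop codon upon retention, can be checked with a few lines of code. This is a minimal sketch under stated assumptions, not the procedure used in the study: the function names and the toy intron are hypothetical, the intron phase is assumed to be known from the gene annotation, and the default stop set contains only TGA, as in the ciliate nuclear code (NCBI translation table 6), where TAA and TAG encode glutamine.

```python
def is_3n(intron):
    """True if the intron length is divisible by three."""
    return len(intron) % 3 == 0


def has_in_frame_stop(intron, phase=0, stop_codons=("TGA",)):
    """True if a retained intron contains a stop codon in the coding frame.

    phase is the number of bases of the interrupted codon that lie
    upstream of the intron (0, 1, or 2). Codons that span the
    exon-intron boundaries are ignored in this simplified sketch.
    """
    seq = intron.upper()
    start = (3 - phase) % 3  # first complete codon inside the intron
    return any(seq[i:i + 3] in stop_codons
               for i in range(start, len(seq) - 2, 3))


# Hypothetical 24-nt intron written codon by codon for readability.
toy_intron = "".join(["GTA", "TGA", "AAT", "TTT", "TTT", "TTT", "TAT", "TAG"])
print(is_3n(toy_intron),
      has_in_frame_stop(toy_intron, phase=0),
      has_in_frame_stop(toy_intron, phase=1))
```

A retained 3n intron without an in-frame stop would leave the downstream reading frame intact and escape nonsense-mediated decay, which is why the two properties are considered together.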
RNA editing in the form of substituting adenine with inosine (A-to-I editing) is the most frequent type of RNA editing in many metazoan species. In most species, A-to-I editing sites tend to form clusters, and editing at clustered sites depends on editing of the adjacent sites. Although functionally important in some specific cases, A-to-I editing is usually rare. The exception occurs in soft-bodied coleoid cephalopods, where tens of thousands of potentially important A-to-I editing sites have been identified, making coleoids an ideal model for studying the properties and evolution of A-to-I editing sites. Here, we apply several diverse techniques to demonstrate a strong tendency of coleoid RNA editing sites to cluster along the transcript. We show that clustering of editing sites and correlated editing substantially contribute to the transcriptome diversity that arises due to extensive RNA editing. Moreover, we identify three distinct types of editing site clusters, varying in size, and describe RNA structural features and mechanisms likely underlying the formation of these clusters. In particular, these observations may explain sequence conservation at large distances around editing sites and the observed dependency of editing on mutations in the vicinity of editing sites.