The quality of automated gene prediction in microbial organisms has improved steadily over the past decade, but there is still room for improvement. Increasing the number of correct identifications, both of genes and of the translation initiation sites for each gene, and reducing the overall number of false positives, are all desirable goals.
With our years of experience in manually curating genomes for the Joint Genome Institute, we developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm). With Prodigal, we focused specifically on the three goals of improved gene structure prediction, improved translation initiation site recognition, and reduced false positives. We compared the results of Prodigal to existing gene-finding methods to demonstrate that it met each of these objectives.
We built a fast, lightweight, open source gene prediction program called Prodigal (http://compbio.ornl.gov/prodigal/). Prodigal achieved good results compared to existing methods, and we believe it will be a valuable asset to automated microbial annotation pipelines.
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly, and their quality reflects the quality not only of the sequencing technology used but also of the analysis software employed for assembly and annotation.
In this work, we have explored the quality of microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and their potential sources of error should be considered prior to analysis.
These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
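As an illustration of the kind of metric that can be used to compare draft and finished assemblies, the sketch below computes N50, a widely used contiguity statistic. The abstract does not state which metrics were used, so the choice of N50 is an assumption; the function name `n50` and the example contig lengths are hypothetical.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    lengths = sorted((int(x) for x in contig_lengths), reverse=True)
    if not lengths or lengths[-1] <= 0:
        raise ValueError("need at least one contig, all with positive length")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum past half
        # of the assembly defines N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches the total")


if __name__ == "__main__":
    # Hypothetical contig lengths (bp), not data from the study.
    fragmented = [120_000, 90_000, 40_000, 30_000, 20_000]
    nearly_closed = [290_000, 10_000]
    print("fragmented N50:", n50(fragmented))
    print("nearly closed N50:", n50(nearly_closed))
```

A higher N50 for the same total length indicates fewer, longer contigs, which is the direction of change the text describes when moving from gapped drafts to nearly complete assemblies. N50 says nothing about base-level accuracy or missing genes, so it would only ever be one metric among several.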
Enigmatic, ultrasmall, uncultivated Archaea
Baker, Brett J; Comolli, Luis R; Dick, Gregory J ...
Proceedings of the National Academy of Sciences, 05/2010, Volume 107, Issue 19
Journal Article, Peer reviewed, Open access
Metagenomics has provided access to genomes of as yet uncultivated microorganisms in natural environments, yet there are gaps in our knowledge, particularly for Archaea that occur at relatively low abundance and in extreme environments. Ultrasmall cells (<500 nm in diameter) from lineages without cultivated representatives that branch near the crenarchaeal/euryarchaeal divide have been detected in a variety of acidic ecosystems. We reconstructed composite, near-complete approximately 1-Mb genomes for three lineages, referred to as ARMAN (archaeal Richmond Mine acidophilic nanoorganisms), from environmental samples and a biofilm filtrate. Genes of two lineages are among the smallest yet described, enabling a 10% higher coding density than found in genomes of the same size, and there are noncontiguous genes. No biological function could be inferred for up to 45% of genes and no more than 63% of the predicted proteins could be assigned to a revised set of archaeal clusters of orthologous groups. Some core metabolic genes are more common in Crenarchaeota than Euryarchaeota, up to 21% of genes have the highest sequence identity to bacterial genes, and 12 belong to clusters of orthologous groups that were previously exclusive to bacteria. A small subset of 3D cryo-electron tomographic reconstructions clearly show penetration of the ARMAN cell wall and cytoplasmic membranes by protuberances extended from cells of the archaeal order Thermoplasmatales. Interspecies interactions, the presence of a unique internal tubular organelle [Comolli et al. (2009) ISME J 3:159-167], and many genes previously only affiliated with Crenarchaea or Bacteria indicate extensive unique physiology in organisms that branched close to the time that Cren- and Euryarchaeotal lineages diverged.
The genome sequence of Geobacter metallireducens is the second to be completed from the metal-respiring genus Geobacter, and is compared in this report to that of Geobacter sulfurreducens in order to understand their metabolic, physiological and regulatory similarities and differences.
The experimentally observed greater metabolic versatility of G. metallireducens versus G. sulfurreducens is borne out by the presence of more numerous genes for metabolism of organic acids including acetate, propionate, and pyruvate. Although G. metallireducens lacks a dicarboxylic acid transporter, it has acquired a second putative succinate dehydrogenase/fumarate reductase complex, suggesting that respiration of fumarate was important until recently in its evolutionary history. Vestiges of the molybdate (ModE) regulon of G. sulfurreducens can be detected in G. metallireducens, which has lost the global regulatory protein ModE but retained some putative ModE-binding sites and multiplied certain genes of molybdenum cofactor biosynthesis. Several enzymes of amino acid metabolism are of different origin in the two species, but significant patterns of gene organization are conserved. Whereas most Geobacteraceae are predicted to obtain biosynthetic reducing equivalents from electron transfer pathways via a ferredoxin oxidoreductase, G. metallireducens can derive them from the oxidative pentose phosphate pathway. In addition to the evidence of greater metabolic versatility, the G. metallireducens genome is also remarkable for the abundance of multicopy nucleotide sequences found in intergenic regions and even within genes.
The genomic evidence suggests that metabolism, physiology and regulation of gene expression in G. metallireducens may be dramatically different from other Geobacteraceae.
Clostridium thermocellum is a thermophilic, obligately anaerobic, Gram-positive bacterium that is a candidate microorganism for converting cellulosic biomass into ethanol through consolidated bioprocessing. Ethanol intolerance is an important metric in terms of process economics, and tolerance has often been described as a complex and likely multigenic trait for which complex gene interactions come into play. Here, we resequence the genome of an ethanol-tolerant mutant, show that the tolerant phenotype is primarily due to a mutated bifunctional acetaldehyde-CoA/alcohol dehydrogenase gene (adhE), hypothesize based on structural analysis that cofactor specificity may be affected, and confirm this hypothesis using enzyme assays. Biochemical assays confirm a complete loss of NADH-dependent activity with concomitant acquisition of NADPH-dependent activity, which likely affects electron flow in the mutant. The simplicity of the genetic basis for the ethanol-tolerant phenotype observed here informs rational engineering of mutant microbial strains for cellulosic ethanol production.
Chloroflexus aurantiacus is a thermophilic filamentous anoxygenic phototrophic (FAP) bacterium, and can grow phototrophically under anaerobic conditions or chemotrophically under aerobic and dark conditions. According to 16S rRNA analysis, Chloroflexi species are the earliest branching bacteria capable of photosynthesis, and Cfl. aurantiacus has been long regarded as a key organism to resolve the obscurity of the origin and early evolution of photosynthesis. Cfl. aurantiacus contains a chimeric photosystem that comprises some characters of green sulfur bacteria and purple photosynthetic bacteria, and also has some unique electron transport proteins compared to other photosynthetic bacteria.
The complete genomic sequence of Cfl. aurantiacus has been determined, analyzed and compared to the genomes of other photosynthetic bacteria.
Abundant genomic evidence suggests that there have been numerous gene adaptations/replacements in Cfl. aurantiacus to facilitate life under both anaerobic and aerobic conditions, including duplicate genes and gene clusters for the alternative complex III (ACIII), auracyanin and NADH:quinone oxidoreductase; and several aerobic/anaerobic enzyme pairs in central carbon metabolism and in tetrapyrrole and nucleic acid biosynthesis. Overall, genomic information is consistent with the high tolerance for oxygen that has been reported in the growth of Cfl. aurantiacus. Genes for the chimeric photosystem, photosynthetic electron transport chain, the 3-hydroxypropionate autotrophic carbon fixation cycle, CO2-anaplerotic pathways, glyoxylate cycle, and sulfur reduction pathway are present. The central carbon metabolism and sulfur assimilation pathways in Cfl. aurantiacus are discussed. Some features of the Cfl. aurantiacus genome are compared with those of the Roseiflexus castenholzii genome. Roseiflexus castenholzii is a recently characterized FAP bacterium that is phylogenetically closely related to Cfl. aurantiacus. Based on previous reports and the genomic information, the place of Cfl. aurantiacus in the evolution of photosynthesis is also discussed.
The genomic analyses presented in this report, along with previous physiological, ecological and biochemical studies, indicate that the anoxygenic phototroph Cfl. aurantiacus has many interesting and certain unique features in its metabolic pathways. The complete genome may also shed light on possible evolutionary connections of photosynthesis.
Bacterial endophytes that colonize Populus trees contribute to nutrient acquisition, prime immunity responses, and directly or indirectly increase both above- and below-ground biomasses. Endophytes are embedded within plant material, so physical separation and isolation are difficult tasks. Application of culture-independent methods, such as metagenome or bacterial transcriptome sequencing, has been limited due to the predominance of DNA from the plant biomass. Here, we describe a modified differential and density gradient centrifugation-based protocol for the separation of endophytic bacteria from Populus roots. This protocol achieved substantial reduction in contaminating plant DNA, allowed enrichment of endophytic bacteria away from the plant material, and enabled single-cell genomics analysis. Four single-cell genomes were selected for whole-genome amplification based on their rarity in the microbiome (potentially uncultured taxa) as well as their inferred abilities to form associations with plants. Bioinformatics analyses, including assembly, contamination removal, and completeness estimation, were performed to obtain single-amplified genomes (SAGs) of organisms from the phyla Armatimonadetes, Verrucomicrobia, and Planctomycetes, which were unrepresented in our previous cultivation efforts. Comparative genomic analysis revealed unique characteristics of each SAG that could facilitate future cultivation efforts for these bacteria.
Plant roots harbor a diverse collection of microbes that live within host tissues. To gain a comprehensive understanding of microbial adaptations to this endophytic lifestyle from strains that cannot be cultivated, it is necessary to separate bacterial cells from the predominance of plant tissue. This study provides a valuable approach for the separation and isolation of endophytic bacteria from plant root tissue. Isolated live bacteria provide material for microbiome sequencing, single-cell genomics, and analyses of genomes of uncultured bacteria to provide genomics information that will facilitate future cultivation attempts.
The selected robust fungus, Aspergillus oryzae strain BCC7051, is of interest for biotechnological production of lipid-derived products due to its capability to accumulate high amounts of intracellular lipids using various sugars and agro-industrial substrates. Here, we report the genome sequence of the oleaginous A. oryzae BCC7051. The obtained reads were de novo assembled into 25 scaffolds spanning 38,550,958 bp, with 11,456 predicted protein-coding genes. By synteny mapping, a large rearrangement was found in two scaffolds of A. oryzae BCC7051 as compared to the reference RIB40 strain. The genetic relationship between BCC7051 and other strains of A. oryzae in terms of aflatoxin production was investigated, indicating that A. oryzae BCC7051 was categorized as a group 2 nonaflatoxin-producing strain. Moreover, a comparative analysis of the structural genes involved in lipid metabolism among oleaginous yeasts and fungi revealed the presence of multiple isoforms of metabolic enzymes responsible for fatty acid synthesis in BCC7051. The alternative routes of acetyl-CoA generation as oleaginous features and the malate/citrate/pyruvate shuttle were also identified in this A. oryzae strain. The genome sequence generated in this work is a dedicated resource for expanding genome-wide study of microbial lipids at the systems level, and for developing a fungal-based platform for production of diversified lipids with commercial relevance.
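The assembly figures reported for A. oryzae BCC7051 lend themselves to a quick back-of-the-envelope check. The sketch below derives mean scaffold length, gene density, and mean spacing per gene from the three numbers given in the abstract (25 scaffolds, 38,550,958 bp, 11,456 predicted genes). The function and key names are illustrative, and the derived values are simple averages, not results reported by the authors.

```python
def summarize_assembly(total_bp: int, n_scaffolds: int, n_genes: int) -> dict:
    """Derive simple average statistics from headline assembly numbers."""
    if min(total_bp, n_scaffolds, n_genes) <= 0:
        raise ValueError("all inputs must be positive")
    return {
        # Average scaffold size; says nothing about the size distribution.
        "mean_scaffold_bp": total_bp / n_scaffolds,
        # Predicted protein-coding genes per million base pairs.
        "genes_per_mb": n_genes / (total_bp / 1_000_000),
        # Average genomic span per gene, including intergenic sequence.
        "bp_per_gene": total_bp / n_genes,
    }


if __name__ == "__main__":
    # Values taken from the abstract above.
    stats = summarize_assembly(total_bp=38_550_958, n_scaffolds=25, n_genes=11_456)
    for name, value in stats.items():
        print(f"{name}: {value:,.2f}")
```

These averages are only a sanity check on the reported totals; they do not describe how genes or scaffold lengths are actually distributed in the assembly.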