Abstract
Large-scale genome sequencing and the increasingly massive use of high-throughput approaches produce a vast amount of new information that completely transforms our understanding of thousands of microbial species. However, despite the development of powerful bioinformatics approaches, full interpretation of the content of these genomes remains a difficult task. Launched in 2005, the MicroScope platform (https://www.genoscope.cns.fr/agc/microscope) has been under continuous development and provides analysis for prokaryotic genome projects together with metabolic network reconstruction and post-genomic experiments allowing users to improve the understanding of gene functions. Here we present new improvements of the MicroScope user interface for genome selection, navigation and expert gene annotation. Automatic functional annotation procedures of the platform have also been updated and we added several new tools for the functional annotation of genes and genomic regions. We finally focus on new tools and a pipeline developed to perform comparative analyses on hundreds of genomes based on pangenome graphs. To date, MicroScope contains data for >11 800 microbial genomes, part of which are manually curated and maintained by microbiologists (>4500 personal accounts in September 2019). The platform enables collaborative work in a rich comparative genomic context and improves community-based curation efforts.
The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations.
MicroScope is an integrated platform dedicated to both the methodical updating of microbial genome annotation and to comparative analysis. The resource provides data from completed and ongoing genome projects (automatic and expert annotations), together with data sources from post-genomic experiments (i.e. transcriptomics, mutant collections) allowing users to perfect and improve the understanding of gene functions. MicroScope (http://www.genoscope.cns.fr/agc/microscope) combines tools and graphical interfaces to analyse genomes and to perform the manual curation of gene annotations in a comparative context. Since its first publication in January 2006, the system (previously named MaGe for Magnifying Genomes) has been continuously extended both in terms of data content and analysis tools. The last update of MicroScope was published in 2009 in the Database journal. Today, the resource contains data for >1600 microbial genomes, of which ∼300 are manually curated and maintained by biologists (1200 personal accounts today). Expert annotations are continuously gathered in the MicroScope database (∼50 000 a year), contributing to the improvement of the quality of microbial genome annotations. Improved data browsing and searching tools have been added, original tools useful in the context of expert annotation have been developed and integrated and the website has been significantly redesigned to be more user-friendly. Furthermore, in the context of the European project Microme (Framework Program 7 Collaborative Project), MicroScope is becoming a resource providing for the curation and analysis of both genomic and metabolic data. An increasing number of projects are related to the study of environmental bacterial (meta)genomes that are able to metabolize a large variety of chemical compounds that may be of high industrial interest.
The use of comparative genomics for functional, evolutionary, and epidemiological studies requires methods to classify gene families in terms of occurrence in a given species. These methods usually lack multivariate statistical models to infer the partitions and the optimal number of classes and do not account for genome organization. We introduce a graph structure to model pangenomes in which nodes represent gene families and edges represent genomic neighborhood. Our method, named PPanGGOLiN, partitions nodes using an Expectation-Maximization algorithm based on a multivariate Bernoulli Mixture Model coupled with a Markov Random Field. This approach takes into account the topology of the graph and the presence/absence of genes in pangenomes to classify gene families into persistent, cloud, and one or several shell partitions. By analyzing the partitioned pangenome graphs of isolate genomes from 439 species and metagenome-assembled genomes from 78 species, we demonstrate that our method is effective in estimating the persistent genome. Interestingly, it shows that the shell genome is a key element to understand genome dynamics, presumably because it reflects how genes present at intermediate frequencies drive adaptation of species, and its proportion in genomes is independent of genome size. The graph-based approach proposed by PPanGGOLiN is useful to depict the overall genomic diversity of thousands of strains in a compact structure and provides an effective basis for very large scale comparative genomics. The software is freely available at https://github.com/labgem/PPanGGOLiN.
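The graph structure described in this abstract (nodes as gene families, edges as genomic neighborhood) can be illustrated with a minimal sketch. This is not PPanGGOLiN's code or API. The function names, the toy genomes and the family labels are all hypothetical, each genome is assumed to be a single linear contig given as an ordered list of gene family identifiers, and the Expectation-Maximization partitioning step is not implemented. The sketch only builds the graph and the per-family presence frequency that such a partitioning would take as input.

```python
from collections import defaultdict


def build_pangenome_graph(genomes):
    """Build a toy pangenome graph (illustrative only, not PPanGGOLiN's code).

    genomes: dict mapping a genome name to the list of gene family
             identifiers in the order they occur along one linear contig.
    Returns two dicts: family -> set of genomes carrying it (nodes), and
    unordered family pair -> set of genomes where the two families are
    adjacent (edges).
    """
    node_genomes = defaultdict(set)
    edge_genomes = defaultdict(set)
    for name, families in genomes.items():
        for fam in families:
            node_genomes[fam].add(name)
        # Consecutive genes define a neighborhood edge between their families.
        for left, right in zip(families, families[1:]):
            if left != right:  # skip self-loops from tandem duplicates
                edge_genomes[frozenset((left, right))].add(name)
    return dict(node_genomes), dict(edge_genomes)


def presence_frequency(node_genomes, n_genomes):
    """Fraction of genomes in which each gene family is present."""
    return {fam: len(names) / n_genomes for fam, names in node_genomes.items()}


# Hypothetical toy input: three genomes described by their gene family order.
toy_genomes = {
    "g1": ["A", "B", "C", "D"],
    "g2": ["A", "B", "C", "E"],
    "g3": ["A", "B", "F"],
}

nodes, edges = build_pangenome_graph(toy_genomes)
freq = presence_frequency(nodes, len(toy_genomes))
```

In this toy input, families A and B occur in every genome while D, E and F each occur in one. A frequency threshold alone would already separate them, whereas the method in the abstract instead infers the partitions statistically and also uses the edges.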
Many bacteria in the environment have adapted to the presence of toxic heavy metals. Over the last 30 years, this heavy metal tolerance was the subject of extensive research. The bacterium Cupriavidus metallidurans strain CH34, originally isolated by us in 1976 from a metal processing factory, is considered a major model organism in this field because it withstands millimolar-range concentrations of over 20 different heavy metal ions. This tolerance is mostly achieved by rapid ion efflux but also by metal-complexation and -reduction. We present here the full genome sequence of strain CH34 and the manual annotation of all its genes. The genome of C. metallidurans CH34 is composed of two large circular chromosomes CHR1 and CHR2 of, respectively, 3,928,089 bp and 2,580,084 bp, and two megaplasmids pMOL28 and pMOL30 of, respectively, 171,459 bp and 233,720 bp in size. At least 25 loci for heavy-metal resistance (HMR) are distributed over the four replicons. Approximately 67% of the 6,717 coding sequences (CDSs) present in the CH34 genome could be assigned a putative function, and 9.1% (611 genes) appear to be unique to this strain. One out of five proteins is associated with either transport or transcription while the relay of environmental stimuli is governed by more than 600 signal transduction systems. The CH34 genome is most similar to the genomes of other Cupriavidus strains by correspondence between the respective CHR1 replicons but also displays similarity to the genomes of more distantly related species as a result of gene transfer and through the presence of large genomic islands. The presence of at least 57 IS elements and 19 transposons and the ability to take in and express foreign genes indicate a very dynamic and complex genome shaped by evolutionary forces. The genome data show that C. metallidurans CH34 is particularly well equipped to live in extreme conditions and anthropogenic environments that are rich in metals.
Adaptation by natural selection depends on the rates, effects and interactions of many mutations, making it difficult to determine what proportion of mutations in an evolving lineage are beneficial. Here we analysed 264 complete genomes from 12 Escherichia coli populations to characterize their dynamics over 50,000 generations. The populations that retained the ancestral mutation rate support a model in which most fixed mutations are beneficial, the fraction of beneficial mutations declines as fitness rises, and neutral mutations accumulate at a constant rate. We also compared these populations to mutation-accumulation lines evolved under a bottlenecking regime that minimizes selection. Nonsynonymous mutations, intergenic mutations, insertions and deletions are overrepresented in the long-term populations, further supporting the inference that most mutations that reached high frequency were favoured by selection. These results illuminate the shifting balance of forces that govern genome evolution in populations adapting to a new environment.
1 CEA, Institut de Génomique, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
2 CEA, Institut de Génomique, Laboratoire de Génomique Comparative/CNRS UMR8030, Génoscope, 2 rue Gaston Crémieux, 91057 Évry, France
3 Institut Pasteur, Intégration et Analyse Génomiques, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
4 Institut Pasteur, Génétique des Génomes Bactériens/CNRS URA2171, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
Comparative genomics is the cornerstone of identification of gene functions. The immense number of living organisms precludes experimental identification of functions except in a handful of model organisms. The bacterial domain is split into large branches, among which the Firmicutes occupy a considerable space. Bacillus subtilis has been the model of Firmicutes for decades and its genome has been a reference for more than 10 years. Sequencing the genome involved more than 30 laboratories, with different areas of expertise, in an attempt to make the most of the experimental information that could be associated with the sequence. This had the expected drawback that the sequencing expertise was quite varied among the groups involved, especially at a time when sequencing genomes was extremely hard work. The recent development of very efficient, fast and accurate sequencing techniques, in parallel with the development of high-level annotation platforms, motivated the present resequencing work. The updated sequence has been reannotated in agreement with the UniProt protein knowledge base, keeping in perspective the split between the paleome (genes necessary for sustaining and perpetuating life) and the cenome (genes required for occupation of a niche, suggesting here that B. subtilis is an epiphyte). This should permit investigators to make reliable inferences to prepare validation experiments in a variety of domains of bacterial growth and development as well as build up accurate phylogenies.
Correspondence Antoine Danchin antoine.danchin{at}normalesup.org
Abbreviations: AdoMet, S-adenosylmethionine; CDS, coding sequence; IIMP, integral inner-membrane protein; MTR, methylthioribose; RGP, regions of genomic plasticity; ROS, reactive oxygen species
These authors contributed equally to this work.
The GenBank/EMBL/DDBJ accession number for the sequence reported in this paper is AL009126.
Four supplementary tables are available with the online version of this paper.
A new Escherichia coli virulent clonal group, O45:K1, belonging to the highly virulent subgroup B2₁ was recently identified in France, where it accounts for one-third of E. coli neonatal meningitis cases. Here we describe the sequence, epidemiology and function of the large plasmid harbored by strain S88, which is representative of the O45:K1 clonal group. Plasmid pS88 is 133,853 bp long and contains 144 protein-coding genes. It harbors three different iron uptake systems (aerobactin, salmochelin, and the sitABCD genes) and other putative virulence genes (iss, etsABC, ompTP, and hlyF). The pS88 sequence is composed of several gene blocks homologous to avian pathogenic E. coli plasmids pAPEC-O2-ColV and pAPEC-O1-ColBM. PCR amplification of 11 open reading frames scattered throughout the plasmid was used to investigate the distribution of pS88 and showed that a pS88-like plasmid is present in other meningitis clonal groups such as O18:K1, O1:K1, and O83:K1. A pS88-like plasmid was also found in avian pathogenic strains and human urosepsis strains belonging to subgroup B2₁. A variant of S88 cured of its plasmid displayed a marked loss of virulence relative to the wild-type strain in a neonatal rat model, with bacteremia more than 2 log CFU/ml lower. The salmochelin siderophore, a known meningovirulence factor, could not alone explain the plasmid's contribution to virulence, as a salmochelin mutant displayed only a minor fall in bacteremia (0.9 log CFU/ml). Thus, pS88 is a major virulence determinant related to avian pathogenic plasmids that has spread not only through meningitis clonal groups but also human urosepsis and avian pathogenic strains.
Large-scale rearrangements may be important in evolution because they can alter chromosome organization and gene expression in ways not possible through point mutations. In a long-term evolution experiment, twelve Escherichia coli populations have been propagated in a glucose-limited environment for over 25 years. We used whole-genome mapping (optical mapping) combined with genome sequencing and PCR analysis to identify the large-scale chromosomal rearrangements in clones from each population after 40,000 generations. A total of 110 rearrangement events were detected, including 82 deletions, 19 inversions, and 9 duplications, with lineages having between 5 and 20 events. In three populations, successive rearrangements impacted particular regions. In five populations, rearrangements affected over a third of the chromosome. Most rearrangements involved recombination between insertion sequence (IS) elements, illustrating their importance in mediating genome plasticity. Two lines of evidence suggest that at least some of these rearrangements conferred higher fitness. First, parallel changes were observed across the independent populations, with ~65% of the rearrangements affecting the same loci in at least two populations. For example, the ribose-utilization operon and the manB-cpsG region were deleted in 12 and 10 populations, respectively, suggesting positive selection, and this inference was previously confirmed for the former case. Second, optical maps from clones sampled over time from one population showed that most rearrangements occurred early in the experiment, when fitness was increasing most rapidly. However, some rearrangements likely occur at high frequency and may have simply hitchhiked to fixation. In any case, large-scale rearrangements clearly influenced genomic evolution in these populations.
Bacterial chromosomes are dynamic structures shaped by long histories of evolution. Among genomic changes, large-scale DNA rearrangements can have important effects on the presence, order, and expression of genes. Whole-genome sequencing that relies on short DNA reads cannot identify all large-scale rearrangements. Therefore, deciphering changes in the overall organization of genomes requires alternative methods, such as optical mapping. We analyzed the longest-running microbial evolution experiment (more than 25 years of evolution in the laboratory) by optical mapping, genome sequencing, and PCR analyses. We found multiple large genome rearrangements in all 12 independently evolving populations. In most cases, it is unclear whether these changes were beneficial themselves or, alternatively, hitchhiked to fixation with other beneficial mutations. In any case, many genome rearrangements accumulated over decades of evolution, providing these populations with genetic plasticity reminiscent of that observed in some pathogenic bacteria.