With the increasing number of sequenced genomes and their comparisons, the detection of orthologs is crucial for reliable functional annotation and evolutionary analyses of genes and species. Yet, the dynamic remodeling of genome content through gain, loss, transfer of genes, and segmental and whole-genome duplication hinders reliable orthology detection. Moreover, the lack of direct functional evidence and the questionable quality of some available genome sequences and annotations present additional difficulties in assessing orthology. This article reviews the existing computational methods and their potential accuracy in the high-throughput era of genome sequencing and anticipates open questions in terms of methodology, reliability, and computation. Appropriate taxon sampling together with a combination of methods based on similarity, phylogeny, synteny, and evolutionary knowledge that may help detect speciation events appears to be the most accurate strategy. This review also raises perspectives on the potential determination of orthology throughout the whole species phylogeny.
Abstract
Reading, writing, publishing, and publicly presenting scientific works are vital for a young researcher's profile building and career development. Generally, traditional educational curricula do not offer training possibilities to learn and practice how to prepare, write, and present scientific works. These are rather a part of lab meeting activities in research groups. The lack of such training is more critical in some developing countries because it adds to the rarity of opportunities to discuss and become involved in exchanges on state-of-the-art scientific literature. Here the authors relate their experience in introducing a weekly 1‐day lab meeting in the framework of two previously organized 3‐month courses on “Bioinformatics and Genome Analyses”. The main activities developed during these lab meetings include scientific literature follow-up as well as preparing and presenting oral and written scientific reviews. These activities prove to be useful for building a student's self‐confidence, for enhancing their active participation during the lectures and practical sessions, and for their positive impact on running the whole course program. Incorporation of such lab meeting activities in the course program significantly improves the capacity building of the participants, their analytical and critical reading of scientific literature, as well as their communication skills. This work shows how to proceed with the different steps involved in the implementation of lab meeting activities and recommends their regular institution in similar courses.
Recent developments of sequencing technologies that allow the production of massive amounts of genomic and genotyping data have highlighted the need for synthetic data representation and pattern recognition methods that can mine and help discover biologically meaningful knowledge included in such large data sets. Correspondence analysis (CA) is an exploratory descriptive method designed to analyze two-way data tables, including some measure of association between rows and columns. It constructs linear combinations of variables, known as factors. CA has been used for decades to study high-dimensional data, and remarkable inferences from large data tables were obtained by reducing the dimensionality to a few orthogonal factors that correspond to the largest amount of variability in the data. Herein, I review CA and highlight its use by considering examples in handling high-dimensional data that can be constructed from genomic and genetic studies. Examples in amino acid compositions of large sets of species (viruses, phages, yeast, and fungi) as well as an example related to pairwise shared orthologs in a set of yeast and fungal species, as obtained from their proteome comparisons, are considered. For the first time, results show striking segregations between yeasts and fungi as well as between viruses and phages. Distributions obtained from shared orthologs show clusters of yeast and fungal species corresponding to their phylogenetic relationships. A direct comparison with the principal component analysis method is discussed using a recently published example of genotyping data related to newly discovered traces of an ancient hominid that was compared to modern human populations in the search for ancestral similarities. CA offers more detailed results highlighting links between modern humans and the ancient hominid and their characterizations.
Compared to the popular principal component analysis method, CA allows easier and more effective interpretation of results, particularly by the ability of relating individual patterns with their corresponding characteristic variables.
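The factor construction described in the abstract above can be illustrated with a minimal sketch. It assumes the standard textbook formulation of CA (singular value decomposition of the standardized residuals of the two-way table) and uses NumPy; the function name `correspondence_analysis` and the parameter `n_factors` are hypothetical and are not taken from the reviewed article.

```python
import numpy as np


def correspondence_analysis(table, n_factors=2):
    """Textbook CA of a two-way table of non-negative counts.

    Assumes every row and every column has a non-zero total.
    Returns row coordinates, column coordinates, and the principal
    inertias (squared singular values) of all factors.
    """
    counts = np.asarray(table, dtype=float)
    freq = counts / counts.sum()          # correspondence matrix
    row_mass = freq.sum(axis=1)           # row marginal frequencies
    col_mass = freq.sum(axis=0)           # column marginal frequencies

    # Standardized residuals from the independence model row_mass * col_mass.
    expected = np.outer(row_mass, col_mass)
    residuals = (freq - expected) / np.sqrt(expected)

    # Orthogonal factors come from the SVD of the residual matrix.
    u, singular_values, vt = np.linalg.svd(residuals, full_matrices=False)
    k = min(n_factors, len(singular_values))

    # Principal coordinates of rows and columns on the first k factors.
    row_coords = (u[:, :k] * singular_values[:k]) / np.sqrt(row_mass)[:, None]
    col_coords = (vt.T[:, :k] * singular_values[:k]) / np.sqrt(col_mass)[:, None]
    return row_coords, col_coords, singular_values ** 2


if __name__ == "__main__":
    # Hypothetical toy table: rows could be species, columns amino acids.
    toy = [[30, 10, 5], [12, 25, 8], [6, 9, 28]]
    rows, cols, inertia = correspondence_analysis(toy)
    print("share of inertia per factor:", inertia / inertia.sum())
```

Because rows and columns are projected onto the same factors, a species lying close to a given variable on a factor plane can be read as being characterized by that variable, which is the interpretive advantage the abstract attributes to CA.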
Genome data, with underlying new knowledge, are accumulating at an exponential rate thanks to ever-improving sequencing technologies and the parallel development of dedicated efficient Bioinformatics methods and tools. Advanced Education in Bioinformatics and Genome Analyses is to a large extent not accessible to students in developing countries, where endeavors to set up Bioinformatics courses most often concern only basic levels. Here, we report a pioneering pilot experience concerning the design and implementation, from scratch, of a three-month advanced and extensive course in Bioinformatics and Genome Analyses at the Institut Pasteur de Tunis. Most significantly, the outcome of the course was upgrading the participants' skills in Bioinformatics and Genome Analyses to recognized international standards. Here we detail the different steps involved in the implementation of this course as well as the topics covered in the program. The description of this pilot experience might be helpful for the implementation of other similar educational projects, notably in developing countries, aiming to go beyond basics and provide young researchers with high-level skills.
Integration of mitochondrial DNA fragments into nuclear chromosomes (giving rise to nuclear DNA sequences of mitochondrial origin, or NUMTs) is an ongoing process that shapes nuclear genomes. In yeast this process depends on double-strand-break repair. Since NUMTs lack amplification and specific integration mechanisms, they represent the prototype of exogenous insertions in the nucleus. From sequence analysis of the genome of Homo sapiens, followed by sampling humans from different ethnic backgrounds, and chimpanzees, we have identified 27 NUMTs that are specific to humans and must have colonized human chromosomes in the last 4-6 million years. Thus, we measured the fixation rate of NUMTs in the human genome. Six such NUMTs show insertion polymorphism and provide a useful set of DNA markers for human population genetics. We also found that during recent human evolution, Chromosomes 18 and Y have been more susceptible to colonization by NUMTs. Surprisingly, 23 out of 27 human-specific NUMTs are inserted in known or predicted genes, mainly in introns. Some individuals carry a NUMT insertion in a tumor-suppressor gene and in a putative angiogenesis inhibitor. Therefore in humans, but not in yeast, NUMT integrations preferentially target coding or regulatory sequences. This is indeed the case for novel insertions associated with human diseases and those driven by environmental insults. We thus propose a mutagenic phenomenon that may be responsible for a variety of genetic diseases in humans and suggest that genetic or environmental factors that increase the frequency of chromosome breaks provide the impetus for the continued colonization of the human genome by mitochondrial DNA.
The evolutionary characterization of species and lifestyles at global levels is nowadays a subject of considerable interest, particularly with the availability of many complete genomes. Are there specific properties associated with lifestyles and phylogenies? What are the underlying evolutionary trends? One of the simplest analyses to address such questions concerns the characterization of proteomes at the amino acid composition level.
In this work, amino acid compositions of a large set of 208 proteomes, with a significant number of representatives from the three phylogenetic domains and different lifestyles, are analyzed using an appropriate multidimensional method: correspondence analysis. The analysis reveals striking discrimination between eukaryotes, prokaryotic mesophiles and hyperthermophiles-thermophiles, following amino acid usage. In sharp contrast, no similar discrimination is observed for psychrophiles. The observed distributional properties are compared with various inferred chronologies for the recruitment of amino acids into the genetic code. Such comparisons reveal correlations between the observed segregations of species following amino acid usage, and the separation of amino acids following early or late recruitment.
A simple description of proteomes according to amino acid compositions reveals striking signatures, with sharp segregations or, on the contrary, non-discriminations following phylogenies and lifestyles. The distribution of species, following amino acid usage, exhibits a discrimination between high GC-high optimal growth temperature and low GC-moderate temperature characteristics. This discrimination appears to coincide closely with the separation of amino acids following their inferred early or late recruitment into the genetic code. Taken together, the various results provide a consistent picture for the evolution of proteomes, in terms of amino acid usage.
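As an illustration of the amino acid composition description used in the abstract above, the following is a minimal sketch of how one row of a proteome-by-amino-acid table might be computed. The function name `amino_acid_composition`, the handling of non-standard residue codes, and the toy sequences are assumptions made for this example and are not taken from the original study.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(proteins):
    """Return the relative frequency of each standard amino acid
    over all protein sequences of one proteome.

    Ambiguous or non-standard codes (for example X, B, Z, *) are
    ignored; this is a simplifying assumption of the sketch.
    """
    counts = Counter()
    for sequence in proteins:
        counts.update(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Hypothetical toy proteome of two short sequences.
    composition = amino_acid_composition(["MKKLAV", "MAAGX"])
    for aa, freq in composition.items():
        if freq > 0:
            print(aa, round(freq, 3))
```

Stacking one such composition per proteome gives a two-way table (species by amino acid) of the kind that a multidimensional method such as correspondence analysis takes as input.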
Mycobacterium ulcerans is found in aquatic ecosystems and causes Buruli ulcer in humans, a neglected but devastating necrotic disease of subcutaneous tissue that is rampant throughout West and Central Africa. Here, we report the complete 5.8-Mb genome sequence of M. ulcerans and show that it comprises two circular replicons, a chromosome of 5632 kb and a virulence plasmid of 174 kb. The plasmid is required for production of the polyketide toxin mycolactone, which provokes necrosis. Comparisons with the recently completed 6.6-Mb genome of Mycobacterium marinum revealed >98% nucleotide sequence identity and genome-wide synteny. However, as well as the plasmid, M. ulcerans has accumulated 213 copies of the insertion sequence IS2404, 91 copies of IS2606, 771 pseudogenes, two bacteriophages, and multiple DNA deletions and rearrangements. These data indicate that M. ulcerans has recently evolved via lateral gene transfer and reductive evolution from the generalist, more rapid-growing environmental species M. marinum to become a niche-adapted specialist. Predictions based on genome inspection for the production of modified mycobacterial virulence factors, such as the highly abundant phthiodiolone lipids, were confirmed by structural analyses. Similarly, 11 protein-coding sequences identified as M. ulcerans-specific by comparative genomics were verified as such by PCR screening a diverse collection of 33 strains of M. ulcerans and M. marinum. This work offers significant insight into the biology and evolution of mycobacterial pathogens and is an important component of international efforts to counter Buruli ulcer.
Incorporation of lab meeting activities in “Bioinformatics and Genomics” courses significantly improves the capacity building of the participants, their analytical and critical reading of scientific literature, as well as their communication skills. Such activities include scientific literature follow-up as well as preparing and presenting oral and written scientific reviews, and they prove to be useful for building a student's self‐confidence. The work shows how to proceed and recommends introducing such activities in similar courses.
Abstract
Transfer of fragments of mtDNA to the nuclear genome is a general phenomenon that gives rise to NUMTs (NUclear sequences of MiTochondrial origin). We present here the first comparative analysis of the NUMT content of entirely sequenced species belonging to a monophyletic group, the hemiascomycetous yeasts (Candida glabrata, Kluyveromyces lactis, Kluyveromyces thermotolerans, Debaryomyces hansenii and Yarrowia lipolytica, along with the updated NUMT content of Saccharomyces cerevisiae). This study revealed a huge diversity in NUMT number and organization across the six species. Debaryomyces hansenii harbors the highest number of NUMTs (145), half of which are distributed in numerous large mosaics of up to eight NUMTs arising from multiple noncontiguous mtDNA fragments inserted at the same chromosomal locus. Most NUMTs, in all species, are found within intergenic regions including seven NUMTs in pseudogenes. However, five NUMTs overlap a gene, suggesting a positive impact of NUMTs on protein evolution. Contrary to the other species, K. lactis and K. thermotolerans harbor only a few diverged NUMTs, suggesting that mitochondrial transfer to the nuclear genome has decreased or ceased in these phylogenetic branches. The dynamics of NUMT acquisition and loss are illustrated here by their species-specific distribution.