The origin of 'orphan' genes, species-specific sequences that lack detectable homologues, has remained mysterious since the dawn of the genomic era. There are two dominant explanations for orphan genes: complete sequence divergence from ancestral genes, such that homologues are not readily detectable; and de novo emergence from ancestral non-genic sequences, such that homologues genuinely do not exist. The relative contribution of the two processes remains unknown. Here, we harness the special circumstance of conserved synteny to estimate the contribution of complete divergence to the pool of orphan genes. By separately comparing yeast, fly and human genes to related taxa using conservative criteria, we find that complete divergence accounts, on average, for at most a third of eukaryotic orphan and taxonomically restricted genes. We observe that complete divergence occurs at a stable rate within a phylum but at different rates between phyla, and is frequently associated with gene shortening akin to pseudogenization.
Recent evidence demonstrates that novel protein-coding genes can arise de novo from non-genic loci. This evolutionary innovation is thought to be facilitated by the pervasive translation of non-genic transcripts, which exposes a reservoir of variable polypeptides to natural selection. Here, we systematically characterize how these de novo emerging coding sequences impact fitness in budding yeast. Disruption of emerging sequences is generally inconsequential for fitness in the laboratory and in natural populations. Overexpression of emerging sequences, however, is enriched in adaptive fitness effects compared to overexpression of established genes. We find that adaptive emerging sequences tend to encode putative transmembrane domains, and that thymine-rich intergenic regions harbor a widespread potential to produce transmembrane domains. These findings, together with in-depth examination of the de novo emerging YBR196C-A locus, suggest a novel evolutionary model whereby adaptive transmembrane polypeptides emerge de novo from thymine-rich non-genic regions and subsequently accumulate changes molded by natural selection.
De novo genes, that is, protein-coding genes originating from previously noncoding sequence, have gone from being considered impossibly unlikely to being recognized as an important source of genetic novelty in eukaryotic genomes. It is clear that de novo gene evolution is a rare but consistent feature of eukaryotic genomes, being detected in every genome studied. However, different studies often use different computational methods, and the numbers and identities of the detected genes vary greatly. Here we present a coherent protocol for the computational identification of de novo genes by comparative genomics. The method described uses homology searches, identification of syntenic regions, and ancestral sequence reconstruction to produce high-confidence candidates with robust evidence of de novo emergence. It is designed to be easily applicable given basic knowledge of bioinformatic tools, and scalable so that it can be applied to both large and small datasets.
There is growing interest in using high-throughput sequencing (HTS) technologies and bioinformatic analyses as a routine diagnostic tool in the field of plant viruses. The reliability of HTS workflows, from sample preparation to data analysis and results interpretation for plant virus detection and identification, must be evaluated (verified and validated) before this tool can be approved for diagnostics. Many different extraction methods, library preparation protocols, and sequence and bioinformatic pipelines are available for virus sequence detection. To assess the performance of plant virology diagnostic laboratories in using the HTS of ribosomal RNA depleted total RNA (ribodepleted totRNA) as a diagnostic tool, we carried out an interlaboratory comparison study in which eight participants were required to use the same samples, (RNA) extraction kit, ribosomal RNA depletion kit, and commercial sequencing provider, but their own bioinformatics pipeline for analysis. The accuracy of virus detection ranged from 65% to 100%. The false-positive detection rate was very low and was related to the misinterpretation of results as well as to possible cross-contamination in the lab or at the sequencing provider. The bioinformatic pipeline used by each laboratory influenced the correct detection of the viruses of this study. The main difficulty was the detection of a novel virus, as its sequence was not available in a publicly accessible database at the time. The raw data were reanalysed using Virtool to assess its ability for virus detection. All virus sequences were detected using Virtool in the different pools. This study revealed that the ribodepletion target enrichment for sample preparation is a reliable approach for the detection of plant viruses with different genomes. A significant level of virology expertise is needed to correctly interpret the results. It is also important to improve and complete the reference data.
Intergenic genomic regions have essential regulatory and structural roles that impose constraints on their sequences. But regions that do not currently encode proteins also carry the potential to do so in the future. De novo gene emergence, the evolution of novel genes out of previously noncoding sequences, has now been established as a potent force for genomic novelty. Recently, it was shown that intergenic regions in the genome of Saccharomyces cerevisiae harbor pervasive cryptic potential to, if theoretically translated, form transmembrane domains (TM domains) more frequently than expected by chance given their nucleotide composition, a property that we refer to as TM-forming enrichment. The source and biological relevance of this property are unknown. Here, we expand the investigation into the TM-forming potential of intergenic regions to the entire Saccharomycotina budding yeast subphylum, in an effort to explain this property and understand its importance. We find pervasive but variable enrichment in TM-forming potential across the subphylum regardless of the composition and average size of intergenic regions. This cryptic property is evenly spread across the genome, cannot be explained by the hydrophobic content of the sequence, and does not appear to localize to regions containing regulatory motifs. This TM-forming enrichment specifically, and not the actual TM-forming potential, is associated, across genomes, with more TM domains in evolutionarily young genes. Our findings shed light on this newly discovered feature of yeast genomes and constitute a first step toward understanding its evolutionary importance.
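The idea of TM-forming enrichment, an observed TM-forming frequency compared with the frequency expected from nucleotide composition alone, can be illustrated with a minimal Python sketch. This is not the procedure used in the study. It rests on several assumptions made purely for illustration: a crude Kyte-Doolittle hydropathy window (19 residues, mean of at least 1.6) stands in for a real TM-domain predictor, only the first reading frame is translated, and the null expectation comes from simple mononucleotide shuffling. All function names are hypothetical.

```python
import random

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def translate(dna):
    """Translate frame 0 of a DNA string; '*' marks stops, 'X' unknown codons."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def has_tm_like_window(protein, window=19, threshold=1.6):
    """Crude TM proxy (assumption): any stop-free window of `window`
    residues whose mean hydropathy is at least `threshold`."""
    for segment in protein.split("*"):
        scores = [KD.get(aa, 0.0) for aa in segment]
        for i in range(len(scores) - window + 1):
            if sum(scores[i:i + window]) / window >= threshold:
                return True
    return False


def tm_fraction(sequences):
    """Fraction of sequences whose hypothetical translation has a TM-like window."""
    hits = sum(has_tm_like_window(translate(s)) for s in sequences)
    return hits / len(sequences)


def shuffle_sequence(dna, rng):
    """Shuffle nucleotides, preserving the mononucleotide composition."""
    letters = list(dna)
    rng.shuffle(letters)
    return "".join(letters)


def tm_enrichment(sequences, n_shuffles=100, seed=0):
    """Return (observed, expected, ratio), where expected is the mean
    TM-like fraction over composition-preserving shuffles."""
    rng = random.Random(seed)
    observed = tm_fraction(sequences)
    expected = sum(
        tm_fraction([shuffle_sequence(s, rng) for s in sequences])
        for _ in range(n_shuffles)
    ) / n_shuffles
    ratio = observed / expected if expected > 0 else float("inf")
    return observed, expected, ratio
```

Under these assumptions, a ratio above 1 would indicate that the real sequences yield TM-like stretches more often than shuffled sequences of identical composition. A real analysis would substitute a dedicated TM predictor and consider all reading frames.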
Small open reading frames (sORFs) can encode functional “microproteins” that perform crucial biological tasks. However, their size makes them less amenable to genomic analysis, and their origins and conservation are poorly understood. Given their short length, it is plausible that some of these functional microproteins have recently originated entirely de novo from noncoding sequences. Here we sought to identify such cases in the human lineage by reconstructing the evolutionary origins of human microproteins previously found to have measurable, statistically significant fitness effects. By tracing the formation of each ORF and its transcriptional activation, we show that novel microproteins with significant phenotypic effects have emerged de novo throughout animal evolution, including two after the human-chimpanzee split. Notably, traditional methods for assessing coding potential would miss most of these cases. This evidence demonstrates that the functional potential intrinsic to sORFs can be relatively rapidly and frequently realized through de novo gene emergence.
• We estimate the evolutionary origins of functional human microproteins
• Some are novel, having originated entirely de novo from noncoding sequences
• These mostly lack sequence signals of conservation and selection
• Many more novel ones could exist and escape detection
Human microproteins encoded by small ORFs have been found to be functional. By comparing the corresponding sequences across vertebrate genomes, Vakirlis et al. show that a number of these originated “from scratch” from noncoding sequences, including two very recent cases unique to humans. These cases demonstrate the rapid evolution of genetic novelty.
New genes, with novel protein functions, can evolve "from scratch" out of intergenic sequences. These de novo genes can integrate into the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast-evolving sequences and erroneously annotated protein-coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, with only a few genes being ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions, where the probability of finding a fortuitous and transcribed ORF is the highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.
• Downregulation of CaPCaP also results in low PVY replication in pepper protoplasts.
• Overexpression of CaPCaP positively affects PVY accumulation at the plant level.
• A novel protocol for pepper protoplast isolation and transfection was developed.
The Plasma membrane Cation binding Protein 1 (PCaP1) has been shown to be important for the intracellular movement of two members of the Potyvirus genus in Arabidopsis and tobacco plants. In this study, the orthologous PCaP1 gene of pepper (Capsicum annuum) was examined for its role in the accumulation of potato virus Y (PVY), the type member of the genus Potyvirus. Downregulation of C. annuum PCaP (CaPCaP) through tobacco rattle virus-induced gene silencing resulted in lower accumulation of PVY in pepper plants. Using an improved pepper protoplast isolation protocol, we showed that knockdown of CaPCaP negatively affected PVY accumulation at the within-cell level in pepper, in contrast with the turnip mosaic virus-Arabidopsis pathosystem. Conversely, following overexpression of CaPCaP, the accumulation of PVY at the systemic level was increased. The results provide further knowledge on the role of PCaP in the potyvirus infection process and reveal differences in its action among different pathosystems.