Orphan genes, lacking detectable homologs in outgroup species, typically represent 10–30% of eukaryotic genomes. Efforts to find the source of these young genes indicate that de novo emergence from non-coding DNA may in part explain their prevalence. Here, we investigate the roots of orphan gene emergence in the Drosophila genus. Across the annotated proteomes of twelve species, we find 6297 orphan genes within 4953 taxon-specific clusters of orthologs. By inferring the ancestral DNA as non-coding for between 550 and 2467 (8.7–39.2%) of these genes, we describe for the first time how de novo emergence contributes to the abundance of clade-specific Drosophila genes. In support of them having functional roles, we show that de novo genes have robust expression and translational support. However, the distinct nucleotide sequences of de novo genes, which have characteristics intermediate between intergenic regions and conserved genes, reflect their recent birth from non-coding DNA. We find that de novo genes encode more disordered proteins than both older genes and intergenic regions. Together, our results suggest that gene emergence from non-coding DNA provides an abundant source of material for the evolution of new proteins. Following gene birth, gradual evolution over large evolutionary timescales moulds sequence properties towards those of conserved genes, resulting in a continuum of properties whose starting points depend on the nucleotide sequences of an initial pool of novel genes.
Transposable elements (TEs) are mobile genetic sequences, which can cause the accumulation of genomic damage in the lifetime of an organism. The regulation of TEs, for instance via the piRNA‐pathway, is an important mechanism to protect the integrity of genomes, especially in the germ‐line where mutations can be transmitted to offspring. In eusocial insects, soma and germ‐line are divided among worker and reproductive castes, so one may expect caste‐specific differences in TE regulation to exist. To test this, we compared whole‐genome levels of repeat element transcription in the fat body of female workers, kings and five different queen stages of the higher termite, Macrotermes natalensis. In this species, queens can live over 20 years, maintaining near maximum reproductive output, while sterile workers only live weeks. We found a strong, positive correlation between TE expression and the expression of neighbouring genes in all castes. However, we found substantially higher TE activity in workers than in reproductives. Furthermore, TE expression did not increase with age in queens, despite a sevenfold increase in overall gene expression, due to a significant upregulation of the piRNA‐pathway in 20‐year‐old queens. Our results suggest a caste‐ and age‐specific regulation of the piRNA‐pathway has evolved in higher termites that is analogous to germ‐line‐specific activity in solitary organisms. In the fat body of these termite queens, an important metabolic tissue for maintaining their extreme longevity and reproductive output, an efficient regulation of TEs likely protects genome integrity, thus further promoting reproductive fitness even at high age.
Over the past decade, evidence has accumulated that new protein‐coding genes can emerge de novo from previously non‐coding DNA. Most studies have focused on large scale computational predictions of de novo protein‐coding genes across a wide range of organisms. In contrast, experimental data concerning the folding and function of de novo proteins are scarce. This might be due to difficulties in handling de novo proteins in vitro, as most are short and predicted to be disordered. Here, we propose a guideline for the effective expression of eukaryotic de novo proteins in Escherichia coli. For heterologous expression, we used 11 sequences from Drosophila melanogaster and 10 from Homo sapiens that were predicted to be de novo proteins in earlier studies. The candidate de novo proteins have varying secondary structure and disorder content. Using multiple combinations of purification tags, E. coli expression strains, and chaperone systems, we were able to increase the number of solubly expressed putative de novo proteins from 30% to 62%. Our findings indicate that the best combination for expressing putative de novo proteins in E. coli is a GST‐tag with T7 Express cells and co‐expressed chaperones. We found that, overall, proteins with higher predicted disorder were easier to express.
Statement
Today, we know that proteins do not only evolve by duplication and divergence of existing proteins but also arise from previously non‐coding DNA. These proteins are called de novo proteins. Their properties are still poorly understood and their experimental analysis faces major obstacles. Here, we aim to present a starting point for soluble expression of de novo proteins with the help of chaperones and thereby enable further characterization.
Comparative genomic studies have repeatedly shown that new protein-coding genes can emerge de novo from noncoding DNA. Still unknown is how and when the structures of encoded de novo proteins emerge and evolve. Combining biochemical, genetic and evolutionary analyses, we elucidate the function and structure of goddard, a gene which appears to have evolved de novo at least 50 million years ago within the Drosophila genus. Previous studies found that goddard is required for male fertility. Here, we show that Goddard protein localizes to elongating sperm axonemes and that in its absence, elongated spermatids fail to undergo individualization. Combining modelling, NMR and circular dichroism (CD) data, we show that Goddard protein contains a large central α-helix, but is otherwise partially disordered. We find similar results for Goddard's orthologs from divergent fly species and their reconstructed ancestral sequences. Accordingly, Goddard's structure appears to have been maintained with only minor changes over millions of years.
The ecological success of social Hymenoptera (ants, bees, wasps) depends on the division of labour between the queen and workers. Each caste exhibits highly specialized morphology, behaviour, and life‐history traits, such as lifespan and fecundity. Despite strong defences against alien intruders, insect societies are vulnerable to social parasites, such as workerless inquilines or slave‐making ants. Here, we investigate whether gene expression varies in parallel ways between lifestyles (slave‐making versus host ants) across five independent origins of ant slavery in the “Formicoxenus‐group” of the ant tribe Crematogastrini. As caste differences are often less pronounced in slave‐making ants than in nonparasitic ants, we also compare caste‐specific gene expression patterns between lifestyles. We demonstrate a substantial overlap in expression differences between queens and workers across taxa, irrespective of lifestyle. Caste affects the transcriptomes much more profoundly than lifestyle, as indicated by 37 times more genes being linked to caste than to lifestyle and by multiple caste‐associated modules of coexpressed genes with strong connectivity. However, several genes and one gene module are linked to slave‐making across the independent origins of this parasitic lifestyle, pointing to some evolutionary convergence. Finally, we do not find evidence for an interaction between caste and lifestyle, indicating that caste differences in gene expression remain consistent even when species switch to a parasitic lifestyle. Our findings strongly support the existence of a core set of genes whose expression is linked to the queen and worker caste in this ant taxon, as proposed by the “genetic toolkit” hypothesis.
Over long time scales, protein evolution is characterized by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand domain emergence mechanisms we investigated 32 insect genomes covering a speciation gradient ranging from ~2 to ~390 mya. We use established domain models and foldable domains delineated by hydrophobic cluster analysis (HCA), which does not require homologous sequences, to also identify domains which have likely arisen de novo, that is, from previously noncoding DNA. Our results indicate that most novel domains emerge terminally as they originate from ORF extensions while fewer arise in middle arrangements, resulting from exonization of intronic or intergenic regions. Many novel domains rapidly migrate between terminal or middle positions and single‐ and multidomain arrangements. Young domains, such as most HCA‐defined domains, are under strong selection pressure as they show signals of purifying selection. De novo domains, linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and disorder‐to‐order transition upon binding than ancient domains. However, the corresponding DNA sequences of the novel domains of de novo origins could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterized by cross‐species comparisons alone.
We identified and characterized novel insect‐specific protein domains using the Pfam database and hydrophobic cluster analysis (HCA). The 50 Pfam and 177 novel HCA domains were grouped by age, emergence mechanisms and positions in their arrangements. Most domains were created terminally (zig‐zag) becoming less disordered over time (blue, yellow, red boxes), by intronic exonization or as single genes.
Genome sequencing enables answering fundamental questions about the genetic basis of adaptation, population structure and epigenetic mechanisms. Yet, we usually need a suitable reference genome for mapping population‐level resequencing data. In some model systems, multiple reference genomes are available, giving the challenging task of determining which reference genome best suits the data. Here, we compared the use of two different reference genomes for the three‐spined stickleback (Gasterosteus aculeatus), one novel genome derived from a European gynogenetic individual and the published reference genome of a North American individual. Specifically, we investigated the impact of using a local reference versus one generated from a distinct lineage on several common population genomics analyses. Through mapping genome resequencing data of 60 sticklebacks from across Europe and North America, we demonstrate that genetic distance among samples and the reference genomes impacts downstream analyses. Using a local reference genome increased mapping efficiency and genotyping accuracy, effectively retaining more and better data. Despite comparable distributions of the metrics generated across the genome using SNP data (i.e. π, Tajima's D and FST), window‐based statistics using different references resulted in different outlier genes and enriched gene functions. A marker‐based analysis of DNA methylation distributions had a comparably high overlap in outlier genes and functions, yet with distinct differences depending on the reference genome. Overall, our results highlight how using a local reference genome decreases reference bias to increase confidence in downstream analyses of the data. Such results have significant implications in all reference‐genome‐based population genomic analyses.
Global climate change can influence organismic interactions like those between hosts and parasites. Rising temperatures may exacerbate the exploitation of hosts by parasites, especially in ectothermic systems. The metabolic activity of ectotherms is strongly linked to temperature and generally increases when temperatures rise. We hypothesized that temperature change in combination with parasite infection interferes with the host's immunometabolism. We used a parasite, the avian cestode Schistocephalus solidus, which taps most of its resources from the metabolism of an ectothermic intermediate host, the three‐spined stickleback. We experimentally exposed sticklebacks to this parasite, and studied liver transcriptomes 50 days after infection at 13°C and 24°C, to assess their immunometabolic responses. Furthermore, we monitored fitness parameters of the parasite and examined immunity and body condition of the sticklebacks at 13°C, 18°C and 24°C after 36, 50 and 64 days of infection. At low temperatures (13°C), S. solidus growth was constrained, presumably also by the stickleback's more active immune system, thus delaying its infectivity for the final host to 64 days. Warmer temperature (18°C and 24°C) enhanced S. solidus growth, and it became infective to the final host already after 36 days. Overall, S. solidus produced many more viable offspring after development at elevated temperatures. In contrast, stickleback hosts had lower body conditions, and their immune system was less active at warm temperature. The stickleback's liver transcriptome revealed that mainly metabolic processes were differentially regulated between temperatures, whereas immune genes were not strongly affected. Temperature effects on gene expression were strongly enhanced in infected sticklebacks, and even in exposed‐but‐not‐infected hosts. These data suggest that parasite exposure in concert with rising temperature, as is to be expected with global climate change, shifted the host's immunometabolism, thus providing nutrients for the enormous growth of the parasite and, at the same time, suppressing immune defence.
Climate change effects on parasites are a major concern. Experimental exposure of a host–parasite pair to warm and cold conditions revealed prominent benefits for the growth of the parasite in warm conditions, while host growth was impaired. The parasite achieves this by manipulating gene expression profiles in the host's liver, thus mobilizing nutrient supply from the host's energy storage. As a result, the parasite produced many more viable offspring under warm conditions, suggesting that global warming is beneficial to parasites, while it is detrimental to cold‐blooded hosts.
Characterizing the adaptive landscapes that encompass the emergence of novel enzyme functions can provide molecular insights into both enzymatic and evolutionary mechanisms. Here, we combine ancestral protein reconstruction with biochemical, structural and mutational analyses to characterize the functional evolution of methyl-parathion hydrolase (MPH), an organophosphate-degrading enzyme. We identify five mutations that are necessary and sufficient for the evolution of MPH from an ancestral dihydrocoumarin hydrolase. In-depth analyses of the adaptive landscapes encompassing this evolutionary transition revealed that the mutations form a complex interaction network, defined in part by higher-order epistasis, that constrained the adaptive pathways available. By also characterizing the adaptive landscapes in terms of their functional activities towards three additional organophosphate substrates, we reveal that subtle differences in the polarity of the substrate substituents drastically alter the network of epistatic interactions. Our work suggests that the mutations function collectively to enable substrate recognition via subtle structural repositioning.
Social insects show an extreme degree of phenotypic plasticity. In highly eusocial species, this manifests in the generation of distinct castes with extreme differences in both morphology and life span. The molecular basis of these differences is highly entangled and not fully understood, but several recent studies demonstrated that insulin/insulin-like growth factor signaling (IIS) is one of the key pathways. Here, we investigate the molecular evolution of insect insulin receptors (InRs), which are membrane-bound dimers that enable IIS by relaying extracellular signals to intracellular signaling cascades. Classic models of invertebrate IIS include only one InR gene, but some recent studies on less commonly studied insects have found two InRs, which act in an antagonistic manner to facilitate polyphenism in at least one documented case. We search 22 arthropod genomes and identify several InR copies and their evolutionary origin that were lacking from previous annotations. Phylogenetic analysis shows that the two insect InR genes date back at least 400 million years to a common ancestor of winged insects. Most notably, we also identified the evolutionary origin of a third InR copy that is unique to Blattodea and arose just before the eusocial termites evolved within this clade. One of the InR paralogs consistently shows caste-biased expression in all three termites, which strongly suggests a role in caste differentiation. These results have important ramifications for past and future InR inhibition/InR knockdown experiments in insects and they provide a set of key genes regulating life span and morphology in termite castes.