Populations distributed across a broad thermal cline are instrumental in addressing adaptation to increasing temperatures under global warming. Using a space‐for‐time substitution design, we tested for parallel adaptation to warm temperatures along two independent thermal clines in Zostera marina, the most widely distributed seagrass in the temperate Northern Hemisphere. A North–South pair of populations was sampled along the European and North American coasts and exposed to a simulated heatwave in a common‐garden mesocosm. Transcriptomic responses under control, heat stress and recovery were recorded in 99 RNAseq libraries with ~13 000 uniquely annotated, expressed genes. We corrected for phylogenetic differentiation among populations to discriminate neutral from adaptive differentiation. The two southern populations recovered faster from heat stress and showed parallel transcriptomic differentiation, as compared with northern populations. Among 2389 differentially expressed genes, 21 exceeded neutral expectations and were likely involved in parallel adaptation to warm temperatures. However, the strongest differentiation following phylogenetic correction was between the three Atlantic populations and the Mediterranean population with 128 of 4711 differentially expressed genes exceeding neutral expectations. Although adaptation to warm temperatures is expected to reduce sensitivity to heatwaves, the continued resistance of seagrass to further anthropogenic stresses may be impaired by heat‐induced downregulation of genes related to photosynthesis, pathogen defence and stress tolerance.
Recent investigations on metazoan transcription factors (TFs) indicate that single-gene duplication events and the gain and loss of protein domains are two crucial factors in shaping their protein-protein interaction networks. Plant genomes, on the other hand, have a history of polyploidy and whole-genome duplications (WGDs), and thus, their study helps to understand whether WGDs have also had a significant influence on protein network evolution. Here we investigate the evolution of the interaction network in the well-studied MADS domain MIKC-type proteins, a TF family which plays an important role in both the vegetative and the reproductive phases of plant life. We combine phylogenetic reconstruction, protein domain analysis, and interaction data from different species. We show that, unlike previously analyzed interaction networks, the MIKC-type protein network displays a characteristic topology, with overall high inter-subfamily connectivity, shared interactors between paralogs, and conservation of interaction patterns across species. The evaluation of the number of MIKC-type proteins at key time points throughout the evolution of land plants in the lineage leading to Arabidopsis suggested that most duplicates were retained after each round of WGD. We provide evidence that an initial network, formed by 9-11 homodimerizing proteins interacting with each other, existed in the common ancestor of all seed plants. This basic structure has been conserved after each round of WGD, adding layers of paralogs with similar interaction patterns. We thus present the first model where we can show that a network of eukaryotic TFs has evolved via rounds of WGD. Furthermore, we found that in subfamilies in which the K domain is most diverged, the interactions with other subfamilies have been largely lost.
We discuss the possibility that such a high proportion of genes were retained after each WGD because of their capacity to form higher order complexes involving proteins from different subfamilies. The simultaneous duplications allowed for the conservation of the quantitative balance between the constituents and facilitated sub- and neofunctionalization through differential expression of whole units.
The genomic era has revealed that the large repertoire of observed animal phenotypes is dependent on changes in the expression patterns of a finite number of genes, which are mediated by a plethora of transcription factors (TFs) with distinct specificities. The dimerization of TFs can also increase the complexity of a genetic regulatory network manifold, by combining a small number of monomers into dimers with distinct functions. Therefore, studying the evolution of these dimerizing TFs is vital for understanding how complexity increased during animal evolution. We focus on the second largest family of dimerizing TFs, the basic-region leucine zipper (bZIP), and infer when it expanded and how bZIP DNA-binding and dimerization functions evolved during the major phases of animal evolution. Specifically, we classify the metazoan bZIPs into 19 families and confirm the ancient nature of at least 13 of these families, predating the split of the Cnidaria. We observe fixation of a core dimerization network in the last common ancestor of protostomes-deuterostomes. This was followed by an expansion of the number of proteins in the network, but no major dimerization changes in interaction partners, during the emergence of vertebrates. In conclusion, the bZIPs are an excellent model with which to understand how DNA binding and protein interactions of TFs evolved during animal evolution.
The wealth of available genomic data presents an unrivaled opportunity to study the molecular basis of evolution. Studies on gene family expansions and site-dependent analyses have already helped establish important insights into how proteins facilitate adaptation. However, efforts to conduct full-scale cross-genomic comparisons between species are challenged by both growing amounts of data and the inherent difficulty in accurately inferring homology between deeply rooted species. Proteins, in comparison, evolve by means of domain rearrangements, a process more amenable to study given the strength of profile-based homology inference and the lower rates with which rearrangements occur. However, adapting to a constantly changing environment can require molecular modulations beyond reach of rearrangement alone. Here, we explore rates and functional implications of novel domain emergence in contrast to domain gain and loss in 20 arthropod species of the pancrustacean clade. Emerging domains are more likely disordered in structure and spread more rapidly within their genomes than established domains. Furthermore, although domain turnover occurs at lower rates than gene family turnover, we find strong evidence that the emergence of novel domains is foremost associated with environmental adaptation such as abiotic stress response. The results presented here illustrate the simplicity with which domain-based analyses can unravel key players of nature's adaptational machinery, complementing the classical site-based analyses of adaptation.
The main mechanisms shaping the modular evolution of proteins are gene duplication, fusion and fission, recombination and loss of fragments. While a large body of research has focused on duplications and fusions, we concentrated, in this study, on how domains are lost. We investigated motif databases and introduced a measure of protein similarity that is based on domain arrangements. Proteins are represented as strings of domains and comparison was based on the classic dynamic alignment scheme. We found that domain losses and duplications were more frequent at the ends of proteins. We showed that losses can be explained by the introduction of start and stop codons which render the terminal domains nonfunctional, such that further shortening, until the whole domain is lost, is not evolutionarily selected against. We demonstrated that domains which also occur as single‐domain proteins are less likely to be lost at the N terminus and in the middle, than at the C terminus. We conclude that fission/fusion events with single‐domain proteins occur mostly at the C terminus. We found that domain substitutions are rare, in particular in the middle of proteins. We also showed that many cases of substitutions or losses result from erroneous annotations, but we were also able to find courses of evolutionary events where domains vanish over time. This is explained by a case study on the bacterial formate dehydrogenases.
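The idea of treating proteins as strings of domains and comparing them by dynamic alignment can be illustrated with a minimal sketch. It is a plain global (Needleman-Wunsch style) alignment score over lists of domain labels, not the measure used in the study: the function name `align_domains`, the scoring values (match +1, mismatch -1, gap -1) and the domain labels "A", "B", "C" are all assumptions made for illustration.

```python
def align_domains(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score between two domain arrangements.

    `a` and `b` are lists of domain labels; each label is treated as one
    symbol, so a whole domain is gained, lost, or substituted at a time.
    The scoring values are illustrative assumptions.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # leading domains of `a` aligned to gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # leading domains of `b` aligned to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal,
                              score[i - 1][j] + gap,   # domain lost in `b`
                              score[i][j - 1] + gap)   # domain lost in `a`
    return score[n][m]


# Hypothetical arrangements: the second protein lacks the C-terminal domain.
full_length = ["A", "B", "C"]
truncated = ["A", "B"]
print(align_domains(full_length, truncated))
```

In this toy scoring, a terminal domain loss costs a single gap, so arrangements that differ only at their ends remain close to each other.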
Proteins are composed of domains, which are conserved evolutionary units that often also correspond to functional units and can frequently be detected with reasonable reliability using computational methods. Most proteins consist of two or more domains, giving rise to a variety of combinations of domains. Another level of complexity arises because proteins themselves can form complexes with small molecules, nucleic acids and other proteins. The networks of both domain combinations and protein interactions can be conceptualised as graphs, and these graphs can be analysed conveniently by computational methods. In this review we summarise facts and hypotheses about the evolution of domains in multi-domain proteins and protein complexes, and the tools and data resources available to study them.
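One way to picture a network of domain combinations as a graph is sketched below. It assumes, for illustration only, that two domains are linked whenever they occur in the same protein; the review itself does not prescribe this definition, and the function name `domain_cooccurrence_graph`, the protein identifiers and the domain labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def domain_cooccurrence_graph(proteins):
    """Build an undirected graph as a dict mapping domain -> set of neighbours.

    `proteins` maps a protein identifier to its list of domain labels.
    Two domains are neighbours if they co-occur in at least one protein
    (an illustrative assumption; adjacency-based definitions also exist).
    """
    graph = defaultdict(set)
    for domains in proteins.values():
        unique = sorted(set(domains))
        for domain in unique:
            graph[domain]  # touching the key registers single-domain nodes
        for first, second in combinations(unique, 2):
            graph[first].add(second)
            graph[second].add(first)
    return dict(graph)


# Hypothetical input: three proteins with made-up domain labels.
example = {"p1": ["A", "B"], "p2": ["B", "C"], "p3": ["D"]}
graph = domain_cooccurrence_graph(example)
for domain in sorted(graph):
    print(domain, sorted(graph[domain]))
```

Node degree in such a graph gives a first impression of how promiscuously a domain combines with others.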
The sequence-to-structure map for all uniquely folding sequences of short hydrophobic polar (HP) model proteins on a square lattice is analyzed to investigate aspects considered relevant to evolution. By ranking structures by their frequencies, few very frequent and many rare structures are found. The distribution can be empirically described by a generalized Zipf's law. All structures are relatively compact, yet the most compact ones are rare. Most sequences folding to the same structure belong to "neutral nets." These graphs in sequence space are connected by point mutations and centered around prototype sequences, which tolerate the largest number (up to 55%) of neutral mutations. Profiles have been derived from these homologous sequences. Frequent structures conserve hydrophobic cores only while rare ones are sensitive to surface mutations as well. Shape space covering, i.e., the ability to transform any structure into most others with few point mutations, is very unlikely. It is concluded that many characteristic features of the sequence-to-structure map of real proteins, such as the dominance of few folds, can be explained by the simple HP model. In analogy to protein families, nets are dense and well separated in sequence space. Potential implications in better understanding the evolution of proteins and applications to improving database searches are discussed.
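For readers unfamiliar with the HP model, the sketch below scores one conformation of an HP sequence on a square lattice. It assumes the conventional energy function, in which every pair of hydrophobic (H) residues that are lattice neighbours but not chain neighbours contributes -1; the abstract does not spell out the energy function, and the function name `hp_energy` and the move encoding (U, D, L, R) are choices made for this example.

```python
# Unit steps on the square lattice, keyed by a one-letter move code.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def hp_energy(sequence, moves):
    """Energy of an HP chain folded along `moves` on a square lattice.

    `sequence` is a string of 'H' and 'P'; `moves` has one step per bond.
    Energy is minus the number of non-bonded H-H lattice contacts
    (the conventional HP energy, assumed here).
    """
    if len(moves) != len(sequence) - 1:
        raise ValueError("need exactly one move per bond")
    x, y = 0, 0
    coords = [(x, y)]
    for move in moves:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
        coords.append((x, y))
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    position_to_index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Looking only right and up counts each neighbouring pair once.
        for dx, dy in ((1, 0), (0, 1)):
            j = position_to_index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                contacts += 1
    return -contacts


# A four-residue chain folded into a square brings its two ends together.
print(hp_energy("HPPH", "RUL"))
```

A sequence folds uniquely in this setting when exactly one self-avoiding conformation attains its minimum energy, which is the sense of "uniquely folding" that exhaustive enumerations of short chains rely on.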
The leucine zipper is a dimerization domain occurring mostly in regulatory and thus in many oncogenic proteins. The leucine repeat in the sequence has been traditionally used for identification, however with poor reliability. The coiled coil structure of a leucine zipper is required for dimerization and can be predicted with reasonable accuracy by existing algorithms. We exploit this fact for identification of leucine zippers from sequence alone. We present a program, 2ZIP, which combines a standard coiled coil prediction algorithm with an approximate search for the characteristic leucine repeat. No further information from homologues is required for prediction. This approach improves significantly over existing methods, especially in that the coiled coil prediction turns out to be highly informative and avoids large numbers of false positives. Many problems in predicting zippers or assessing prediction results stem from wrong sequence annotations in the database.
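An approximate search for a leucine repeat can be sketched as follows. This is not the 2ZIP implementation: it only scans for leucines spaced seven residues apart and tolerates a limited number of substitutions, and it omits the coiled coil prediction step entirely. The function name `find_leucine_repeats` and the defaults (four repeats, one allowed mismatch) are assumptions for illustration.

```python
def find_leucine_repeats(seq, repeats=4, max_mismatches=1):
    """Return (start, mismatches) for windows with leucine at every 7th residue.

    A window starting at `start` is inspected at positions
    start, start+7, ..., start+7*(repeats-1); it is reported if at most
    `max_mismatches` of those positions are not 'L'.
    """
    hits = []
    span = 7 * (repeats - 1) + 1  # residues covered by one window
    for start in range(len(seq) - span + 1):
        heptad_positions = [seq[start + 7 * k] for k in range(repeats)]
        mismatches = sum(residue != "L" for residue in heptad_positions)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Synthetic sequence with four leucines spaced exactly seven residues apart.
perfect = "LAAAAAA" * 3 + "L"
print(find_leucine_repeats(perfect, max_mismatches=0))
```

On its own, a scan like this matches many sequences that are not zippers, which is consistent with the abstract's point that the repeat alone identifies zippers with poor reliability.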
CADRE is a public resource for housing and analysing genomic data extracted from species of Aspergillus. It arose to enable maintenance of the complete annotated genomic sequence of Aspergillus fumigatus and to provide tools for searching, analysing and visualizing features of fungal genomes. By implementing CADRE using Ensembl, a framework is in place for storing and comparing several genomes: the resource will thus expand by including other Aspergillus genomes (such as Aspergillus nidulans) as they become available. CADRE is accessible at http://www.cadre.man.ac.uk.
As ecosystem engineers, seagrasses are angiosperms of paramount ecological importance in shallow shoreline habitats around the globe. Furthermore, the ancestors of independent seagrass lineages have secondarily returned into the sea in separate, independent evolutionary events. Thus, understanding the molecular adaptation of this clade contributes not only to the field of ecology, but also to the principles of parallel evolution. With the use of Dr. Zompo, the first interactive seagrass sequence database presented here, new insights into the molecular adaptation to marine environments can be inferred. The database is based on a total of 14 597 ESTs obtained from two seagrass species, Zostera marina and Posidonia oceanica, which have been processed, assembled and comprehensively annotated. Dr. Zompo provides experimentalists with a broad foundation to build experiments and consider challenges associated with the investigation of this class of non-domesticated monocotyledon systems. Our database, based on the Ruby on Rails framework, is rich in features including the retrieval of experimentally determined heat-responsive transcripts, mining for molecular markers (SSRs and SNPs), and weighted key word searches that allow access to annotation gathered on several levels including Pfam domains, GeneOntology and KEGG pathways. Well established plant genome sites such as The Arabidopsis Information Resource (TAIR) and the Rice Genome Annotation Project are interfaced by Dr. Zompo. With this project, we have initialized a valuable resource for plant biologists in general and the seagrass community in particular. The database is expected to grow together with more data to come in the near future, particularly with the recent initiation of the Zostera genome sequencing project. The Dr. Zompo database is available at http://drzompo.uni-muenster.de/