Evolutionary conservation is an invaluable tool for inferring functional significance in the genome, including regions that are crucial across many species and those that have undergone convergent evolution. Computational methods to test for sequence conservation are dominated by algorithms that examine the ability of one or more nucleotides to align across large evolutionary distances. While these nucleotide alignment-based approaches have proven powerful for protein-coding genes and some non-coding elements, they fail to capture conservation of many enhancers, distal regulatory elements that control spatial and temporal patterns of gene expression. The function of enhancers is governed by a complex, often tissue- and cell type-specific code that links combinations of transcription factor binding sites and other regulation-related sequence patterns to regulatory activity. Thus, the function of orthologous enhancer regions can be conserved across large evolutionary distances, even when nucleotide turnover is high.
We present a new machine learning-based approach for evaluating enhancer conservation that leverages the combinatorial sequence code of enhancer activity rather than relying on the alignment of individual nucleotides. We first train a convolutional neural network model that can predict tissue-specific open chromatin, a proxy for enhancer activity, across mammals. Next, we apply that model to distinguish instances where the genome sequence would predict conserved function versus a loss of regulatory activity in that tissue. We present criteria for systematically evaluating model performance for this task and use them to demonstrate that our models accurately predict tissue-specific conservation and divergence in open chromatin between primate and rodent species, vastly outperforming leading nucleotide alignment-based approaches. We then apply our models to predict open chromatin at orthologs of brain and liver open chromatin regions across hundreds of mammals and find that brain enhancers associated with neuron activity have a stronger tendency than the general population to have predicted lineage-specific open chromatin.
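A minimal sketch of two generic ingredients of this kind of analysis, assuming the common setup in which a sequence CNN takes one-hot encoded DNA as input, and assuming the conservation task is scored by how well predictions rank orthologs with conserved open chromatin above orthologs that lost it. AUROC is used here as one plausible criterion; it is not necessarily the set of criteria the authors propose, and the helper names `one_hot` and `auroc` are illustrative.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as a (length, 4) float array in A, C, G, T order; N and other symbols become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((len(seq), len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            encoded[pos, index[base]] = 1.0
    return encoded


def auroc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count as 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predictions in a held-out species: positives are orthologs whose open chromatin
# is conserved, negatives are orthologs that lost it.
print(one_hot("ACGN").shape)           # (4, 4)
print(auroc([0.9, 0.8], [0.1, 0.85]))  # 0.75
```

The pairwise loop is quadratic and only meant to make the definition explicit; a rank-based computation gives the same value for large sets.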
The framework presented here provides a mechanism to annotate tissue-specific regulatory function across hundreds of genomes and to study enhancer evolution using predicted regulatory differences rather than nucleotide-level conservation measurements.
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
Recent large genome-wide association studies have identified multiple confident risk loci linked to addiction-associated behavioral traits. Most genetic variants linked to addiction-associated traits lie in noncoding regions of the genome, likely disrupting cis-regulatory element (CRE) function. CREs tend to be highly cell type-specific and may contribute to the functional development of the neural circuits underlying addiction. Yet, a systematic approach for predicting the impact of risk variants on the CREs of specific cell populations is lacking. To dissect the cell types and brain regions underlying addiction-associated traits, we applied stratified linkage disequilibrium score regression to compare genome-wide association studies to genomic regions collected from human and mouse assays for open chromatin, which is associated with CRE activity. We found enrichment of addiction-associated variants in putative CREs marked by open chromatin in neuronal (NeuN+) nuclei collected from multiple prefrontal cortical areas and striatal regions known to play major roles in reward and addiction. To further dissect the cell type-specific basis of addiction-associated traits, we also identified enrichments in human orthologs of open chromatin regions of female and male mouse neuronal subtypes: cortical excitatory, D1, D2, and PV. Last, we developed machine learning models to predict mouse cell type-specific open chromatin, enabling us to further categorize human NeuN+ open chromatin regions into cortical excitatory or striatal D1 and D2 neurons and predict the functional impact of addiction-associated genetic variants. Our results suggest that different neuronal subtypes within the reward system play distinct roles in the variety of traits that contribute to addiction.
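Stratified LD score regression (S-LDSC) models the expected association statistic of SNP j as E[chi2_j] = N * sum_c tau_c * l(j, c) + N*a + 1, where l(j, c) is the LD score of SNP j with annotation c, and reports the enrichment of an annotation as its share of heritability divided by its share of SNPs. The sketch below is a simplified, noise-free illustration under stated assumptions: it fixes the intercept at 1 (no confounding), fits unweighted least squares on a toy banded LD matrix, and omits the regression weights, baseline annotations, and block-jackknife standard errors of the real method. The function names and all numbers are hypothetical.

```python
import numpy as np


def ld_scores(r2: np.ndarray, annot: np.ndarray) -> np.ndarray:
    """l(j, c) = sum_k a_c(k) * r2[j, k] for an M x M r^2 matrix and an M x C annotation matrix."""
    return r2 @ annot


def fit_tau(chi2: np.ndarray, ld: np.ndarray, n: int) -> np.ndarray:
    """Unweighted least-squares fit of chi2_j = N * sum_c tau_c * l(j, c) + 1 (intercept fixed at 1)."""
    tau, *_ = np.linalg.lstsq(n * ld, chi2 - 1.0, rcond=None)
    return tau


def enrichment(tau: np.ndarray, annot: np.ndarray, c: int) -> float:
    """(Share of heritability in category c) / (share of SNPs in category c)."""
    per_snp_h2 = annot @ tau
    in_c = annot[:, c] > 0
    return float((per_snp_h2[in_c].sum() / per_snp_h2.sum()) / in_c.mean())


# Toy data: 200 SNPs, r^2 decaying with distance, and a binary "open chromatin"
# annotation on the first 20 SNPs (column 0 is the all-SNP base annotation).
M, N = 200, 100_000
dist = np.abs(np.arange(M)[:, None] - np.arange(M)[None, :])
r2 = np.where(dist <= 3, 0.5 ** dist, 0.0)
annot = np.column_stack([np.ones(M), (np.arange(M) < 20).astype(float)])
tau_true = np.array([1e-6, 9e-6])
ld = ld_scores(r2, annot)
chi2 = 1.0 + N * ld @ tau_true  # expected statistics, no sampling noise
tau_hat = fit_tau(chi2, ld, N)
print(round(enrichment(tau_hat, annot, 1), 2))  # 5.26
```

Because the toy statistics are noise-free, the fit recovers the simulated tau values; with real summary statistics the estimates carry sampling error, which is why the published method reports jackknife standard errors.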
SIGNIFICANCE STATEMENT
We combine statistical genetic and machine learning techniques to find that the predisposition to nicotine, alcohol, and cannabis use behaviors can be partially explained by genetic variants in conserved regulatory elements within specific brain regions and neuronal subtypes of the reward system. Our computational framework can flexibly integrate open chromatin data across species to screen for putative causal variants in a cell type- and tissue-specific manner for numerous complex traits.
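One common way to screen a variant with a sequence model is to score the reference and alternate alleles in the same sequence window and take the difference in predicted activity. The sketch below assumes that approach and, to stay self-contained, substitutes a hypothetical position weight matrix scorer for a trained CNN; the motif, `motif_score`, and `variant_effect` are illustrative only and assume uppercase A/C/G/T input.

```python
import math

# Hypothetical 4-bp motif (consensus AGCT) as a position probability matrix; columns are A, C, G, T.
PPM = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
BACKGROUND = 0.25
IDX = {"A": 0, "C": 1, "G": 2, "T": 3}


def motif_score(seq: str) -> float:
    """Best log2-odds match of the motif anywhere in seq (a stand-in for a model prediction)."""
    width = len(PPM)
    best = -math.inf
    for start in range(len(seq) - width + 1):
        score = sum(
            math.log2(PPM[i][IDX[seq[start + i]]] / BACKGROUND) for i in range(width)
        )
        best = max(best, score)
    return best


def variant_effect(ref_seq: str, pos: int, alt: str) -> float:
    """Predicted change (alt minus ref) for a single-nucleotide substitution at 0-based pos."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return motif_score(alt_seq) - motif_score(ref_seq)


# G->T at position 3 breaks the AGCT match, so the predicted effect is negative.
print(round(variant_effect("TTAGCTTT", 3, "T"), 2))  # -2.81
```

With a real model, `motif_score` would be replaced by the model's predicted open chromatin probability for the cell type of interest, and the same reference/alternate comparison applies.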
• Presence of cryptic species in a cone snail species complex is evaluated.
• Nuclear and mitochondrial gene sequences and morphological characters are used to delimit species.
• Results support recognition of Conus peasei, a species that appears to be restricted to Hawaii.
• Conflicts among relationships inferred from different datasets imply a hybrid origin for Conus peasei.
Knowledge concerning the taxonomic diversity of marine organisms is crucial for understanding processes associated with species diversification in geographic areas that are devoid of obvious barriers to dispersal. The marine gastropod family Conidae contains many species complexes due to lack of clear morphological distinctiveness and existence of morphological intergradations among described species. Conus flavidus Lamarck, 1810 and Conus frigidus Reeve, 1848 are currently recognized as distinct taxa, but are often difficult to distinguish by morphological characters and include several synonyms, including Conus peasei Brazier, 1877. C. peasei was originally described by Pease in 1861 (as Conus neglectus) based on slight morphological differences of a population of C. flavidus from Hawaii that distinguished it from C. flavidus from elsewhere. To evaluate the systematics of this group and specifically test the hypothesis of synonymy of C. peasei with C. flavidus, we examined molecular and morphometric data from specimens of C. flavidus, C. frigidus and C. peasei (i.e., C. flavidus from Hawaii). Multiple clades that contain individuals from particular geographic regions are apparent in gene trees constructed from sequences of a mitochondrial gene region. In particular, sequences of C. peasei cluster together separately from sequences of C. flavidus and C. frigidus. Although individuals of C. peasei, C. flavidus and C. frigidus each contain a unique set of alleles for a nuclear locus, a conotoxin gene, alleles of C. peasei are more similar to those of C. flavidus. In addition, sequences of a region of a second nuclear gene are identical among C. peasei and C. flavidus though they are distinct from sequences of C. frigidus. Morphometric data revealed that shells of C. peasei are distinct in some aspects, but are more similar to those of C. flavidus than to those of C. frigidus. Taken together, these results suggest that C. peasei represents a distinct species. 
Moreover, based on the contradictory relationships inferred from the mitochondrial and nuclear sequences (as well as morphometric data), C. peasei may have originated through past hybridization among the ancestral lineages that gave rise to C. flavidus and C. frigidus.
Neuron subtype dysfunction is a key contributor to neurologic disease circuits, but identifying associated gene regulatory pathways is complicated by the molecular complexity of the brain. For example, parvalbumin-expressing (PV+) neurons in the external globus pallidus (GPe) are critically involved in the motor deficits of dopamine-depleted mouse models of Parkinson's disease, where cell type-specific optogenetic stimulation of PV+ neurons over other neuron populations rescues locomotion. Despite the distinct roles these cell types play in the neural circuit, the molecular correlates remain unknown because of the difficulty of isolating rare neuron subtypes. To address this issue, we developed a new viral affinity purification strategy, Cre-Specific Nuclear Anchored Independent Labeling, to isolate Cre recombinase-expressing (Cre+) nuclei from the adult mouse brain. Applying this technology, we performed targeted assessments of the cell type-specific transcriptomic and epigenetic effects of dopamine depletion on PV+ and PV− cells within three brain regions of male and female mice: GPe, striatum, and cortex. We found GPe PV+ neuron-specific gene expression changes that suggested increased hypoxia-inducible factor 2α signaling. Consistent with the transcriptomic data, regions of open chromatin affected by dopamine depletion within GPe PV+ neurons were enriched for hypoxia-inducible factor family binding motifs. The gene expression and epigenomic experiments performed on PV+ neurons isolated by Cre-Specific Nuclear Anchored Independent Labeling identified a transcriptional regulatory network mediated by the neuroprotective factor Hif2a as underlying neural circuit differences in response to dopamine depletion.
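Motif enrichment in a set of differentially accessible regions is usually judged against a background set of regions. A minimal sketch of one common test, a one-sided hypergeometric test, assuming each region has already been called as containing or lacking a motif match; the abstract does not state which enrichment procedure was used, and `hypergeom_sf` and `fold_enrichment` are hypothetical helpers.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k): chance of at least k motif-bearing regions when n of N background
    regions are drawn without replacement and K of the N carry the motif."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Motif frequency among affected regions divided by its frequency in the background."""
    return (k / n) / (K / N)


# Hypothetical counts: 40 of 200 affected regions carry the motif, versus 500 of 5000 background regions.
print(fold_enrichment(40, 5000, 500, 200), hypergeom_sf(40, 5000, 500, 200))
```

Exact integer binomial coefficients avoid overflow and rounding in the tail sum; in practice the background should be matched to the foreground (for example in GC content), which this sketch does not address.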
Cre-Specific Nuclear Anchored Independent Labeling is an enhanced, virus-based approach to isolate nuclei of a specific cell type for transcriptome and epigenome interrogation that decreases dependency on transgenic animals. Applying this technology to GPe parvalbumin-expressing neurons in a mouse model of Parkinson's disease, we discovered evidence for upregulation of the oxygen homeostasis-maintaining pathway involving hypoxia-inducible factor 2α. These results provide new insight into how neuron subtypes outside the substantia nigra pars compacta may be compensating at a molecular level for differences in the motor production neural circuit during the progression of Parkinson's disease. Furthermore, they emphasize the utility of cell type-specific technologies, such as Cre-Specific Nuclear Anchored Independent Labeling, for isolated assessment of specific neuron subtypes in complex systems.
Recent discoveries of extreme cellular diversity in the brain warrant rapid development of technologies to access specific cell populations within heterogeneous tissue. Available approaches for engineering targeted technologies for new neuron subtypes are low yield, involving intensive transgenic strain or virus screening. Here, we present Specific Nuclear-Anchored Independent Labeling (SNAIL), an improved virus-based strategy for cell labeling and nuclear isolation from heterogeneous tissue. SNAIL works by leveraging machine learning and other computational approaches to identify DNA sequence features that confer cell type-specific gene activation and then make a probe that drives an affinity purification-compatible reporter gene. As a proof of concept, we designed and validated two novel SNAIL probes that target parvalbumin-expressing (PV+) neurons. Nuclear isolation using SNAIL in wild-type mice is sufficient to capture characteristic open chromatin features of PV+ neurons in the cortex, striatum, and external globus pallidus. The SNAIL framework also has high utility for multispecies cell probe engineering; expression from a mouse PV+ SNAIL enhancer sequence was enriched in PV+ neurons of the macaque cortex. Expansion of this technology has broad applications in cell type-specific observation, manipulation, and therapeutics across species and disease models.
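The selection step can be pictured as ranking candidate enhancer sequences by how much more active models predict them to be in the target cell type than in any other. This is a hypothetical sketch: the scores are assumed to come from separately trained models, and the `Candidate` class, the margin criterion, and the `min_target` threshold are illustrative choices rather than the published SNAIL procedure.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    target_score: float  # predicted activity in the target cell type (e.g., PV+ neurons)
    off_target_scores: list[float]  # predicted activity in other cell types


def specificity(c: Candidate) -> float:
    """Margin between target activity and the strongest off-target activity."""
    return c.target_score - max(c.off_target_scores)


def rank_candidates(cands: list[Candidate], min_target: float = 0.5) -> list[Candidate]:
    """Drop weakly active candidates, then sort the rest by specificity margin (best first)."""
    kept = [c for c in cands if c.target_score >= min_target]
    return sorted(kept, key=specificity, reverse=True)


cands = [
    Candidate("enh_a", 0.9, [0.2, 0.3]),
    Candidate("enh_b", 0.8, [0.1]),
    Candidate("enh_c", 0.4, [0.0]),
]
print([c.name for c in rank_candidates(cands)])  # ['enh_b', 'enh_a']
```

Using the maximum off-target score penalizes a candidate for its single worst leak, which suits probes meant to label one population cleanly; an average would tolerate strong activity in one off-target type.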
Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning models trained on specific tissues. Applying TACIT to associate motor cortex and parvalbumin-positive interneuron enhancers with neurological phenotypes revealed dozens of enhancer-phenotype associations, including brain size-associated enhancers that interact with genes implicated in microcephaly or macrocephaly. TACIT provides a foundation for identifying enhancers associated with the evolution of any convergently evolved phenotype in any large group of species with aligned genomes.
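At its core, this kind of association compares predicted enhancer activity across species with a phenotype measured in the same species. The sketch below is a deliberately simplified stand-in, not the TACIT procedure: it uses Pearson correlation and a plain permutation test, which treats species as independent. Because related species share ancestry, a real analysis needs phylogenetic correction, which this sketch omits; `permutation_pvalue` and the example data are hypothetical.

```python
import random
import statistics


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(pred: list[float], pheno: list[float], n_perm: int = 1000, seed: int = 0) -> float:
    """Two-sided permutation p-value for |r| between predicted activity and phenotype.

    Shuffling phenotypes across species ignores phylogeny, so this is illustrative only.
    """
    rng = random.Random(seed)
    observed = abs(pearson(pred, pheno))
    shuffled = list(pheno)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(pred, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p above zero


# Hypothetical predicted enhancer activity and phenotype values for eight species.
pred = [0.9, 0.8, 0.75, 0.6, 0.4, 0.35, 0.2, 0.1]
pheno = [3.1, 2.9, 2.8, 2.5, 2.0, 1.9, 1.6, 1.2]
print(permutation_pvalue(pred, pheno))
```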
Vocal production learning ("vocal learning") is a convergently evolved trait in vertebrates. To identify brain genomic elements associated with mammalian vocal learning, we integrated genomic, anatomical, and neurophysiological data from the Egyptian fruit bat (Rousettus aegyptiacus) with analyses of the genomes of 215 placental mammals. First, we identified a set of proteins evolving more slowly in vocal learners. Then, we discovered a vocal motor cortical region in the Egyptian fruit bat, an emergent vocal learner, and leveraged that knowledge to identify active cis-regulatory elements in the motor cortex of vocal learners. Machine learning methods applied to motor cortex open chromatin revealed 50 enhancers robustly associated with vocal learning whose activity tended to be lower in vocal learners. Our research implicates convergent losses of motor cortex regulatory elements in mammalian vocal learning evolution.
Abstract
Horizontal transfer of transposable elements (TEs) is an important mechanism contributing to genetic diversity and innovation. Bats (order Chiroptera) have repeatedly been shown to experience horizontal transfer of TEs at what appears to be a high rate compared with other mammals. We investigated the occurrence of horizontally transferred (HT) DNA transposons involving bats. We found over 200 putative HT elements within bats; 16 transposons were shared across distantly related mammalian clades, and two other elements were shared with a fish and two lizard species. Our results indicate that bats are a hotspot for horizontal transfer of DNA transposons. These events broadly coincide with the diversification of several bat clades, supporting the hypothesis that DNA transposon invasions have contributed to genetic diversification of bats.