A major yet unresolved quest in decoding the human genome is the identification of the regulatory sequences that control the spatial and temporal expression of genes. Distant-acting transcriptional enhancers are particularly challenging to uncover because they are scattered among the vast non-coding portion of the genome. Evolutionary sequence constraint can facilitate the discovery of enhancers, but fails to predict when and where they are active in vivo. Here we present the results of chromatin immunoprecipitation with the enhancer-associated protein p300 followed by massively parallel sequencing, and map several thousand in vivo binding sites of p300 in mouse embryonic forebrain, midbrain and limb tissue. We tested 86 of these sequences in a transgenic mouse assay, which in nearly all cases demonstrated reproducible enhancer activity in the tissues that were predicted by p300 binding. Our results indicate that in vivo mapping of p300 binding is a highly accurate means for identifying enhancers and their associated activities, and suggest that such data sets will be useful to study the role of tissue-specific enhancers in human biology and disease on a genome-wide scale.
Across the human genome, there are nearly 500 'ultraconserved' elements: regions of at least 200 contiguous nucleotides that are perfectly conserved in both the mouse and rat genomes. Remarkably, the majority of these sequences are non-coding, and many can function as enhancers that activate tissue-specific gene expression during embryonic development. From their first description more than 15 years ago, their extreme conservation has both fascinated and perplexed researchers in genomics and evolutionary biology. The intrigue around ultraconserved elements only grew with the observation that they are dispensable for viability. Here, we review recent progress towards understanding the general importance and the specific functions of ultraconserved sequences in mammalian development and human disease and discuss possible explanations for their extreme conservation.
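The definition above (at least 200 contiguous, perfectly conserved nucleotides across the compared genomes) can be illustrated with a minimal sketch. It assumes the three sequences have already been aligned column by column, are in the same letter case, and use "-" for gaps; the function name `ultraconserved_runs` and its parameters are hypothetical and are not taken from the reviewed work.

```python
def ultraconserved_runs(human, mouse, rat, min_len=200):
    """Find runs of identical, gap-free alignment columns.

    Assumes the three strings are rows of one pre-computed alignment
    (equal length, same letter case, '-' for gaps). Returns half-open
    (start, end) column intervals at least min_len columns long.
    """
    if not (len(human) == len(mouse) == len(rat)):
        raise ValueError("aligned sequences must have equal length")

    runs = []
    start = None  # first column of the current identical run, if any
    for i, (h, m, r) in enumerate(zip(human, mouse, rat)):
        identical = h == m == r and h != "-"
        if identical and start is None:
            start = i
        elif not identical and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None

    # Close a run that extends to the end of the alignment.
    if start is not None and len(human) - start >= min_len:
        runs.append((start, len(human)))
    return runs
```

On a toy alignment with a single mismatch, for example `ultraconserved_runs("ACGTACGTACGT", "ACGTAGGTACGT", "ACGTACGTACGT", min_len=5)`, the function reports the two identical stretches on either side of the mismatch. Real analyses work from whole-genome alignments, which this sketch does not attempt to produce.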
In contrast to protein-coding sequences, the significance of variation in non-coding DNA in human disease has been minimally explored. A great number of recent genome-wide association studies suggest that non-coding variation is a significant risk factor for common disorders, but the mechanisms by which this variation contributes to disease remain largely obscure. Distant-acting transcriptional enhancers, a major category of functional non-coding DNA, are involved in many developmental and disease-relevant processes. Genome-wide approaches to their discovery and functional characterization are now available and provide a growing knowledge base for the systematic exploration of their role in human biology and disease susceptibility.
Analysis of chromatin accessibility can reveal transcriptional regulatory sequences, but heterogeneity of primary tissues poses a significant challenge in mapping the precise chromatin landscape in specific cell types. Here we report single-nucleus ATAC-seq, a combinatorial barcoding-assisted single-cell assay for transposase-accessible chromatin that is optimized for use on flash-frozen primary tissue samples. We apply this technique to the mouse forebrain through eight developmental stages. Through analysis of more than 15,000 nuclei, we identify 20 distinct cell populations corresponding to major neuronal and non-neuronal cell types. We further define cell-type-specific transcriptional regulatory sequences, infer potential master transcriptional regulators and delineate developmental changes in forebrain cellular composition. Our results provide insight into the molecular and cellular dynamics that underlie forebrain development in the mouse and establish technical and analytical frameworks that are broadly applicable to other heterogeneous tissues.
Sequence polymorphisms in a 58-kilobase (kb) interval on chromosome 9p21 confer a markedly increased risk of coronary artery disease (CAD), the leading cause of death worldwide. The variants have a substantial effect on the epidemiology of CAD and other life-threatening vascular conditions because nearly one-quarter of Caucasians are homozygous for risk alleles. However, the risk interval is devoid of protein-coding genes and the mechanism linking the region to CAD risk has remained enigmatic. Here we show that deletion of the orthologous 70-kb non-coding interval on mouse chromosome 4 affects cardiac expression of neighbouring genes, as well as proliferation properties of vascular cells. Chr4Δ70kb/Δ70kb mice are viable, but show increased mortality both during development and as adults. Cardiac expression of two genes near the non-coding interval, Cdkn2a and Cdkn2b, is severely reduced in chr4Δ70kb/Δ70kb mice, indicating that distant-acting gene regulatory functions are located in the non-coding CAD risk interval. Allele-specific expression of Cdkn2b transcripts in heterozygous mice showed that the deletion affects expression through a cis-acting mechanism. Primary cultures of chr4Δ70kb/Δ70kb aortic smooth muscle cells exhibited excessive proliferation and diminished senescence, a cellular phenotype consistent with accelerated CAD pathogenesis. Taken together, our results provide direct evidence that the CAD risk interval has a pivotal role in regulation of cardiac Cdkn2a/b expression, and suggest that this region affects CAD progression by altering the dynamics of vascular cell proliferation.
Desert plants are hypothesized to survive the environmental stress inherent to these regions in part thanks to symbioses with microorganisms, and yet these microbial species, the communities they form, and the forces that influence them are poorly understood. Here we report the first comprehensive investigation of the microbial communities associated with species of Agave, which are native to semiarid and arid regions of Central and North America and are emerging as biofuel feedstocks. We examined prokaryotic and fungal communities in the rhizosphere, phyllosphere, leaf and root endosphere, as well as proximal and distal soil samples from cultivated and native agaves, through Illumina amplicon sequencing. Phylogenetic profiling revealed that the composition of prokaryotic communities was primarily determined by the plant compartment, whereas the composition of fungal communities was mainly influenced by the biogeography of the host species. Cultivated A. tequilana exhibited lower levels of prokaryotic diversity compared with native agaves, although no differences in microbial diversity were found in the endosphere. Agaves shared core prokaryotic and fungal taxa known to promote plant growth and confer tolerance to abiotic stress, which suggests common principles underpinning Agave–microbe interactions.
Mammalian genomes are organized into megabase-scale topologically associated domains (TADs). We demonstrate that disruption of TADs can rewire long-range regulatory architecture and result in pathogenic phenotypes. We show that distinct human limb malformations are caused by deletions, inversions, or duplications altering the structure of the TAD-spanning WNT6/IHH/EPHA4/PAX3 locus. Using CRISPR/Cas genome editing, we generated mice with corresponding rearrangements. Both in mouse limb tissue and patient-derived fibroblasts, disease-relevant structural changes cause ectopic interactions between promoters and non-coding DNA, and a cluster of limb enhancers normally associated with Epha4 is misplaced relative to TAD boundaries and drives ectopic limb expression of another gene in the locus. This rewiring occurred only if the variant disrupted a CTCF-associated boundary domain. Our results demonstrate the functional importance of TADs for orchestrating gene expression via genome architecture and indicate criteria for predicting the pathogenicity of human structural variants, particularly in non-coding regions of the human genome.
• Disruptions of TADs lead to de novo enhancer-promoter interactions and misexpression
• Misexpression occurs when CTCF-associated TAD boundary elements are disrupted
• Structural variations disrupting TAD structures can cause malformation syndromes
• Different phenotypes can result from one enhancer acting on different target genes
Disease-associated structural variants, when affecting CTCF-associated boundary elements, cause pathogenicity by disrupting the structure of topologically associated chromatin domains, leading to ectopic promoter interactions and altered gene expression.
Bacteriophages from the Inoviridae family (inoviruses) are characterized by their unique morphology, genome content and infection cycle. One of the most striking features of inoviruses is their ability to establish a chronic infection whereby the viral genome resides within the cell in either an exclusively episomal state or integrated into the host chromosome and virions are continuously released without killing the host. To date, a relatively small number of inovirus isolates have been extensively studied, either for biotechnological applications, such as phage display, or because of their effect on the toxicity of known bacterial pathogens including Vibrio cholerae and Neisseria meningitidis. Here, we show that the current 56 members of the Inoviridae family represent a minute fraction of a highly diverse group of inoviruses. Using a machine learning approach leveraging a combination of marker gene and genome features, we identified 10,295 inovirus-like sequences from microbial genomes and metagenomes. Collectively, our results call for reclassification of the current Inoviridae family into a viral order including six distinct proposed families associated with nearly all bacterial phyla across virtually every ecosystem. Putative inoviruses were also detected in several archaeal genomes, suggesting that, collectively, members of this supergroup infect hosts across the domains Bacteria and Archaea. Finally, we identified an expansive diversity of inovirus-encoded toxin-antitoxin and gene expression modulation systems, alongside evidence of both synergistic (CRISPR evasion) and antagonistic (superinfection exclusion) interactions with co-infecting viruses, which we experimentally validated in a Pseudomonas model. Capturing this previously obscured component of the global virosphere may spark new avenues for microbial manipulation approaches and innovative biotechnological applications.
During mammalian embryogenesis, differential gene expression gradually builds the identity and complexity of each tissue and organ system. Here we systematically quantified mouse polyA-RNA from day 10.5 of embryonic development to birth, sampling 17 tissues and organs. The resulting developmental transcriptome is globally structured by dynamic cytodifferentiation, body-axis and cell-proliferation gene sets that were further characterized by the transcription factor motif codes of their promoters. We decomposed the tissue-level transcriptome using single-cell RNA-seq (sequencing of RNA reverse transcribed into cDNA) and found that neurogenesis and haematopoiesis dominate at both the gene and cellular levels, jointly accounting for one-third of differential gene expression and more than 40% of identified cell types. By integrating promoter sequence motifs with companion ENCODE epigenomic profiles, we identified a prominent promoter de-repression mechanism in neuronal expression clusters that was attributable to known and novel repressors. Focusing on the developing limb, single-cell RNA data identified 25 candidate cell types that included progenitor and differentiating states with computationally inferred lineage relationships. We extracted cell-type transcription factor networks and complementary sets of candidate enhancer elements by using single-cell RNA-seq to decompose integrative cis-element (IDEAS) models that were derived from whole-tissue epigenome chromatin data. These ENCODE reference data, computed network components and IDEAS chromatin segmentations are companion resources to the matching epigenomic developmental matrix, and are available for researchers to further mine and integrate.
The paucity of enzymes that efficiently deconstruct plant polysaccharides represents a major bottleneck for industrial-scale conversion of cellulosic biomass into biofuels. Cow rumen microbes specialize in degradation of cellulosic plant material, but most members of this complex community resist cultivation. To characterize biomass-degrading genes and genomes, we sequenced and analyzed 268 gigabases of metagenomic DNA from microbes adherent to plant fiber incubated in cow rumen. From these data, we identified 27,755 putative carbohydrate-active genes and expressed 90 candidate proteins, of which 57% were enzymatically active against cellulosic substrates. We also assembled 15 uncultured microbial genomes, which were validated by complementary methods including single-cell genome sequencing. These data sets provide a substantially expanded catalog of genes and genomes participating in the deconstruction of cellulosic biomass.