Advancements in Next-Generation Sequencing
Levy, Shawn E; Myers, Richard M
Annual review of genomics and human genetics,
08/2016, Volume: 17, Issue: 1
Journal Article
Peer reviewed
Open access
The term next-generation sequencing is almost a decade old, but it remains the colloquial way to describe highly parallel or high-output sequencing methods that produce data at or beyond the genome scale. Since the introduction of these technologies, the number of applications and methods that leverage the power of genome-scale sequencing has increased at an exponential pace. This review highlights recent concepts, technologies, and methods from next-generation sequencing to illustrate the breadth and depth of the applications and research areas that are driving progress in genomics.
Most human transcription factors bind a small subset of potential genomic sites and often use different subsets in different cell types. To identify mechanisms that govern cell-type-specific transcription factor binding, we used an integrative approach to study estrogen receptor α (ER). We found that ER exhibits two distinct modes of binding. Shared sites, bound in multiple cell types, are characterized by high-affinity estrogen response elements (EREs), inaccessible chromatin, and a lack of DNA methylation, while cell-specific sites are characterized by a lack of EREs, co-occurrence with other transcription factors, and cell-type-specific chromatin accessibility and DNA methylation. These observations enabled accurate quantitative models of ER binding that suggest tethering of ER to one-third of cell-specific sites. The distinct properties of cell-specific binding were also observed with glucocorticoid receptor and for ER in primary mouse tissues, representing an elegant genomic encoding scheme for generating cell-type-specific gene regulation.
• Two types of estrogen receptor α binding sites: shared and cell specific
• Shared sites are encoded in the genome as high-affinity estrogen response elements
• Cell-specific sites rely on interacting factors and depend on genomic context
• Cell-specific binding is predicted from DNA sequence and chromatin accessibility
Genes underlying repeated adaptive evolution in natural populations are still largely unknown. Stickleback fish (Gasterosteus aculeatus) have undergone a recent dramatic evolutionary radiation, generating numerous examples of marine-freshwater species pairs and a small number of benthic-limnetic species pairs found within single lakes [1]. We have developed a new genome-wide SNP genotyping array to study patterns of genetic variation in sticklebacks over a wide geographic range, and to scan the genome for regions that contribute to repeated evolution of marine-freshwater or benthic-limnetic species pairs. Surveying 34 global populations with 1,159 informative markers revealed substantial genetic variation, with predominant patterns reflecting demographic history and geographic structure. After correcting for geographic structure and filtering for neutral markers, we detected large repeated shifts in allele frequency at some loci, identifying both known and novel loci likely contributing to marine-freshwater and benthic-limnetic divergence. Several novel loci fall close to genes implicated in epithelial barrier or immune functions, which have likely changed as sticklebacks adapt to contrasting environments. Specific alleles differentiating sympatric benthic-limnetic species pairs are shared in nearby solitary populations, suggesting an allopatric origin for adaptive variants and selection pressures unrelated to sympatry in the initial formation of these classic vertebrate species pairs.
► A new genome-wide SNP array facilitates genetic studies of three-spine sticklebacks
► Survey of 34 populations reveals patterns of global geographic structure
► Large shifts in allele frequency underlie adaptive divergence of species pairs
► Allopatric populations share sets of benthic-limnetic species-pair adaptive alleles
Th17 cells have critical roles in mucosal defense and are major contributors to inflammatory disease. Their differentiation requires the nuclear hormone receptor RORγt working with multiple other essential transcription factors (TFs). We have used an iterative systems approach, combining genome-wide TF occupancy, expression profiling of TF mutants, and expression time series to delineate the Th17 global transcriptional regulatory network. We find that cooperatively bound BATF and IRF4 contribute to initial chromatin accessibility and, with STAT3, initiate a transcriptional program that is then globally tuned by the lineage-specifying TF RORγt, which plays a focal deterministic role at key loci. Integration of multiple data sets allowed inference of an accurate predictive model that we computationally and experimentally validated, identifying multiple new Th17 regulators, including Fosl2, a key determinant of cellular plasticity. This interconnected network can be used to investigate new therapeutic approaches to manipulate Th17 functions in the setting of inflammatory disease.
► Integrated function of multiple transcription factors in Th17 cell differentiation
► Binding of BATF-IRF4 complexes to DNA mediates chromatin accessibility
► Network-predicted AP-1 factor Fosl2 restricts plasticity of Th17 cells
► Integration of multiple data sets in network most accurately predicts Th17 regulators
Integrating genome-wide regulatory information for key transcription factors involved in the development of proinflammatory Th17 cells produces a highly accurate and predictive network model for lineage specification and function.
Genome-wide patterns of homozygosity runs and their variation across individuals provide a valuable and often untapped resource for studying human genetic diversity and evolutionary history. Using genotype data at 577,489 autosomal SNPs, we employed a likelihood-based approach to identify runs of homozygosity (ROH) in 1,839 individuals representing 64 worldwide populations, classifying them by length into three classes (short, intermediate, and long) with a model-based clustering algorithm. For each class, the number and total length of ROH per individual show considerable variation across individuals and populations. The total lengths of short and intermediate ROH per individual increase with the distance of a population from East Africa, in agreement with similar patterns previously observed for locus-wise homozygosity and linkage disequilibrium. By contrast, total lengths of long ROH show large interindividual variations that probably reflect recent inbreeding patterns, with higher values occurring more often in populations with known high frequencies of consanguineous unions. Across the genome, distributions of ROH are not uniform, and they have distinctive continental patterns. ROH frequencies across the genome are correlated with local genomic variables such as recombination rate, as well as with signals of recent positive selection. In addition, long ROH are more frequent in genomic regions harboring genes associated with autosomal-dominant diseases than in regions not implicated in Mendelian diseases. These results provide insight into the way in which homozygosity patterns are produced, and they generate baseline homozygosity patterns that can be used to aid homozygosity mapping of genes associated with recessive diseases.
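The ROH abstract mentions a likelihood-based approach to calling runs of homozygosity. As a hedged illustration only, the sketch below scores windows of consecutive SNPs with a log-odds (LOD) ratio of autozygosity versus Hardy-Weinberg expectations. The genotyping error rate `err`, the window size, and the function names `snp_lod` and `window_lods` are hypothetical assumptions; this is a generic LOD formulation, not presented as the study's exact model.

```python
import math
from typing import List, Sequence


def snp_lod(genotype: int, alt_freq: float, err: float = 0.001) -> float:
    """log10 likelihood ratio (autozygous vs. Hardy-Weinberg) for one SNP.

    genotype: number of alternate alleles carried (0, 1 or 2).
    alt_freq: population frequency of the alternate allele (0 < p < 1).
    err: assumed genotyping error rate (hypothetical default).
    """
    p = alt_freq
    if not 0.0 < p < 1.0:
        raise ValueError("monomorphic SNPs carry no information")
    q = 1.0 - p
    # If both haplotypes are identical by descent, a heterozygote can only
    # arise from a genotyping error.
    p_auto = {0: q * (1 - err), 1: err, 2: p * (1 - err)}[genotype]
    # Expected genotype probabilities for two unrelated haplotypes.
    p_hwe = {0: q * q, 1: 2 * p * q, 2: p * p}[genotype]
    return math.log10(p_auto / p_hwe)


def window_lods(genotypes: Sequence[int], alt_freqs: Sequence[float],
                window: int = 50, err: float = 0.001) -> List[float]:
    """Sum per-SNP LODs over every run of `window` consecutive SNPs."""
    per_snp = [snp_lod(g, p, err) for g, p in zip(genotypes, alt_freqs)]
    return [sum(per_snp[i:i + window])
            for i in range(len(per_snp) - window + 1)]
```

Windows whose summed LOD exceeds a chosen cutoff would be candidate ROH. The later step described in the abstract, classifying ROH into short, intermediate, and long classes with model-based clustering, is not attempted here.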
Microglia are resident immune cells of the CNS that are activated by infection, neuronal injury, and inflammation. Here, we utilize flow cytometry and deep RNA sequencing of acutely isolated spinal cord microglia to define their activation in vivo. Analysis of resting microglia identified 29 genes that distinguish microglia from other CNS cells and peripheral macrophages/monocytes. We then analyzed molecular changes in microglia during neurodegenerative disease activation using the SOD1G93A mouse model of amyotrophic lateral sclerosis (ALS). We found that SOD1G93A microglia are not derived from infiltrating monocytes, and that both potentially neuroprotective and toxic factors, including Alzheimer’s disease genes, are concurrently upregulated. Mutant microglia differed from SOD1WT, lipopolysaccharide-activated microglia, and M1/M2 macrophages, defining an ALS-specific phenotype. Concurrent messenger RNA/fluorescence-activated cell sorting analysis revealed posttranscriptional regulation of microglia surface receptors and T cell-associated changes in the transcriptome. These results provide insights into microglia biology and establish a resource for future studies of neuroinflammation.
• Identification of specific marker genes for acutely isolated microglia
• Progressive resident microglia transcriptome changes reveal in vivo activation phenotype
• Microglial ALS disease activation signature distinct from M1/M2 macrophages
• Parallel transcriptome and FACS analyses reveal T cell/microglia crosstalk
Microglia are resident immune cells of the brain that are activated by infection or tissue damage. In this study, Maniatis and colleagues report the acute isolation, transcriptional profiling, and immunological analysis of microglia during disease activation in an ALS mouse model. A neurodegeneration-specific gene-expression signature is identified that includes induction of both neuroprotective and toxic factors and is distinct from that associated with M1/M2 macrophages. The data also provide a resource for future studies of microglia activation in neurodegenerative diseases.
As studies of DNA methylation increase in scope, it has become evident that methylation has a complex relationship with gene expression, plays an important role in defining cell types, and is disrupted in many diseases. We describe large-scale single-base resolution DNA methylation profiling on a diverse collection of 82 human cell lines and tissues using reduced representation bisulfite sequencing (RRBS). Analysis integrating RNA-seq and ChIP-seq data illuminates the functional role of this dynamic mark. Loci that are hypermethylated across cancer types are enriched for sites bound by NANOG in embryonic stem cells, which supports and expands the model of a stem/progenitor cell signature in cancer. CpGs that are hypomethylated across cancer types are concentrated in megabase-scale domains that occur near the telomeres and centromeres of chromosomes, are depleted of genes, and are enriched for cancer-specific EZH2 binding and H3K27me3 (repressive chromatin). In noncancer samples, there are cell-type-specific methylation signatures preserved in primary cell lines and tissues as well as methylation differences induced by cell culture. The relationship between methylation and expression is context-dependent, and we find that CpG-rich enhancers bound by EP300 in the bodies of expressed genes are unmethylated despite the dense gene-body methylation surrounding them. Non-CpG cytosine methylation occurs in human somatic tissue, is particularly prevalent in brain tissue, and is reproducible across many individuals. This study provides an atlas of DNA methylation across diverse and well-characterized samples and enables new discoveries about DNA methylation and its role in gene regulation and disease.
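RRBS relies on bisulfite conversion: unmethylated cytosines are read as T after conversion and amplification, while methylated cytosines remain C. A minimal sketch of the per-CpG methylation level commonly derived from such reads follows, assuming reads have already been aligned and counted per CpG. The `CpGCounts` record and the 10-read coverage cutoff are hypothetical choices, not parameters reported in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpGCounts:
    chrom: str
    pos: int
    methylated: int    # reads showing C: protected from bisulfite conversion
    unmethylated: int  # reads showing T: converted, so originally unmethylated


def methylation_level(site: CpGCounts, min_coverage: int = 10) -> Optional[float]:
    """Fraction of reads reporting methylation at one CpG.

    Returns None when coverage is below `min_coverage` (a hypothetical
    threshold), since low-depth estimates are unreliable.
    """
    depth = site.methylated + site.unmethylated
    if depth < min_coverage:
        return None
    return site.methylated / depth
```

A value near 1 indicates a CpG methylated in nearly all sampled molecules; comparing these values across samples is the basis for calling hyper- or hypomethylated loci.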
The methylation of cytosines in CpG dinucleotides is essential for cellular differentiation and the progression of many cancers, and it plays an important role in gametic imprinting. To assess variation and inheritance of genome-wide patterns of DNA methylation simultaneously in humans, we applied reduced representation bisulfite sequencing (RRBS) to somatic DNA from six members of a three-generation family. We observed that 8.1% of heterozygous SNPs are associated with differential methylation in cis, which provides a robust signature for Mendelian transmission and relatedness. The vast majority of differential methylation between homologous chromosomes (>92%) occurs on a particular haplotype as opposed to being associated with the gender of the parent of origin, indicating that genotype affects DNA methylation of far more loci than does gametic imprinting. We found that 75% of genotype-dependent differential methylation events in the family are also seen in unrelated individuals and that overall genotype can explain 80% of the variation in DNA methylation. These events are under-represented in CpG islands, enriched in intergenic regions, and located in regions of low evolutionary conservation. Even though they are generally not in functionally constrained regions, 22% (twice as many as expected by chance) of genes harboring genotype-dependent DNA methylation exhibited allele-specific gene expression as measured by RNA-seq of a lymphoblastoid cell line, indicating that some of these events are associated with gene expression differences. Overall, our results demonstrate that the influence of genotype on patterns of DNA methylation is widespread in the genome and greatly exceeds the influence of imprinting on genome-wide methylation patterns.
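The family study asks whether methylation near a heterozygous SNP differs between the two alleles. The abstract does not state which statistic was used; one common choice for this kind of comparison is Fisher's exact test on a 2x2 table of methylated and unmethylated read counts split by allele, sketched below in pure Python. The function names and the `alpha` cutoff are hypothetical.

```python
from math import comb
from typing import Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def allele_specific_methylation(meth_a: int, unmeth_a: int,
                                meth_b: int, unmeth_b: int,
                                alpha: float = 0.01) -> Tuple[float, bool]:
    """Compare methylated/unmethylated read counts on allele A vs. allele B."""
    p = fisher_exact_two_sided(meth_a, unmeth_a, meth_b, unmeth_b)
    return p, p < alpha
```

For example, 10 methylated reads on one allele against 10 unmethylated reads on the other gives p = 2/184756, far below the hypothetical cutoff; a genome-wide scan would also need multiple-testing correction, which this sketch omits.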
Genome-Wide Mapping of in Vivo Protein-DNA Interactions
Johnson, David S; Mortazavi, Ali; Myers, Richard M ...
Science (American Association for the Advancement of Science),
06/2007, Volume: 316, Issue: 5830
Journal Article
Peer reviewed
In vivo protein-DNA interactions connect each transcription factor with its direct targets to form a gene network scaffold. To map these protein-DNA interactions comprehensively across entire mammalian genomes, we developed a large-scale chromatin immunoprecipitation assay (ChIPSeq) based on direct ultrahigh-throughput DNA sequencing. This sequence census method was then used to map in vivo binding of the neuron-restrictive silencer factor (NRSF; also known as REST, for repressor element-1 silencing transcription factor) to 1946 locations in the human genome. The data display sharp resolution of binding position (±50 base pairs (bp)), which facilitated motif finding and allowed us to identify noncanonical NRSF-binding motifs. These ChIPSeq data also have high sensitivity and specificity (ROC (receiver operating characteristic) area ≥ 0.96) and statistical confidence (P < 10⁻⁴), properties that were important for inferring new candidate interactions. These include key transcription factors in the gene network that regulates pancreatic islet cell development.
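The ChIPSeq abstract reports an ROC area of at least 0.96. The area under the ROC curve equals the probability that a randomly chosen true site scores higher than a randomly chosen negative site, with ties counted as one half. A minimal sketch, assuming binary labels (1 = true binding site, 0 = negative) and some per-site score such as read enrichment; how positives and negatives were defined in the study is not described here, and `roc_auc` is a hypothetical helper name.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic and meant only to make the definition concrete; a rank-based computation gives the same value in O(n log n) for large site sets.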
Single-cell RNA-seq mammalian transcriptome studies are at an early stage in uncovering cell-to-cell variation in gene expression, transcript processing and editing, and regulatory module activity. Despite great progress recently, substantial challenges remain, including discriminating biological variation from technical noise. Here we apply the SMART-seq single-cell RNA-seq protocol to study the reference lymphoblastoid cell line GM12878. By using spike-in quantification standards, we estimate the absolute number of RNA molecules per cell for each gene and find significant variation in total mRNA content: between 50,000 and 300,000 transcripts per cell. We directly measure technical stochasticity by a pool/split design and find that there are significant differences in expression between individual cells, over and above technical variation. Specific gene coexpression modules were preferentially expressed in subsets of individual cells, including one enriched for mRNA processing and splicing factors. We assess cell-to-cell variation in alternative splicing and allelic bias and report evidence of significant differences in splice site usage that exceed splice variation in the pool/split comparison. Finally, we show that transcriptomes from small pools of 30-100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material. Together, our results define an experimental and computational path forward for analyzing gene expression in rare cell types and cell states.
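The single-cell abstract estimates absolute RNA molecules per cell from spike-in standards. One way to do this, sketched below, is to fit a log-log calibration line between the known number of molecules added for each spike-in and the reads observed for it, then invert that line for endogenous genes. The function names, the ordinary least-squares fit, and the example numbers are assumptions for illustration, not the study's pipeline.

```python
import math
from typing import Dict, List, Tuple


def fit_loglog(spike_molecules: List[float], spike_reads: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log10(reads) = a + b * log10(molecules).

    Spike-ins with zero molecules or zero reads are skipped (log undefined).
    """
    pts = [(math.log10(m), math.log10(r))
           for m, r in zip(spike_molecules, spike_reads) if m > 0 and r > 0]
    if len(pts) < 2:
        raise ValueError("need at least two detected spike-ins")
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("spike-in inputs must span more than one concentration")
    b = sum((x - mx) * (y - my) for x, y in pts) / sxx
    a = my - b * mx
    return a, b


def estimate_molecules(gene_reads: Dict[str, float], a: float, b: float) -> Dict[str, float]:
    """Invert the calibration: molecules = 10 ** ((log10(reads) - a) / b)."""
    return {gene: (10 ** ((math.log10(r) - a) / b) if r > 0 else 0.0)
            for gene, r in gene_reads.items()}
```

With hypothetical spike-ins of 10, 100, and 1000 molecules yielding 50, 500, and 5000 reads, the fit gives a slope of 1 and a gene with 250 reads maps to about 50 molecules. Summing such estimates over all genes in a cell yields a per-cell total of the kind reported in the abstract.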