We recently showed that the mammalian genome encodes >1,000 large intergenic noncoding (linc)RNAs that are clearly conserved across mammals and, thus, functional. Gene expression patterns have implicated these lincRNAs in diverse biological processes, including cell-cycle regulation, immune surveillance, and embryonic stem cell pluripotency. However, the mechanism by which these lincRNAs function is unknown. Here, we expand the catalog of human lincRNAs to ≈3,300 by analyzing chromatin-state maps of various human cell types. Inspired by the observation that the well-characterized lincRNA HOTAIR binds the polycomb repressive complex (PRC)2, we tested whether many lincRNAs are physically associated with PRC2. Remarkably, we observe that ≈20% of lincRNAs expressed in various cell types are bound by PRC2, and that additional lincRNAs are bound by other chromatin-modifying complexes. Also, we show that siRNA-mediated depletion of certain lincRNAs associated with PRC2 leads to changes in gene expression, and that the up-regulated genes are enriched for those normally silenced by PRC2. We propose a model in which some lincRNAs guide chromatin-modifying complexes to specific genomic loci to regulate gene expression.
Learning to read and write the transcriptional regulatory code is of central importance to progress in genetic analysis and engineering. Here we describe a massively parallel reporter assay (MPRA) that facilitates the systematic dissection of transcriptional regulatory elements. In MPRA, microarray-synthesized DNA regulatory elements and unique sequence tags are cloned into plasmids to generate a library of reporter constructs. These constructs are transfected into cells and tag expression is assayed by high-throughput sequencing. We apply MPRA to compare >27,000 variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon-β enhancer. We first show that the resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution. We then use the data to train quantitative sequence-activity models (QSAMs) of the two enhancers. We show that QSAMs from two cellular states can be combined to design enhancer variants that optimize potentially conflicting objectives, such as maximizing induced activity while minimizing basal activity.
Massively parallel cDNA sequencing (RNA-Seq) provides an unbiased way to study a transcriptome, including both coding and noncoding genes. Until now, most RNA-Seq studies have depended crucially on existing annotations and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We applied it to mouse embryonic stem cells, neuronal precursor cells and lung fibroblasts to accurately reconstruct the full-length gene structures for most known expressed genes. We identified substantial variation in protein coding genes, including thousands of novel 5' start sites, 3' ends and internal coding exons. We then determined the gene structures of more than a thousand large intergenic noncoding RNA (lincRNA) and antisense loci. Our results open the way to direct experimental manipulation of thousands of noncoding RNAs and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes.
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (approximately 100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
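For readers unfamiliar with the N50 statistic quoted for the scaffolds, the following is a minimal sketch of how it is conventionally computed: N50 is the length L such that scaffolds of length at least L together contain at least half of all assembled bases. The function name `n50` and the toy lengths are illustrative assumptions; this is not code from ALLPATHS-LG.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Illustrative helper, not part of ALLPATHS-LG: walk the lengths from
    longest to shortest and return the first length at which the running
    total reaches at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy scaffold lengths (arbitrary units), purely for demonstration.
    print(n50([2, 3, 4, 5, 6]))
```

Because the statistic weights scaffolds by the bases they contain, a few long scaffolds raise N50 far more than many short ones, which is why it is commonly reported as a measure of long-range connectivity.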
Recently, more than 1000 large intergenic noncoding RNAs (lincRNAs) have been reported. These RNAs are evolutionarily conserved in mammalian genomes and thus presumably function in diverse biological processes. Here, we report the identification of lincRNAs that are regulated by p53. One of these lincRNAs (lincRNA-p21) serves as a repressor in p53-dependent transcriptional responses. Inhibition of lincRNA-p21 affects the expression of hundreds of gene targets enriched for genes normally repressed by p53. The observed transcriptional repression by lincRNA-p21 is mediated through the physical association with hnRNP-K. This interaction is required for proper genomic localization of hnRNP-K at repressed genes and regulation of p53-mediated apoptosis. We propose a model whereby transcription factors activate lincRNAs that serve as key repressors by physically associating with repressive complexes and modulate their localization to sets of previously active genes.
► Several lincRNAs are regulated by p53 ► LincRNA-p21 is a bona fide p53 transcriptional target ► LincRNA-p21 mediates global gene repression and apoptosis in the p53 pathway ► LincRNA-p21 represses gene targets through physical association with hnRNP-K
Although several hundred regions of the human genome harbor signals of positive natural selection, few of the relevant adaptive traits and variants have been elucidated. Using full-genome sequence variation from the 1000 Genomes (1000G) Project and the composite of multiple signals (CMS) test, we investigated 412 candidate signals and leveraged functional annotation, protein structure modeling, epigenetics, and association studies to identify and extensively annotate candidate causal variants. The resulting catalog provides a tractable list for experimental follow-up; it includes 35 high-scoring nonsynonymous variants, 59 variants associated with expression levels of a nearby coding gene or lincRNA, and numerous variants associated with susceptibility to infectious disease and other phenotypes. We experimentally characterized one candidate nonsynonymous variant in Toll-like receptor 5 (TLR5) and show that it leads to altered NF-κB signaling in response to bacterial flagellin.
► Genome-wide scan in 1000 Genomes sequence data fine maps 412 signals of selection ► Adaptive candidates include 35 nonsynonymous and 59 loci with eQTLs ► L→F variant in TLR5 reduces NF-κB signaling in response to bacterial flagellin ► Catalog provides a tractable set of selected variants for experimental follow up
Analysis of human sequence data from the 1000 Genomes Project reveals hundreds of potential adaptive variants, providing a road map for understanding human biological history and modern-day variability.
Somatic cells can be reprogrammed to a pluripotent state through the ectopic expression of defined transcription factors. Understanding the mechanism and kinetics of this transformation may shed light on the nature of developmental potency and suggest strategies with improved efficiency or safety. Here we report an integrative genomic analysis of reprogramming of mouse fibroblasts and B lymphocytes. Lineage-committed cells show a complex response to the ectopic expression involving induction of genes downstream of individual reprogramming factors. Fully reprogrammed cells show gene expression and epigenetic states that are highly similar to embryonic stem cells. In contrast, stable partially reprogrammed cell lines show reactivation of a distinctive subset of stem-cell-related genes, incomplete repression of lineage-specifying transcription factors, and DNA hypermethylation at pluripotency-related loci. These observations suggest that some cells may become trapped in partially reprogrammed states owing to incomplete repression of transcription factors, and that DNA de-methylation is an inefficient step in the transition to pluripotency. We demonstrate that RNA inhibition of transcription factors can facilitate reprogramming, and that treatment with DNA methyltransferase inhibitors can improve the overall efficiency of the reprogramming process.
Loss of the epithelial adhesion molecule E-cadherin is thought to enable metastasis by disrupting intercellular contacts, an early step in metastatic dissemination. To further investigate the molecular basis of this notion, we use two methods to inhibit E-cadherin function that distinguish between E-cadherin's cell-cell adhesion and intracellular signaling functions. Whereas the disruption of cell-cell contacts alone does not enable metastasis, the loss of E-cadherin protein does, through induction of an epithelial-to-mesenchymal transition, invasiveness, and anoikis resistance. We find the E-cadherin binding partner beta-catenin to be necessary, but not sufficient, for induction of these phenotypes. In addition, gene expression analysis shows that E-cadherin loss results in the induction of multiple transcription factors, at least one of which, Twist, is necessary for E-cadherin loss-induced metastasis. These findings indicate that E-cadherin loss in tumors contributes to metastatic dissemination by inducing wide-ranging transcriptional and functional changes.
Generation of induced pluripotent stem cells (iPSCs) by somatic cell reprogramming involves global epigenetic remodelling. Whereas several proteins are known to regulate chromatin marks associated with the distinct epigenetic states of cells before and after reprogramming, the role of specific chromatin-modifying enzymes in reprogramming remains to be determined. To address how chromatin-modifying proteins influence reprogramming, we used short hairpin RNAs (shRNAs) to target genes in DNA and histone methylation pathways, and identified positive and negative modulators of iPSC generation. Whereas inhibition of the core components of the polycomb repressive complex 1 and 2, including the histone 3 lysine 27 methyltransferase EZH2, reduced reprogramming efficiency, suppression of SUV39H1, YY1 and DOT1L enhanced reprogramming. Specifically, inhibition of the H3K79 histone methyltransferase DOT1L by shRNA or a small molecule accelerated reprogramming, significantly increased the yield of iPSC colonies, and substituted for KLF4 and c-Myc (also known as MYC). Inhibition of DOT1L early in the reprogramming process is associated with a marked increase in two alternative factors, NANOG and LIN28, which play essential functional roles in the enhancement of reprogramming. Genome-wide analysis of H3K79me2 distribution revealed that fibroblast-specific genes associated with the epithelial to mesenchymal transition lose H3K79me2 in the initial phases of reprogramming. DOT1L inhibition facilitates the loss of this mark from genes that are fated to be repressed in the pluripotent state. These findings implicate specific chromatin-modifying enzymes as barriers to or facilitators of reprogramming, and demonstrate how modulation of chromatin-modifying enzymes can be exploited to more efficiently generate iPSCs with fewer exogenous transcription factors.
Genome sequencing has revealed a large number of shared and personal somatic mutations across human cancers. In principle, any genetic alteration affecting a protein-coding region has the potential to generate mutated peptides that are presented by surface HLA class I proteins that might be recognized by cytotoxic T cells. To test this possibility, we implemented a streamlined approach for the prediction and validation of such neoantigens derived from individual tumors and presented by patient-specific HLA alleles. We applied our computational pipeline to 91 chronic lymphocytic leukemias (CLLs) that underwent whole-exome sequencing (WES). We predicted ∼22 mutated HLA-binding peptides per leukemia (derived from ∼16 missense mutations) and experimentally confirmed HLA binding for ∼55% of such peptides. Two CLL patients that achieved long-term remission following allogeneic hematopoietic stem cell transplantation were monitored for CD8+ T-cell responses against predicted or confirmed HLA-binding peptides. Long-lived cytotoxic T-cell responses were detected against peptides generated from personal tumor mutations in ALMS1, C6ORF89, and FNDC3B presented on tumor cells. Finally, we applied our computational pipeline to WES data (N = 2488 samples) across 13 different cancer types and estimated dozens to thousands of predicted neoantigens per individual tumor, suggesting that neoantigens are frequent in most tumors.
• Tumor neoantigens are a promising class of immunogens based on exquisite tumor specificity and the lack of central tolerance against them.
• Massively parallel DNA sequencing with class I prediction enables systematic identification of tumor neoepitopes (including from CLL).
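As a rough illustration of how candidate mutated peptides can be derived from a missense mutation before any HLA-binding prediction, the sketch below enumerates every fixed-length window of the mutated protein that contains the altered residue. The function name `mutant_peptides`, the default window length of 9, and the toy sequence are assumptions made for this example; the study's actual pipeline and its binding predictor are not reproduced here.

```python
def mutant_peptides(protein, position, alt, k=9):
    """List all k-mers of the mutated protein that span the changed residue.

    Hypothetical helper for illustration only.
    protein  : wild-type amino-acid sequence (one-letter codes)
    position : 1-based position of the missense mutation
    alt      : substituted amino acid
    k        : peptide length (9 is an assumed default, not from the source)
    """
    if not 1 <= position <= len(protein):
        raise ValueError("position lies outside the protein")
    idx = position - 1
    mutated = protein[:idx] + alt + protein[idx + 1:]
    # Windows must start early enough to include idx and late enough to fit.
    first = max(0, idx - k + 1)
    last = min(idx, len(mutated) - k)
    return [mutated[start:start + k] for start in range(first, last + 1)]


if __name__ == "__main__":
    # Toy sequence and mutation, not taken from any real gene.
    for peptide in mutant_peptides("ACDEFGHIKL", position=5, alt="W", k=3):
        print(peptide)
```

In a real workflow each enumerated peptide would then be scored against the patient's HLA alleles by a separate binding predictor, a step deliberately left out of this sketch.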