MicroRNAs (miRNAs) are essential components of gene regulation, but identification of miRNA targets remains a major challenge. Most target prediction and discovery relies on perfect complementarity of the miRNA seed to the 3′ untranslated region (UTR). However, it is unclear to what extent miRNAs target sites without seed matches. Here, we performed a transcriptome-wide identification of the endogenous targets of a single miRNA, miR-155, in a genetically controlled manner. We found that approximately 40% of miR-155-dependent Argonaute binding occurs at sites without perfect seed matches. The majority of these noncanonical sites feature extensive complementarity to the miRNA seed with one mismatch. These noncanonical sites confer regulation of gene expression, albeit less potently than canonical sites. Thus, noncanonical miRNA binding sites are widespread, often contain seed-like motifs, and can regulate gene expression, generating a continuum of targeting and regulation.
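The distinction between canonical seed matches and one-mismatch seed-like sites can be illustrated with a simple scan of a 3′ UTR. The sketch below is a simplification that assumes a 7-nt site complementary to miRNA nucleotides 2–8 and counts only substitutions. It ignores 8mer/6mer variants, G:U wobbles and bulges, and it is not the CLIP-based procedure used in the study. The function names and the toy sequences in the usage note are hypothetical.

```python
# Minimal sketch: label 7-nt windows of a 3' UTR as perfect seed matches
# ("canonical") or seed matches with exactly one mismatch ("one-mismatch").
# Assumption: the site is complementary to miRNA nucleotides 2-8; wobble
# pairs, bulges and other site types are deliberately not modelled.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A", "T": "A"}


def seed_site(mirna: str) -> str:
    """Return the DNA site (5'->3') complementary to miRNA positions 2-8."""
    seed = mirna.upper()[1:8]  # nucleotides 2-8 in 1-based numbering
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def classify_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Scan a 3' UTR (DNA alphabet) and return (0-based start, class) per hit."""
    site = seed_site(mirna)
    utr = utr.upper()
    hits = []
    for start in range(len(utr) - len(site) + 1):
        window = utr[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches == 0:
            hits.append((start, "canonical"))
        elif mismatches == 1:
            hits.append((start, "one-mismatch"))
    return hits
```

For the toy miRNA `"UUAAUGCUAAUCGUGAUAGGGGU"`, the seed site is `"AGCATTA"`. Scanning `"GGGAGCATTACCCAGCTTTACC"` reports one canonical site at position 3 and one one-mismatch site at position 13.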
► Differential CLIP-Seq reveals transcriptome-wide sites of miR-155 binding ► Many miR-155 binding sites are noncanonical and lack a perfect seed match ► Most noncanonical binding sites contain a mismatch to the canonical seed motif ► Noncanonical sites mediate gene regulation, albeit weaker than canonical sites
The detection and quantification of genetic heterogeneity in populations of cells is fundamentally important to diverse fields, ranging from microbial evolution to human cancer genetics. However, despite the cost and throughput advances associated with massively parallel sequencing, it remains challenging to reliably detect mutations that are present at a low relative abundance in a given DNA sample. Here we describe smMIP, an assay that combines single molecule tagging with multiplex targeted capture to enable practical and highly sensitive detection of low-frequency or subclonal variation. To demonstrate the potential of the method, we simultaneously resequenced 33 clinically informative cancer genes in eight cell line samples and 45 clinical cancer samples. Single molecule tagging facilitated extremely accurate consensus calling, with an estimated per-base error rate of 8.4 × 10⁻⁶ in cell lines and 2.6 × 10⁻⁵ in clinical specimens. False-positive mutations in the single molecule consensus base-calls exhibited patterns predominantly consistent with DNA damage, including 8-oxo-guanine and spontaneous deamination of cytosine. Based on mixing experiments with cell line samples, sensitivity for mutations above 1% frequency was 83% with no false positives. At clinically informative sites, we identified seven low-frequency point mutations (0.2%-4.7%), including BRAF p.V600E (melanoma, 0.2% alternate allele frequency), KRAS p.G12V (lung, 0.6%), JAK2 p.V617F (melanoma, colon, two lung, 0.3%-1.4%), and NRAS p.Q61R (colon, 4.7%). We anticipate that smMIP will be broadly adoptable as a practical and effective method for accurately detecting low-frequency mutations in both research and clinical settings.
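Single molecule tagging makes consensus calling possible because every read that carries the same molecular tag derives from the same original DNA molecule. Random PCR or sequencing errors can therefore be outvoted within a tag family. The sketch below shows a generic majority-vote consensus under stated assumptions. The reads are taken to be pre-aligned and of equal length. The minimum family size and agreement threshold are hypothetical parameters, not the published smMIP settings.

```python
from collections import Counter, defaultdict


def consensus_by_tag(reads, min_family_size=2, min_agreement=0.9):
    """Collapse reads sharing a molecular tag into one consensus sequence.

    reads: iterable of (tag, sequence) pairs. Sequences within a family are
    assumed to be aligned and of equal length (a simplification).
    min_family_size, min_agreement: hypothetical thresholds.
    Positions where the majority base is below min_agreement become 'N'.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to outvote PCR/sequencing errors
        bases = []
        for column in zip(*seqs):  # one column per aligned position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[tag] = "".join(bases)
    return consensus
```

In this scheme, an error present in one read of a three-read family is corrected when `min_agreement` is 0.6 and masked as `N` at the stricter default. A singleton family is dropped entirely.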
Next-generation DNA sequencing promises to revolutionize clinical medicine and basic research. However, while this technology has the capacity to generate hundreds of billions of nucleotides of DNA sequence in a single experiment, its error rate of ∼1% results in hundreds of millions of sequencing mistakes. These scattered errors can be tolerated in some applications but become extremely problematic when “deep sequencing” genetically heterogeneous mixtures, such as tumors or mixed microbial populations. To overcome limitations in sequencing accuracy, we have developed a method termed Duplex Sequencing. This approach greatly reduces errors by independently tagging and sequencing each of the two strands of a DNA duplex. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors result in mutations in only one strand and can thus be discounted as technical error. We determine that Duplex Sequencing has a theoretical background error rate of less than one artifactual mutation per billion nucleotides sequenced. In addition, we establish that detection of mutations present in only one of the two strands of duplex DNA can be used to identify sites of DNA damage. We apply the method to directly assess the frequency and pattern of random mutations in mitochondrial DNA from human cells.
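The duplex logic described above can be expressed as a comparison of the two single-strand consensus sequences. The sketch assumes both strand consensuses are already reported on the same top-strand coordinates. A base is kept only where the strands agree. Positions where exactly one strand departs from the reference are flagged as candidate damage sites. The function names are hypothetical, and tag parsing and alignment are out of scope.

```python
def duplex_consensus(strand_a: str, strand_b: str) -> str:
    """Keep a base only where both strand consensuses agree; otherwise 'N'.

    Assumes both inputs are aligned on the same (top-strand) coordinates, so
    a true mutation appears at the same position in both strands.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strand consensus sequences must be aligned")
    return "".join(a if a == b else "N" for a, b in zip(strand_a, strand_b))


def single_strand_changes(strand_a: str, strand_b: str, reference: str) -> list[int]:
    """Positions where exactly one strand differs from the reference.

    Such one-strand changes are discounted as technical error in the duplex
    call, but they can also mark candidate sites of DNA damage.
    """
    return [
        i
        for i, (a, b, ref) in enumerate(zip(strand_a, strand_b, reference))
        if (a != ref) != (b != ref)  # exclusive or: only one strand changed
    ]
```

For reference `"ACGTACGT"`, a change at position 3 seen on both strands is retained as a mutation. A change at position 6 seen on only one strand is masked in the duplex call and reported by `single_strand_changes`.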
Malformations of cortical development containing dysplastic neuronal and glial elements, including hemimegalencephaly and focal cortical dysplasia, are common causes of intractable paediatric epilepsy. In this study we performed multiplex targeted sequencing of 10 genes in the PI3K/AKT pathway on brain tissue from 33 children who underwent surgical resection of dysplastic cortex for the treatment of intractable epilepsy. Sequencing results were correlated with clinical, imaging, pathological and immunohistological phenotypes. We identified mosaic activating mutations in PIK3CA and AKT3 in this cohort, including cancer-associated hotspot PIK3CA mutations in dysplastic megalencephaly, hemimegalencephaly, and focal cortical dysplasia type IIa. In addition, a germline PTEN mutation was identified in a male with hemimegalencephaly but no peripheral manifestations of the PTEN hamartoma tumour syndrome. A spectrum of clinical, imaging and pathological abnormalities was found in this cohort. While patients with more severe brain imaging abnormalities and systemic manifestations were more likely to have detected mutations, routine histopathological studies did not predict mutation status. In addition, elevated levels of phosphorylated S6 ribosomal protein were identified in both neurons and astrocytes of all hemimegalencephaly and focal cortical dysplasia type II specimens, regardless of the presence or absence of detected PI3K/AKT pathway mutations. In contrast, expression patterns of the T308 and S473 phosphorylated forms of AKT and in vitro AKT kinase activities discriminated between mutation-positive dysplasia cortex, mutation-negative dysplasia cortex, and non-dysplasia epilepsy cortex. Our findings identify PI3K/AKT pathway mutations as an important cause of epileptogenic brain malformations and establish megalencephaly, hemimegalencephaly, and focal cortical dysplasia as part of a single pathogenic spectrum.
Exome sequencing studies of autism spectrum disorders (ASDs) have identified many de novo mutations but few recurrently disrupted genes. We therefore developed a modified molecular inversion probe method enabling ultra-low-cost candidate gene resequencing in very large cohorts. To demonstrate the power of this approach, we captured and sequenced 44 candidate genes in 2446 ASD probands. We discovered 27 de novo events in 16 genes, 59% of which are predicted to truncate proteins or disrupt splicing. We estimate that recurrent disruptive mutations in six genes (CHD8, DYRK1A, GRIN2B, TBR1, PTEN, and TBL1XR1) may contribute to 1% of sporadic ASDs. Our data support associations between specific genes and reciprocal subphenotypes (CHD8-macrocephaly and DYRK1A-microcephaly) and replicate the importance of a β-catenin/chromatin-remodeling network to ASD etiology.
The functional consequences of genetic variation in mammalian regulatory elements are poorly understood. We report the in vivo dissection of three mammalian enhancers at single-nucleotide resolution through a massively parallel reporter assay. For each enhancer, we synthesized a library of >100,000 mutant haplotypes with 2-3% divergence from the wild-type sequence. Each haplotype was linked to a unique sequence tag embedded within a transcriptional cassette. We introduced each enhancer library into mouse liver and measured the relative activities of individual haplotypes en masse by sequencing the transcribed tags. Linear regression analysis yielded highly reproducible estimates of the effect of every possible single-nucleotide change on enhancer activity. The functional consequence of most mutations was modest, with ∼22% affecting activity by >1.2-fold and ∼3% by >2-fold. Several, but not all, positions with higher effects showed evidence for purifying selection or co-localized with known liver-associated transcription factor binding sites, demonstrating the value of empirical high-resolution functional analysis.
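The linear regression step can be illustrated with an additive model. Each haplotype's log activity is modelled as an intercept plus the sum of the effects of the mutations it carries. The effects are then estimated by ordinary least squares. The sketch below is a minimal pure-Python version that solves the normal equations. It assumes the log activities (for example, the log2 ratio of transcribed to input tag counts) have already been computed. The mutation labels and the helper names are hypothetical, and the study's actual model may differ.

```python
def fit_additive_effects(haplotypes, log_activity):
    """Ordinary least squares fit of log activity on mutation indicators.

    haplotypes: list of sets of mutation labels, e.g. {"12A", "57T"}.
    log_activity: list of floats, one per haplotype.
    Returns (intercept, {mutation: effect}) on the same log scale.
    """
    features = sorted(set().union(*haplotypes))
    # Design matrix: intercept column, then one 0/1 column per mutation.
    X = [[1.0] + [1.0 if f in h else 0.0 for f in features] for h in haplotypes]
    k = len(X[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, log_activity)) for i in range(k)]
    beta = solve(xtx, xty)
    return beta[0], dict(zip(features, beta[1:]))


def solve(a, b):
    """Gaussian elimination with partial pivoting; a must be non-singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x
```

If `log_activity` is on a log2 scale, the ">1.2-fold" threshold in the text corresponds to |effect| > log2(1.2), or about 0.26, and ">2-fold" corresponds to |effect| > 1. A library of >100,000 haplotypes would call for a sparse or numerical linear-algebra solver rather than explicit normal equations.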
Cell-free DNA (cfDNA) has the potential to inform tumor subtype classification and help guide clinical precision oncology. Here we develop Griffin, a framework for profiling nucleosome protection and accessibility from cfDNA to study the phenotype of tumors using whole-genome sequencing data with coverage as low as 0.1×. Griffin employs a GC correction procedure tailored to variable cfDNA fragment sizes, which generates a better representation of chromatin accessibility and improves the accuracy of cancer detection and tumor subtype classification. We demonstrate estrogen receptor subtyping from cfDNA in metastatic breast cancer. In 139 patients with at least 5% detectable circulating tumor DNA, we predict estrogen receptor subtype with an area under the receiver operating characteristic curve (AUC) of 0.89, and we validate performance in independent cohorts (AUC = 0.96). In summary, Griffin is a framework for accurate tumor subtyping and can be generalized to other cancer types for precision oncology applications.
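One way to make GC correction depend on fragment size is to estimate the GC bias separately for each fragment length. The procedure compares how often each (length, GC count) combination is observed among sequenced fragments with how often it occurs among possible fragment positions in the reference. Each fragment is then weighted by the inverse of that bias. The sketch below shows this generic length-stratified scheme. It is not Griffin's published implementation. It assumes the expected counts per bin have been precomputed from the reference genome, and the function name is hypothetical.

```python
from collections import Counter


def gc_weights(observed_fragments, expected_counts):
    """Per-(length, GC) correction weights for cfDNA fragments.

    observed_fragments: iterable of (fragment_length, gc_count) pairs.
    expected_counts: dict mapping (fragment_length, gc_count) to how often
        that combination occurs among reference fragment positions
        (assumed precomputed).
    Returns {bin: weight}, where weight = 1 / bias and the bias is normalized
    separately within each fragment length.
    """
    observed = Counter(observed_fragments)
    weights = {}
    for length in {length for length, _ in expected_counts}:
        bins = {key: n for key, n in expected_counts.items() if key[0] == length}
        total_obs = sum(observed[key] for key in bins)
        total_exp = sum(bins.values())
        if total_obs == 0:
            continue  # no fragments of this length were sequenced
        for key, exp in bins.items():
            if exp == 0 or observed[key] == 0:
                continue  # bias undefined for empty bins
            # bias > 1: this GC content is over-represented at this length
            bias = (observed[key] / total_obs) / (exp / total_exp)
            weights[key] = 1.0 / bias
    return weights
```

Normalizing within each length keeps a GC preference that differs between short and long fragments from being averaged away. That difference is the motivation the text gives for a size-aware correction.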
The HeLa cell line was established in 1951 from cervical cancer cells taken from a patient, Henrietta Lacks. This was the first successful attempt to immortalize human-derived cells in vitro. The robust growth and unrestricted distribution of HeLa cells resulted in their broad adoption (both intentionally and through widespread cross-contamination), and for the past 60 years the line has served a role analogous to that of a model organism. The cumulative impact of the HeLa cell line on research is demonstrated by its occurrence in more than 74,000 PubMed abstracts (approximately 0.3%). The genomic architecture of HeLa remains largely unexplored beyond its karyotype, partly because, as in many cancers, its extensive aneuploidy renders such analyses challenging. We carried out haplotype-resolved whole-genome sequencing of the HeLa CCL-2 strain, examined point- and indel-mutation variations, mapped copy-number variations and loss of heterozygosity regions, and phased variants across full chromosome arms. We also investigated variation and copy-number profiles for HeLa S3 and eight additional strains. We find that HeLa is relatively stable in terms of point variation, with few new mutations accumulating after early passaging. Haplotype resolution facilitated reconstruction of an amplified, highly rearranged region of chromosome 8q24.21 at which integration of the human papilloma virus type 18 (HPV-18) genome occurred and that is likely to be the event that initiated tumorigenesis. We combined these maps with RNA-seq and ENCODE Project data sets to phase the HeLa epigenome. This revealed strong, haplotype-specific activation of the proto-oncogene MYC by the integrated HPV-18 genome approximately 500 kilobases upstream, and enabled global analyses of the relationship between gene dosage and expression.
These data provide an extensively phased, high-quality reference genome for past and future experiments relying on HeLa, and demonstrate the value of haplotype resolution for characterizing cancer genomes and epigenomes.
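Once heterozygous variants are phased, haplotype-specific expression of a gene can be assessed as follows. RNA-seq reads are summed over the gene's phased heterozygous SNPs for each haplotype, and the split is tested against an equal-expression null. The abstract does not state which test was used. The sketch below is a common generic choice, an exact two-sided binomial test computed in log space to avoid overflow for deep coverage, and all function names are hypothetical.

```python
from math import exp, lgamma, log


def binomial_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), computed in log space (0 < p < 1)."""
    return exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
               + i * log(p) + (n - i) * log(1 - p))


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of outcomes no likelier than k."""
    pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


def allele_specific_expression(snp_counts):
    """Aggregate reads over phased heterozygous SNPs of one gene.

    snp_counts: list of (reads_on_haplotype_1, reads_on_haplotype_2) tuples,
    already oriented by the phased haplotypes (an assumption).
    Returns (fraction of reads from haplotype 1, p-value against 50:50).
    """
    h1 = sum(a for a, _ in snp_counts)
    h2 = sum(b for _, b in snp_counts)
    n = h1 + h2
    if n == 0:
        raise ValueError("no informative reads")
    return h1 / n, binomial_two_sided_p(h1, n)
```

Phasing is what makes the aggregation valid. Without it, reads from the same haplotype at different SNPs could not be summed, and a strongly haplotype-specific signal could cancel out across sites.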
Although ubiquitination plays a critical role in virtually all cellular processes, mechanistic details of ubiquitin (Ub) transfer are still being defined. To identify the molecular determinants within E3 ligases that modulate activity, we scored each member of a library of nearly 100,000 protein variants of the murine ubiquitination factor E4B (Ube4b) U-box domain for auto-ubiquitination activity in the presence of the E2 UbcH5c. This assay identified mutations that enhance activity both in vitro and in cellular p53 degradation assays. The activity-enhancing mutations fall into two distinct mechanistic classes: One increases the U-box:E2-binding affinity, and the other allosterically stimulates the formation of catalytically active conformations of the E2∼Ub conjugate. The same mutations enhance E3 activity in the presence of another E2, Ube2w, implying a common allosteric mechanism, and therefore the general applicability of our observations to other E3s. A comparison of the E3 activity with the two different E2s identified an additional variant that exhibits E3:E2 specificity. Our results highlight the general utility of high-throughput mutagenesis in delineating the molecular basis of enzyme activity.
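In high-throughput mutagenesis screens of this kind, each variant is commonly scored by comparing its frequency in the library before and after selection for activity. The score is normalized to wild type so that positive values mark activity-enhancing variants. The sketch below shows that generic log2 enrichment score. The pseudocount, the variant labels and the function name are assumptions, and the study's exact scoring pipeline is not described in the abstract.

```python
from math import log2


def enrichment_scores(counts_before, counts_after, wild_type, pseudocount=0.5):
    """Log2 enrichment of each variant relative to wild type.

    counts_before / counts_after: dicts of read counts per variant in the
    input library and after selection for activity.
    pseudocount: hypothetical value added to avoid division by zero.
    Score > 0: variant enriched more than wild type (candidate enhancer).
    """
    total_before = sum(counts_before.values())
    total_after = sum(counts_after.values())

    def ratio(variant):
        f_before = (counts_before.get(variant, 0) + pseudocount) / total_before
        f_after = (counts_after.get(variant, 0) + pseudocount) / total_after
        return f_after / f_before

    wt_ratio = ratio(wild_type)
    return {v: log2(ratio(v) / wt_ratio) for v in counts_before}
```

Running the same scoring against selections performed with two different E2s, and subtracting one score from the other, is one way to flag variants with E3:E2 specificity like the one described above.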
X upregulation in mammals increases levels of expressed X-linked transcripts to compensate for autosomal biallelic expression. Here, we present molecular mechanisms that enhance X expression at transcriptional and posttranscriptional levels. Active mouse X-linked promoters are enriched in the initiation form of RNA polymerase II (PolII-S5p) and in specific histone marks, including histone H4 acetylated at lysine 16 (H4K16ac) and histone variant H2AZ. The H4K16 acetyltransferase males absent on the first (MOF), known to mediate the Drosophila X upregulation, is also enriched on the mammalian X. Depletion of MOF or male-specific lethal 1 (MSL1) in mouse ES cells causes a specific decrease in PolII-S5p and in expression of a subset of X-linked genes. Analyses of RNA half-life data sets show increased stability of mammalian X-linked transcripts. Both ancestral X-linked genes, defined as those conserved on chicken autosomes, and newly acquired X-linked genes are upregulated by similar mechanisms but to a different extent, suggesting that subsets of genes are distinctly regulated depending on their evolutionary history.
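The abstract does not describe how the RNA half-life data sets were generated. As a hedged illustration, transcript stability is often summarized by assuming first-order decay, $A(t) = A_0 e^{-kt}$. The half-life is then $t_{1/2} = \ln 2 / k$, where $k$ is the negative slope of $\ln A$ against time after transcription is blocked. The sketch estimates $k$ by least squares and then compares median half-lives of X-linked and autosomal genes. The gene names, time points and function names are hypothetical.

```python
from math import log
from statistics import median


def half_life(times, abundances):
    """Half-life assuming first-order decay, A(t) = A0 * exp(-k t).

    Fits ln(abundance) against time by least squares; t_1/2 = ln 2 / k.
    """
    n = len(times)
    ys = [log(a) for a in abundances]
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys)) / sum(
        (t - t_mean) ** 2 for t in times
    )
    k = -slope  # decay rate constant
    return log(2) / k


def x_to_autosome_half_life_ratio(half_lives, chromosome):
    """Median X-linked half-life divided by median autosomal half-life.

    half_lives: {gene: half-life}; chromosome: {gene: chromosome name}.
    Values above 1 indicate more stable X-linked transcripts.
    """
    x = [h for g, h in half_lives.items() if chromosome[g] == "X"]
    auto = [h for g, h in half_lives.items() if chromosome[g] not in ("X", "Y")]
    return median(x) / median(auto)
```

Comparing medians rather than means keeps a few extremely stable or unstable transcripts from dominating the X-versus-autosome comparison.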
► PolII-S5p, H4K16ac, H2AZ are high at ancestral and acquired X-linked gene promoters ► MOF knockdown in ES cells lowers H4K16ac and PolII-S5p specifically at X-linked genes ► MOF/MSL1 knockdown in ES cells lowers expression of a subset of X-linked genes ► RNA stability is enhanced at ancestral and acquired X-linked genes
Mammalian X-linked gene upregulation occurs to balance biallelic autosomal gene expression. Deng et al. characterize the molecular basis of this, finding that active mouse X-linked promoters are enriched for initiating RNA polymerase II and H4K16 acetylation mediated by MOF, known for its role in Drosophila X upregulation. X-linked transcripts also show increased stability.