Genome-wide chromatin annotations have permitted the mapping of putative regulatory elements across multiple human cell types. However, their experimental dissection by directed regulatory motif disruption has remained unfeasible at the genome scale. Here, we use a massively parallel reporter assay (MPRA) to measure the transcriptional levels induced by 145-bp DNA segments centered on evolutionarily conserved regulatory motif instances within enhancer chromatin states. We select five predicted activators (HNF1, HNF4, FOXA, GATA, NFE2L2) and two predicted repressors (GFI1, ZFP161) and measure reporter expression in erythroleukemia (K562) and liver carcinoma (HepG2) cell lines. We test 2104 wild-type sequences and 3314 engineered enhancer variants containing targeted motif disruptions, each using 10 barcode tags and two replicates. The resulting data strongly confirm the enhancer activity and cell-type specificity of enhancer chromatin states, the ability of 145-bp segments to recapitulate both, the necessary role of regulatory motifs in enhancer function, and the complementary roles of activator and repressor motifs. We find statistically robust evidence that (1) disrupting the predicted activator motifs abolishes enhancer function, while silent or motif-improving changes maintain enhancer activity; (2) evolutionary conservation, nucleosome exclusion, binding of other factors, and strength of the motif match are predictive of enhancer activity; (3) scrambling repressor motifs leads to aberrant reporter expression in cell lines where the enhancers are usually inactive. Our results suggest a general strategy for deciphering cis-regulatory elements by systematic large-scale manipulation and provide quantitative enhancer activity measurements across thousands of constructs that can be mined to develop predictive models of gene expression.
Genome-wide association studies (GWAS) have successfully identified thousands of associations between common genetic variants and human disease phenotypes, but the majority of these variants are non-coding, often requiring genetic fine-mapping, epigenomic profiling, and individual reporter assays to delineate potential causal variants. We employ a massively parallel reporter assay (MPRA) to simultaneously screen 2,756 variants in strong linkage disequilibrium with 75 sentinel variants associated with red blood cell traits. We show that this assay identifies elements with endogenous erythroid regulatory activity. Across 23 sentinel variants, we conservatively identified 32 MPRA functional variants (MFVs). We used targeted genome editing to demonstrate endogenous enhancer activity across 3 MFVs that predominantly affect the transcription of SMIM1, RBM38, and CD164. Functional follow-up of RBM38 delineates a key role for this gene in the alternative splicing program occurring during terminal erythropoiesis. Finally, we provide evidence for how common GWAS-nominated variants can disrupt cell-type-specific transcriptional regulatory pathways.
• A massively parallel reporter assay was developed to screen for functional variation
• Variants identified by this assay are enriched for orthogonal measures of function
• Functional GWAS variants alter activity of master transcription factors
• The target gene RBM38 was linked to its GWAS phenotype and regulates mRNA splicing
A cost-effective, scalable, and allele-specific assay is used to systematically screen for functional non-coding genetic variation affecting red blood cell traits.
Massively parallel reporter assays (MPRAs) enable nucleotide-resolution dissection of transcriptional regulatory regions, such as enhancers, but only a few regions at a time. Here we present a combined experimental and computational approach, Systematic high-resolution activation and repression profiling with reporter tiling using MPRA (Sharpr-MPRA), that allows high-resolution analysis of thousands of regions simultaneously. Sharpr-MPRA combines dense tiling of overlapping MPRA constructs with a probabilistic graphical model to recognize functional regulatory nucleotides, and to distinguish activating and repressive nucleotides, using their inferred contribution to reporter gene expression. We used Sharpr-MPRA to test 4.6 million nucleotides spanning 15,000 putative regulatory regions tiled at 5-nucleotide resolution in two human cell types. Our results recovered known cell-type-specific regulatory motifs and evolutionarily conserved nucleotides, and distinguished known activating and repressive motifs. Our results also showed that endogenous chromatin state and DNA accessibility are both predictive of regulatory function in reporter assays, identified retroviral elements with activating roles, and uncovered 'attenuator' motifs with repressive roles in active chromatin.
Learning to read and write the transcriptional regulatory code is of central importance to progress in genetic analysis and engineering. Here we describe a massively parallel reporter assay (MPRA) that facilitates the systematic dissection of transcriptional regulatory elements. In MPRA, microarray-synthesized DNA regulatory elements and unique sequence tags are cloned into plasmids to generate a library of reporter constructs. These constructs are transfected into cells and tag expression is assayed by high-throughput sequencing. We apply MPRA to compare >27,000 variants of two inducible enhancers in human cells: a synthetic cAMP-regulated enhancer and the virus-inducible interferon-β enhancer. We first show that the resulting data define accurate maps of functional transcription factor binding sites in both enhancers at single-nucleotide resolution. We then use the data to train quantitative sequence-activity models (QSAMs) of the two enhancers. We show that QSAMs from two cellular states can be combined to design enhancer variants that optimize potentially conflicting objectives, such as maximizing induced activity while minimizing basal activity.
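As an illustration of how tag counts from such an assay might be turned into per-variant activity estimates, the following minimal Python sketch computes, for each enhancer variant, the median log2 ratio of normalized mRNA tag counts to normalized plasmid DNA tag counts. This is an assumption about one common way to summarize MPRA data, not the procedure reported in the abstract; the function name, argument names, and pseudocount are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def enhancer_activity(tag_to_variant, rna_counts, dna_counts, pseudocount=1.0):
    """Return the median log2(RNA/DNA) tag ratio for each enhancer variant.

    Hypothetical inputs (not taken from the source text):
      tag_to_variant: dict mapping each sequence tag to its enhancer variant
      rna_counts:     dict mapping tag -> read count in the mRNA library
      dna_counts:     dict mapping tag -> read count in the plasmid library
      pseudocount:    added to every count to avoid taking log of zero
    """
    # Library sizes, used to express each tag as a fraction of its library.
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())

    ratios = defaultdict(list)
    for tag, variant in tag_to_variant.items():
        rna_frac = (rna_counts.get(tag, 0) + pseudocount) / rna_total
        dna_frac = (dna_counts.get(tag, 0) + pseudocount) / dna_total
        ratios[variant].append(math.log2(rna_frac / dna_frac))

    # The median across tags damps the influence of a single outlier tag.
    return {variant: statistics.median(vals) for variant, vals in ratios.items()}
```

Summarizing with the median across several tags per variant is a design choice for robustness; a mean, or a model that weights tags by read depth, would be an equally plausible alternative.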
Enhancers regulate gene expression through the binding of sequence-specific transcription factors (TFs) to cognate motifs. Various features influence TF binding and enhancer function, including the chromatin state of the genomic locus, the affinities of the binding site, the activity of the bound TFs, and interactions among TFs. However, the precise nature and relative contributions of these features remain unclear. Here, we used massively parallel reporter assays (MPRAs) involving 32,115 natural and synthetic enhancers, together with high-throughput in vivo binding assays, to systematically dissect the contribution of each of these features to the binding and activity of genomic regulatory elements that contain motifs for PPARγ, a TF that serves as a key regulator of adipogenesis. We show that distinct sets of features govern PPARγ binding vs. enhancer activity. PPARγ binding is largely governed by the affinity of the specific motif site and higher-order features of the larger genomic locus, such as chromatin accessibility. In contrast, the enhancer activity of PPARγ binding sites depends on varying contributions from dozens of TFs in the immediate vicinity, including interactions between combinations of these TFs. Different pairs of motifs follow different interaction rules, including subadditive, additive, and superadditive interactions among specific classes of TFs, with both spatially constrained and flexible grammars. Our results provide a paradigm for the systematic characterization of the genomic features underlying regulatory elements, applicable to the design of synthetic regulatory elements or the interpretation of human genetic variation.
Artemisinin-based combination therapies are the first line of treatment for Plasmodium falciparum infections worldwide, but artemisinin resistance has risen rapidly in Southeast Asia over the past decade. Mutations in the kelch13 gene have been implicated in this resistance. We used longitudinal genomic surveillance to detect signals in kelch13 and other loci that contribute to artemisinin or partner drug resistance. We retrospectively sequenced the genomes of 194 P. falciparum isolates from five sites in Northwest Thailand, over the period of a rapid increase in the emergence of artemisinin resistance (2001-2014).
We evaluate statistical metrics for temporal change in the frequency of individual SNPs, assuming that SNPs associated with resistance increase in frequency over this period. After Kelch13-C580Y, the strongest temporal change is seen at a SNP in phosphatidylinositol 4-kinase, which is involved in a pathway recently implicated in artemisinin resistance. Furthermore, other loci exhibit strong temporal signatures which warrant further investigation for involvement in artemisinin resistance evolution. Through genome-wide association analysis we identify a variant in a kelch domain-containing gene on chromosome 10 that may epistatically modulate artemisinin resistance.
This analysis demonstrates the potential of a longitudinal genomic surveillance approach to detect resistance-associated gene loci to improve our mechanistic understanding of how resistance develops. Evidence for additional genomic regions outside of the kelch13 locus associated with artemisinin-resistant parasites may yield new molecular markers for resistance surveillance, which may be useful in efforts to reduce the emergence or spread of artemisinin resistance in African parasite populations.
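The abstract above does not name the statistical metrics used for temporal change in SNP frequency. As one simple possibility, the sketch below fits an ordinary least-squares line to a SNP's alternate-allele frequency against sampling year and returns the slope (change in frequency per year). The function name and inputs are hypothetical, and this is an illustrative assumption rather than the authors' metric.

```python
def frequency_slope(years, alt_counts, total_counts):
    """Least-squares slope of alternate-allele frequency versus year.

    Hypothetical inputs (not taken from the source text):
      years:        sampling years, one per time point
      alt_counts:   isolates carrying the alternate allele at each time point
      total_counts: isolates genotyped at each time point (all non-zero)
    """
    freqs = [alt / total for alt, total in zip(alt_counts, total_counts)]

    mean_year = sum(years) / len(years)
    mean_freq = sum(freqs) / len(freqs)

    # Standard closed-form OLS slope: covariance(x, y) / variance(x).
    sxy = sum((y - mean_year) * (f - mean_freq) for y, f in zip(years, freqs))
    sxx = sum((y - mean_year) ** 2 for y in years)
    return sxy / sxx


# A SNP rising from absent to fixed over three yearly samples.
slope = frequency_slope([2001, 2002, 2003], [0, 5, 10], [10, 10, 10])
print(slope)  # 0.5
```

Ranking SNPs by such a slope would place steadily rising alleles near the top; a real analysis would also need to account for sample size per year and population structure, which this sketch ignores.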
A complex microbiota inhabits various microenvironments of the gut, with some symbiotic bacteria having evolved traits to invade the epithelial mucus layer and reside deep within the intestinal tissue of animals. Whether these distinct bacterial communities across gut biogeographies exhibit divergent behaviours is largely unknown. Global transcriptomic analysis to investigate microbial physiology in specific mucosal niches has been hampered technically by an overabundance of host RNA. Here, we employed hybrid selection RNA sequencing (hsRNA-Seq) to enable detailed spatial transcriptomic profiling of a prominent human commensal as it colonizes the colonic lumen, mucus or epithelial tissue of mice. Compared to conventional RNA-Seq, hsRNA-Seq increased reads mapping to the Bacteroides fragilis genome by 48- and 154-fold in mucus and tissue, respectively, allowing for high-fidelity comparisons across biogeographic sites. Near the epithelium, B. fragilis upregulated numerous genes involved in protein synthesis, indicating that bacteria inhabiting the mucosal niche are metabolically active. Further, a specific sulfatase (BF3086) and glycosyl hydrolase (BF3134) were highly induced in mucus and tissue compared to bacteria in the lumen. In-frame deletion of these genes impaired in vitro growth on mucus as a carbon source, as well as mucosal colonization of mice. Mutants in either B. fragilis gene displayed a fitness defect in competing for colonization against bacterial challenge, revealing the importance of site-specific gene expression for robust host-microbial symbiosis. As a versatile tool, hsRNA-Seq can be deployed to explore the in vivo spatial physiology of numerous bacterial pathogens or commensals.
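A fold increase in mapped reads of the kind quoted above can be computed as a ratio of mapping fractions between the enriched and the conventional library. The sketch below assumes that definition, which the abstract does not spell out; the function name and the read counts in the example are hypothetical, not data from the study.

```python
def fold_enrichment(target_reads_hs, total_reads_hs,
                    target_reads_conv, total_reads_conv):
    """Ratio of target-mapping read fractions, enriched over conventional.

    Hypothetical inputs (not taken from the source text):
      target_reads_hs / total_reads_hs:     reads mapping to the bacterial
          genome and all reads in the hybrid-selected library
      target_reads_conv / total_reads_conv: the same for conventional RNA-Seq
    """
    # Normalizing by library size keeps sequencing depth from inflating the ratio.
    frac_hs = target_reads_hs / total_reads_hs
    frac_conv = target_reads_conv / total_reads_conv
    return frac_hs / frac_conv


# Made-up counts: 48% of reads on target after selection versus 1% before.
print(round(fold_enrichment(480, 1000, 10, 1000), 6))  # 48.0
```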
Deep mutational scanning has emerged as a promising tool for mapping sequence-activity relationships in proteins, ribonucleic acid and deoxyribonucleic acid. In this approach, diverse variants of a sequence of interest are first ranked according to their activities in a relevant assay, and this ranking is then used to infer the shape of the fitness landscape around the wild-type sequence. Little is currently known, however, about the degree to which such fitness landscapes are dependent on the specific assay conditions from which they are inferred. To explore this issue, we performed comprehensive single-substitution mutational scanning of APH(3')II, a Tn5 transposon-derived kinase that confers resistance to aminoglycoside antibiotics, in Escherichia coli under selection with each of six structurally diverse antibiotics at a range of inhibitory concentrations. We found that the resulting local fitness landscapes showed significant dependence on both antibiotic structure and concentration, and that this dependence can be exploited to guide protein engineering. Specifically, we found that differential analysis of fitness landscapes allowed us to generate synthetic APH(3')II variants with orthogonal substrate specificities.
Targeting genomic loci by massively parallel sequencing requires new methods to enrich templates to be sequenced. We developed a capture method that uses biotinylated RNA 'baits' to fish targets out of a 'pond' of DNA fragments. The RNA is transcribed from PCR-amplified oligodeoxynucleotides originally synthesized on a microarray, generating sufficient bait for multiple captures at concentrations high enough to drive the hybridization. We tested this method with 170-mer baits that target >15,000 coding exons (2.5 Mb) and four regions (1.7 Mb total) using Illumina sequencing as read-out. About 90% of uniquely aligning bases fell on or near bait sequence; up to 50% lay on exons proper. The uniformity was such that approximately 60% of target bases in the exonic 'catch', and approximately 80% in the regional catch, had at least half the mean coverage. One lane of Illumina sequence was sufficient to call high-confidence genotypes for 89% of the targeted exon space.
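The uniformity figure quoted above (the share of target bases with at least half the mean coverage) is straightforward to compute from a per-base coverage vector. The following Python sketch shows one way to do so; the function name and the example coverage values are hypothetical and are not data from the study.

```python
def fraction_at_half_mean(coverage):
    """Fraction of target bases covered at >= half the mean coverage.

    coverage: per-base read depth over the targeted bases (hypothetical
    input; how depth is tallied from alignments is outside this sketch).
    """
    mean_cov = sum(coverage) / len(coverage)
    threshold = 0.5 * mean_cov
    passing = sum(1 for depth in coverage if depth >= threshold)
    return passing / len(coverage)


# Made-up depths: mean is 8, so the threshold is 4 and three of four bases pass.
print(fraction_at_half_mean([10, 10, 10, 2]))  # 0.75
```

A value near 1.0 indicates even capture across targets, while lower values indicate that a subset of bases absorbs most of the reads.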
This unit describes a protocol for the targeted enrichment of exons from randomly sheared genomic DNA libraries using an in-solution hybrid selection approach for sequencing on an Illumina Genome Analyzer II. The steps for designing and ordering a hybrid selection oligo pool are reviewed, as are critical steps for performing the preparation and hybrid selection of an Illumina paired-end library. Critical parameters, performance metrics, and analysis workflow are discussed.