Cells repair DNA double-strand breaks (DSBs) through a complex set of pathways critical for maintaining genomic integrity. To systematically map these pathways, we developed a high-throughput screening approach called Repair-seq that measures the effects of thousands of genetic perturbations on mutations introduced at targeted DNA lesions. Using Repair-seq, we profiled DSB repair products induced by two programmable nucleases (Cas9 and Cas12a) in the presence or absence of oligonucleotides for homology-directed repair (HDR) after knockdown of 476 genes involved in DSB repair or associated processes. The resulting data enabled principled, data-driven inference of DSB end joining and HDR pathways. Systematic interrogation of these data uncovered unexpected relationships among DSB repair genes and demonstrated that repair outcomes with superficially similar sequence architectures can have markedly different genetic dependencies. This work provides a foundation for mapping DNA repair pathways and for optimizing genome editing across diverse modalities.
• Repair-seq maps the genetic dependencies of DNA repair outcomes
• High-resolution signatures of gene function identify unexpected gene relationships
• DSB-induced mutations with similar sequences can result from distinct mechanisms
• Repair-seq can be adapted to study a broad range of genome editing tools
Measuring the effects of many genetic perturbations on the spectrum of mutations produced at targeted DNA breaks allows systematic mapping of DNA repair pathways.
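As a rough illustration of what "measuring the effects of a genetic perturbation on repair outcomes" can mean in practice, the sketch below compares outcome frequencies between a knockdown and a control sample as log2 fold changes. The outcome categories, read counts, pseudocount, and function names are all assumptions made for this example; the abstract does not specify the statistic used in the study.

```python
import math

def outcome_frequencies(counts):
    """Convert raw read counts per repair outcome into fractions of all reads."""
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

def log2_fold_changes(knockdown_counts, control_counts, pseudocount=1e-4):
    """Log2 ratio of each outcome's frequency under knockdown vs. control.

    The pseudocount (an arbitrary choice here) avoids division by zero
    for outcomes that are absent from one of the two samples.
    """
    kd = outcome_frequencies(knockdown_counts)
    ctrl = outcome_frequencies(control_counts)
    outcomes = set(kd) | set(ctrl)
    return {
        o: math.log2((kd.get(o, 0.0) + pseudocount) / (ctrl.get(o, 0.0) + pseudocount))
        for o in outcomes
    }

# Made-up read counts, for illustration only
control = {"deletion": 500, "insertion": 300, "unedited": 200}
knockdown = {"deletion": 250, "insertion": 550, "unedited": 200}
lfc = log2_fold_changes(knockdown, control)
```

In this toy input the knockdown depletes deletions and enriches insertions, so `lfc["deletion"]` is negative and `lfc["insertion"]` is positive. Repeating the comparison across many knockdowns would give each outcome a profile of genetic dependencies.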
Programmable C•G-to-G•C base editors (CGBEs) have broad scientific and therapeutic potential, but their editing outcomes have proved difficult to predict and their editing efficiency and product purity are often low. We describe a suite of engineered CGBEs paired with machine learning models to enable efficient, high-purity C•G-to-G•C base editing. We performed a CRISPR interference (CRISPRi) screen targeting DNA repair genes to identify factors that affect C•G-to-G•C editing outcomes and used these insights to develop CGBEs with diverse editing profiles. We characterized ten promising CGBEs on a library of 10,638 genomically integrated target sites in mammalian cells and trained machine learning models that accurately predict the purity and yield of editing outcomes (R = 0.90) using these data. These CGBEs enable correction to the wild-type coding sequence of 546 disease-related transversion single-nucleotide variants (SNVs) with >90% precision (mean 96%) and up to 70% efficiency (mean 14%). Computational prediction of optimal CGBE-single-guide RNA pairs enables high-purity transversion base editing at over fourfold more target sites than achieved using any single CGBE variant.
Eukaryotic cells execute complex transcriptional programs in which specific loci throughout the genome are regulated in distinct ways by targeted regulatory assemblies. We have applied this principle to generate synthetic CRISPR-based transcriptional programs in yeast and human cells. By extending guide RNAs to include effector protein recruitment sites, we construct modular scaffold RNAs that encode both target locus and regulatory action. Sets of scaffold RNAs can be used to generate synthetic multigene transcriptional programs in which some genes are activated and others are repressed. We apply this approach to flexibly redirect flux through a complex branched metabolic pathway in yeast. Moreover, these programs can be executed by inducing expression of the dCas9 protein, which acts as a single master regulatory control point. CRISPR-associated RNA scaffolds provide a powerful way to construct synthetic gene expression programs for a wide range of applications, including rewiring cell fates or engineering metabolic pathways.
• CRISPR scaffold RNAs (scRNAs) encode both target locus and regulatory function
• scRNAs function efficiently in mammalian and yeast cells
• scRNAs enable transcription programs with simultaneous activation and repression
• Combinatorial control of multiple genes enables flexible pathway manipulation
Modular CRISPR RNA scaffolds engineered to encode both guides to a target locus and recruitment of transcriptional regulators allow simultaneous gene activation and repression of multiple different genes in eukaryotic cells, greatly expanding the synthetic biology toolkit.
Functional genomics efforts face tradeoffs between number of perturbations examined and complexity of phenotypes measured. We bridge this gap with Perturb-seq, which combines droplet-based single-cell RNA-seq with a strategy for barcoding CRISPR-mediated perturbations, allowing many perturbations to be profiled in pooled format. We applied Perturb-seq to dissect the mammalian unfolded protein response (UPR) using single and combinatorial CRISPR perturbations. Two genome-scale CRISPR interference (CRISPRi) screens identified genes whose repression perturbs ER homeostasis. Subjecting ∼100 hits to Perturb-seq enabled high-precision functional clustering of genes. Single-cell analyses decoupled the three UPR branches, revealed bifurcated UPR branch activation among cells subject to the same perturbation, and uncovered differential activation of the branches across hits, including an isolated feedback loop between the translocon and IRE1α. These studies provide insight into how the three sensors of ER homeostasis monitor distinct types of stress and highlight the ability of Perturb-seq to dissect complex cellular responses.
• Perturb-seq allows parallel screening with rich phenotypic output from single cells
• Simultaneous delivery and identification of up to three CRISPR perturbations
• Genome-scale screens dissect the mammalian unfolded protein response
• Analytical methods separate perturbation responses from confounding effects
A strategy for barcoding CRISPR-mediated perturbations allows pooled expression profiling via single-cell RNA sequencing. Application to the mammalian unfolded protein response then enabled systematic delineation of the transcriptional arms of the response and functional clustering of genes affecting ER homeostasis.
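The core bookkeeping behind pooled expression profiling can be sketched as follows: each cell carries a barcode identifying its perturbation, and cells sharing a barcode are averaged into one expression profile. This is a minimal illustration only. The barcode labels and expression values are invented, the two gene names serve merely as example keys, and the actual Perturb-seq analysis is considerably more involved than a per-gene mean.

```python
from collections import defaultdict

def mean_profiles(cells):
    """Average expression per gene over all cells sharing a perturbation barcode.

    cells: iterable of (barcode, {gene: expression}) pairs, one per cell.
    A gene missing from a cell's dict is treated as zero expression in that cell.
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_cells = defaultdict(int)
    for barcode, expression in cells:
        n_cells[barcode] += 1
        for gene, value in expression.items():
            sums[barcode][gene] += value
    return {
        barcode: {gene: total / n_cells[barcode] for gene, total in genes.items()}
        for barcode, genes in sums.items()
    }

# Invented single-cell measurements; barcodes and values are hypothetical
cells = [
    ("sgControl", {"HSPA5": 2.0, "DDIT3": 1.0}),
    ("sgControl", {"HSPA5": 4.0, "DDIT3": 1.0}),
    ("sgGeneX", {"HSPA5": 9.0, "DDIT3": 5.0}),
]
profiles = mean_profiles(cells)
```

Comparing `profiles["sgGeneX"]` with `profiles["sgControl"]` gives a crude transcriptional signature for that perturbation. Clustering such signatures across perturbations is one way to group genes by function.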
We recently found that nucleosomes directly block access of CRISPR/Cas9 to DNA (Horlbeck et al., 2016). Here, we build on this observation with a comprehensive algorithm that incorporates chromatin, position, and sequence features to accurately predict highly effective single guide RNAs (sgRNAs) for targeting nuclease-dead Cas9-mediated transcriptional repression (CRISPRi) and activation (CRISPRa). We use this algorithm to design next-generation genome-scale CRISPRi and CRISPRa libraries targeting human and mouse genomes. A CRISPRi screen for essential genes in K562 cells demonstrates that the large majority of sgRNAs are highly active. We also find that CRISPRi does not exhibit any detectable non-specific toxicity of the kind recently observed with CRISPR nuclease approaches. Precision-recall analysis shows that we detect over 90% of essential genes with minimal false positives using a compact 5 sgRNA/gene library. Our results establish CRISPRi and CRISPRa as premier tools for loss- or gain-of-function studies and provide a general strategy for identifying Cas9 target sites.
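For readers unfamiliar with precision-recall analysis in a screening context, a minimal sketch follows. Genes whose growth phenotype score falls at or below a threshold are called hits and compared against a reference set of essential genes. The gene names, scores, and thresholds are placeholders, and the convention that more negative scores mean stronger growth defects is an assumption of this example.

```python
def precision_recall(scores, essential, threshold):
    """Call genes with score <= threshold as hits and compare to a reference set.

    precision = fraction of hits that are truly essential
    recall    = fraction of essential genes that were called as hits
    """
    hits = {gene for gene, score in scores.items() if score <= threshold}
    true_pos = len(hits & essential)
    precision = true_pos / len(hits) if hits else 1.0
    recall = true_pos / len(essential) if essential else 0.0
    return precision, recall

# Placeholder phenotype scores (more negative = stronger growth defect)
scores = {"geneA": -2.1, "geneB": -1.8, "geneC": -1.2, "geneD": -1.5, "geneE": 0.2}
essential = {"geneA", "geneB", "geneD"}

strict = precision_recall(scores, essential, threshold=-1.6)
lenient = precision_recall(scores, essential, threshold=-1.0)
```

Sweeping the threshold traces out the precision-recall curve. With the toy data, the strict threshold yields perfect precision but misses one essential gene, while the lenient threshold recovers all three at the cost of one false positive.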
The human genome produces thousands of long noncoding RNAs (lncRNAs), which are transcripts >200 nucleotides long that do not encode proteins. Although critical roles in normal biology and disease have been revealed for a subset of lncRNAs, the function of the vast majority remains untested. We developed a CRISPR interference (CRISPRi) platform targeting 16,401 lncRNA loci in seven diverse cell lines, including six transformed cell lines and human induced pluripotent stem cells (iPSCs). Large-scale screening identified 499 lncRNA loci required for robust cellular growth, of which 89% showed growth-modifying function exclusively in one cell type. We further found that lncRNA knockdown can perturb complex transcriptional networks in a cell type-specific manner. These data underscore the functional importance and cell type specificity of many lncRNAs.
Noncoding mutations in cancer genomes are frequent but challenging to interpret. PVT1 encodes an oncogenic lncRNA, but recurrent translocations and deletions in human cancers suggest alternative mechanisms. Here, we show that the PVT1 promoter has a tumor-suppressor function that is independent of PVT1 lncRNA. CRISPR interference of the PVT1 promoter enhances breast cancer cell competition and growth in vivo. The promoters of the PVT1 and the MYC oncogenes, located 55 kb apart on chromosome 8q24, compete for engagement with four intragenic enhancers in the PVT1 locus, thereby allowing the PVT1 promoter to regulate pause release of MYC transcription. PVT1 undergoes developmentally regulated monoallelic expression, and the PVT1 promoter inhibits MYC expression only from the same chromosome via promoter competition. Cancer genome sequencing identifies recurrent mutations encompassing the human PVT1 promoter, and genome editing verified that PVT1 promoter mutation promotes cancer cell growth. These results highlight regulatory sequences of lncRNA genes as potential disease-associated DNA elements.
• Silencing PVT1 promoter enhances breast cancer cell competition
• PVT1 promoter inhibits MYC transcription independent of PVT1 lncRNA
• PVT1 and MYC promoters compete for enhancer contact in cis
• Mutations encompassing PVT1 promoter are recurrent in human cancers
Recurrent mutations in human cancer are found encompassing the promoter for the lncRNA gene PVT1, which regulates MYC transcription via promoter competition for a shared set of enhancers.
A long-standing challenge in drug development is the identification of the mechanisms of action of small molecules with therapeutic potential. A number of methods have been developed to address this challenge, each with inherent strengths and limitations. Here we provide a brief review of these methods with a focus on chemical-genetic methods that are based on systematically profiling the effects of genetic perturbations on drug sensitivity. In particular, application of these methods to mammalian systems has been facilitated by the recent advent of CRISPR-based approaches, which enable one to readily repress, induce, or delete a given gene and determine the resulting effects on drug sensitivity.
Next-generation sequencing (NGS) informs many biological questions with unprecedented depth and nucleotide resolution. These assays have created a need for analytical tools that enable users to manipulate data nucleotide-by-nucleotide robustly and easily. Furthermore, because many NGS assays encode information jointly within multiple properties of read alignments (for example, in ribosome profiling, the locations of ribosomes are jointly encoded in alignment coordinates and length), analytical tools are often required to extract the biological meaning from the alignments before analysis. Many assay-specific pipelines exist for this purpose, but there remains a need for user-friendly, generalized, nucleotide-resolution tools that are not limited to specific experimental regimes or analytical workflows.
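To make the ribosome profiling example concrete, the sketch below shows one common way a ribosome position can depend jointly on alignment coordinates and read length: a length-specific offset from the 5' end of the read to the ribosomal P-site. The offset table is hypothetical, the function names are invented for this example, and the code is plain Python that does not use Plastid's API. Spliced alignments are ignored for brevity.

```python
from collections import Counter

# Hypothetical read-length -> P-site offset table (nucleotides from the read's 5' end)
P_SITE_OFFSETS = {28: 12, 29: 12, 30: 13}

def p_site_position(start, end, strand):
    """Return the genomic P-site position for one unspliced alignment, or None.

    Coordinates are 0-based and half-open: the read covers [start, end).
    """
    offset = P_SITE_OFFSETS.get(end - start)
    if offset is None:
        return None  # no calibrated offset for this read length
    # On the minus strand the read's 5' end is its rightmost genomic base
    return start + offset if strand == "+" else end - 1 - offset

def p_site_counts(alignments):
    """Count P-sites per (position, strand) over many alignments."""
    counts = Counter()
    for start, end, strand in alignments:
        pos = p_site_position(start, end, strand)
        if pos is not None:
            counts[(pos, strand)] += 1
    return counts

# Toy alignments: (start, end, strand)
alignments = [(100, 128, "+"), (99, 129, "+"), (100, 128, "-"), (100, 140, "+")]
counts = p_site_counts(alignments)
```

The first two reads differ in both start coordinate and length yet map to the same P-site, which is why neither property alone is sufficient. The 40-nucleotide read is skipped because the table has no offset for it.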
Plastid is a Python library designed specifically for nucleotide-resolution analysis of genomics and NGS data. As such, Plastid is designed to extract assay-specific information from read alignments while retaining generality and extensibility to novel NGS assays. Plastid represents NGS and other biological data as arrays of values associated with genomic or transcriptomic positions, and contains configurable tools to convert data from a variety of sources to such arrays. Plastid also includes numerous tools to manipulate even discontinuous genomic features, such as spliced transcripts, with nucleotide precision. Plastid automatically handles conversion between genomic and feature-centric coordinates, accounting for splicing and strand, freeing users of burdensome accounting. Finally, Plastid's data models use consistent and familiar biological idioms, enabling even beginners to develop sophisticated analytical workflows with minimal effort.
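The kind of accounting involved in converting between feature-centric and genomic coordinates can be illustrated with a small, library-independent sketch. It maps a position within a spliced transcript to its genomic coordinate, handling exon boundaries and strand. The function name and data layout are assumptions for this example and are not Plastid's API.

```python
def transcript_to_genome(exons, strand, pos):
    """Map a 0-based transcript position to a 0-based genomic position.

    exons: list of (start, end) half-open genomic intervals, sorted ascending.
    strand: "+" or "-"; on "-" the transcript begins at the last exon's end.
    """
    if pos < 0:
        raise IndexError("transcript position must be non-negative")
    ordered = exons if strand == "+" else list(reversed(exons))
    remaining = pos
    for start, end in ordered:
        length = end - start
        if remaining < length:
            # Walk rightward on the plus strand, leftward on the minus strand
            return start + remaining if strand == "+" else end - 1 - remaining
        remaining -= length
    raise IndexError("position lies beyond the end of the transcript")

# A two-exon feature: 10 nt and 20 nt, separated by an intron
exons = [(100, 110), (200, 220)]
plus_junction = transcript_to_genome(exons, "+", 10)   # first base of exon 2
minus_junction = transcript_to_genome(exons, "-", 20)  # first base after the intron
```

On the plus strand, transcript position 10 jumps across the intron to genomic position 200. On the minus strand, transcript position 0 is genomic position 219 and position 20 lands at 109, the rightmost base of the upstream exon.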
Plastid is a versatile toolkit that has been used to analyze data from multiple NGS assays, including RNA-seq, ribosome profiling, and DMS-seq. It forms the genomic engine of our ORF annotation tool, ORF-RATER, and is readily adapted to novel NGS assays. Examples, tutorials, and extensive documentation can be found at https://plastid.readthedocs.io.
Recent studies of transcription have revealed a level of complexity not appreciated even a few years ago, both in the intricate use of post-initiation control and the mass production of rapidly degraded transcripts. Dissection of these pathways requires strategies for precisely following transcripts as they are being produced. Here we present an approach (native elongating transcript sequencing, NET-seq), based on deep sequencing of 3' ends of nascent transcripts associated with RNA polymerase, to monitor transcription at nucleotide resolution. Application of NET-seq in Saccharomyces cerevisiae reveals that although promoters are generally capable of divergent transcription, the Rpd3S deacetylation complex enforces strong directionality to most promoters by suppressing antisense transcript initiation. Our studies also reveal pervasive polymerase pausing and backtracking throughout the body of transcripts. Average pause density shows prominent peaks at each of the first four nucleosomes, with the peak location occurring in good agreement with in vitro biophysical measurements. Thus, nucleosome-induced pausing represents a major barrier to transcriptional elongation in vivo.
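Because the 3' end of a nascent transcript marks where the polymerase sat when the RNA was captured, a nucleotide-resolution density can be built by counting only the 3'-terminal base of each read. The sketch below assumes alignments have already been oriented to the strand of the RNA, which in practice depends on library preparation. Function names and input reads are invented for illustration.

```python
from collections import Counter

def three_prime_end(start, end, strand):
    """Genomic position of the RNA's 3'-terminal base.

    Coordinates are 0-based and half-open: the read covers [start, end).
    """
    return end - 1 if strand == "+" else start

def polymerase_density(alignments):
    """Count nascent-transcript 3' ends per (position, strand)."""
    density = Counter()
    for start, end, strand in alignments:
        density[(three_prime_end(start, end, strand), strand)] += 1
    return density

# Toy alignments: (start, end, strand), assumed to be on the RNA's strand
reads = [(10, 40, "+"), (5, 40, "+"), (40, 75, "-")]
density = polymerase_density(reads)
```

Positions where this density is markedly higher than in their surroundings are candidate pause sites. How such peaks are called is not specified in the abstract and is left out of the sketch.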