Genome-wide knockout studies, noncoding deletion scans, and other large-scale studies require a simple and lightweight framework that can quickly discover and score thousands of candidate CRISPR guides targeting an arbitrary DNA sequence. While several CRISPR web applications exist, there is a need for a high-throughput tool to rapidly discover and process hundreds of thousands of CRISPR targets.
Here, we introduce FlashFry, a fast and flexible command-line tool for characterizing large numbers of CRISPR target sequences. With FlashFry, users can specify an unconstrained number of mismatches to putative off-targets, richly annotate discovered sites, and tag potential guides with commonly used on-target and off-target scoring metrics. FlashFry runs at speeds comparable to commonly used genome-wide sequence aligners, and output is provided as an easy-to-manipulate text file.
FlashFry is a fast and convenient command-line tool to discover and score CRISPR targets within large DNA sequences.
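As a rough illustration of what "discovering" candidate targets and counting mismatches to putative off-targets involves, the following minimal sketch scans both strands of a sequence for 20-nt protospacers followed by an NGG PAM (the usual SpCas9 convention, assumed here) and counts mismatches between two guides. The function names are hypothetical, and this is not FlashFry's implementation or interface, which the abstract does not describe.

```python
# Illustrative sketch only: hypothetical helpers, not FlashFry's code or CLI.
# Assumes an SpCas9-style target: guide_len bases followed by an NGG PAM.
# Only the characters A, C, G, T, and N are handled.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_targets(seq, guide_len=20):
    """Return (start, strand, protospacer) for every site ending in NGG.

    `start` is the forward-strand coordinate (0-based) of the leftmost base
    of the whole site (protospacer plus PAM), whichever strand it lies on.
    """
    seq = seq.upper()
    site_len = guide_len + 3
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(len(s) - site_len + 1):
            site = s[i:i + site_len]
            if site[-2:] == "GG":  # the first PAM base (N) may be anything
                start = i if strand == "+" else len(s) - (i + site_len)
                hits.append((start, strand, site[:guide_len]))
    return hits


def mismatches(guide_a, guide_b):
    """Count position-wise mismatches between two equal-length guides."""
    if len(guide_a) != len(guide_b):
        raise ValueError("guides must have the same length")
    return sum(x != y for x, y in zip(guide_a, guide_b))
```

A genome-scale tool would need an indexed search rather than this linear scan; the sketch only makes the target definition concrete.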
The lineage relationships among the hundreds of cell types generated during development are difficult to reconstruct. A recent method, GESTALT, used CRISPR-Cas9 barcode editing for large-scale lineage tracing, but was restricted to early development and did not identify cell types. Here we present scGESTALT, which combines the lineage recording capabilities of GESTALT with cell-type identification by single-cell RNA sequencing. The method relies on an inducible system that enables barcodes to be edited at multiple time points, capturing lineage information from later stages of development. Sequencing of ∼60,000 transcriptomes from the juvenile zebrafish brain identified >100 cell types and marker genes. Using these data, we generate lineage trees with hundreds of branches that help uncover restrictions at the level of cell types, brain regions, and gene expression cascades during differentiation. scGESTALT can be applied to other multicellular organisms to simultaneously characterize molecular identities and lineage histories of thousands of cells during development and disease.
The underpinnings of cancer metastasis remain poorly understood, in part due to a lack of tools for probing their emergence at high resolution. Here we present macsGESTALT, an inducible CRISPR-Cas9-based lineage recorder with highly efficient single-cell capture of both transcriptional and phylogenetic information. Applying macsGESTALT to a mouse model of metastatic pancreatic cancer, we recover ∼380,000 CRISPR target sites and reconstruct dissemination of ∼28,000 single cells across multiple metastatic sites. We find that cells occupy a continuum of epithelial-to-mesenchymal transition (EMT) states. Metastatic potential peaks in rare, late-hybrid EMT states, which are aggressively selected from a predominately epithelial ancestral pool. The gene signatures of these late-hybrid EMT states are predictive of reduced survival in both human pancreatic and lung cancer patients, highlighting their relevance to clinical disease progression. Finally, we observe evidence for in vivo propagation of S100 family gene expression across clonally distinct metastatic subpopulations.
• macsGESTALT is an inducible lineage recorder with efficient capture in single cells
• Despite genetic competency, most cancer clones are not metastatic
• Metastatic aggression peaks at specific late-hybrid EMT states
• Expression of S100 genes is propagated across distinct metastatic subpopulations
Simeonov et al. develop an inducible lineage recorder, enabling simultaneous capture of lineages and transcriptomes from single cells. Lineage reconstruction in a metastatic pancreatic cancer model reveals extensive bottlenecking and subpopulation signaling, as well as specific transcriptional states associated with metastatic aggression and predictive of worse outcomes in human cancer.
Multicellular systems develop from single cells through distinct lineages. However, current lineage-tracing approaches scale poorly to whole, complex organisms. Here, we use genome editing to progressively introduce and accumulate diverse mutations in a DNA barcode over multiple rounds of cell division. The barcode, an array of clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 target sites, marks cells and enables the elucidation of lineage relationships via the patterns of mutations shared between cells. In cell culture and zebrafish, we show that rates and patterns of editing are tunable and that thousands of lineage-informative barcode alleles can be generated. By sampling hundreds of thousands of cells from individual zebrafish, we find that most cells in adult organs derive from relatively few embryonic progenitors. In future analyses, genome editing of synthetic target arrays for lineage tracing (GESTALT) can be used to generate large-scale maps of cell lineage in multicellular systems for normal development and disease.
Non-homologous end-joining (NHEJ) plays an important role in double-strand break (DSB) repair of DNA. Recent studies have shown that the error patterns of NHEJ are strongly biased by sequence context, but these studies were based on relatively few templates. To investigate this more thoroughly, we systematically profiled ∼1.16 million independent mutational events resulting from CRISPR/Cas9-mediated cleavage and NHEJ-mediated DSB repair of 6872 synthetic target sequences, introduced into a human cell line via lentiviral infection. We find that: (i) insertions are dominated by 1 bp events templated by sequence immediately upstream of the cleavage site, (ii) deletions are predominantly associated with microhomology and (iii) targets exhibit variable but reproducible diversity with respect to the number and relative frequency of the mutational outcomes to which they give rise. From these data, we trained a model that uses local sequence context to predict the distribution of mutational outcomes. Exploiting the bias of NHEJ outcomes towards microhomology-mediated events, we demonstrate the programming of deletion patterns by introducing microhomology to specific locations in the vicinity of the DSB site. We anticipate that our results will inform investigations of DSB repair mechanisms as well as the design of CRISPR/Cas9 experiments for diverse applications including genome-wide screens, gene therapy, lineage tracing and molecular recording.
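Findings (i) and (ii) above can be made concrete with a small sketch. It builds the 1 bp insertion that duplicates the base immediately upstream of the cut, and measures the microhomology associated with a given deletion as the number of positions by which the deletion can be shifted without changing the repaired sequence. Both function names and the coordinate convention are assumptions made for illustration; this is not the trained predictive model described in the abstract.

```python
# Illustrative sketch only: hypothetical helpers, not the paper's model.
# Coordinates are 0-based; `cut` is the index of the first base downstream
# of the cleavage site, and a deletion removes seq[start:end].


def templated_insertion(seq, cut):
    """Return seq with the base just upstream of the cut duplicated."""
    if not 0 < cut <= len(seq):
        raise ValueError("cut must lie inside the sequence")
    return seq[:cut] + seq[cut - 1] + seq[cut:]


def microhomology_length(seq, start, end):
    """Length of the microhomology flanking the deletion seq[start:end].

    Counts how far the deletion window can slide right and left while
    producing the identical repaired sequence, capped at the deletion size.
    """
    size = end - start
    if size <= 0 or start < 0 or end > len(seq):
        raise ValueError("invalid deletion coordinates")
    right = 0
    while (right < size and end + right < len(seq)
           and seq[start + right] == seq[end + right]):
        right += 1
    left = 0
    while (left + right < size and start - 1 - left >= 0
           and seq[start - 1 - left] == seq[end - 1 - left]):
        left += 1
    return left + right
```

For example, deleting `GCTT` from `AAGCTTGCAA` leaves `AAGCAA`, and the same product arises from deleting `CTTG` or `TTGC`, so the deletion is flanked by the 2 bp microhomology `GC`.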
Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
Brain metastases are associated with a dismal prognosis. Whether brain metastases harbor distinct genetic alterations beyond those observed in primary tumors is unknown. We performed whole-exome sequencing of 86 matched brain metastases, primary tumors, and normal tissue. In all clonally related cancer samples, we observed branched evolution, where all metastatic and primary sites shared a common ancestor yet continued to evolve independently. In 53% of cases, we found potentially clinically informative alterations in the brain metastases not detected in the matched primary-tumor sample. In contrast, spatially and temporally separated brain metastasis sites were genetically homogeneous. Distal extracranial and regional lymph node metastases were highly divergent from brain metastases. We detected alterations associated with sensitivity to PI3K/AKT/mTOR, CDK, and HER2/EGFR inhibitors in the brain metastases. Genomic analysis of brain metastases provides an opportunity to identify potentially clinically informative alterations not detected in clinically sampled primary tumors, regional lymph nodes, or extracranial metastases.
Decisions for individualized therapies in patients with brain metastasis are often made from primary-tumor biopsies. We demonstrate that clinically actionable alterations present in brain metastases are frequently not detected in primary biopsies, suggesting that sequencing of primary biopsies alone may miss a substantial number of opportunities for targeted therapy.
In Rspondin-based 3D cultures, Lgr5 stem cells from multiple organs form ever-expanding epithelial organoids that retain their tissue identity. We report the establishment of tumor organoid cultures from 20 consecutive colorectal carcinoma (CRC) patients. For most, organoids were also generated from adjacent normal tissue. Organoids closely recapitulate several properties of the original tumor. The spectrum of genetic changes within the “living biobank” agrees well with previous large-scale mutational analyses of CRC. Gene expression analysis indicates that the major CRC molecular subtypes are represented. Tumor organoids are amenable to high-throughput drug screens allowing detection of gene-drug associations. As an example, a single organoid culture was exquisitely sensitive to Wnt secretion (porcupine) inhibitors and carried a mutation in the negative Wnt feedback regulator RNF43, rather than in APC. Organoid technology may fill the gap between cancer genetics and patient trials, complement cell-line- and xenograft-based drug studies, and allow personalized therapy design.
• Tumor and normal organoids were derived from colorectal carcinoma patients
• Tumor organoids recapitulate somatic copy number and mutation spectra found in CRC
• Organoids are amenable to high-throughput drug screening
• Patient-derived organoids allow personalized therapy design
3D organoid cultures derived from healthy and tumor tissue from colorectal cancer patients are used in a high-throughput drug screen to identify gene-drug associations that may facilitate personalized therapy.
Here, we present ContEst, a tool for estimating the level of cross-individual contamination in next-generation sequencing data. We demonstrate the accuracy of ContEst across a range of contamination levels, sources and read depths using sequencing data mixed in silico at known concentrations. We applied our tool to published cancer sequencing datasets and report their estimated contamination levels.
ContEst is a GATK module and is distributed under a BSD-style license at http://www.broadinstitute.org/cancer/cga/contest
kcibul@broadinstitute.org; gadgetz@broadinstitute.org
Supplementary data is available at Bioinformatics online.
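To give a sense of how a contamination level might be estimated from read counts, here is a deliberately simplified sketch. It assumes sites where the sample is known to be homozygous for the reference allele, so that alternate-allele reads arise only from a contaminating individual (at a rate proportional to the population allele frequency) or from sequencing error, and it picks the contamination fraction that maximizes a binomial likelihood over a grid. The model, the function names, and the default error rate are all assumptions for illustration; the abstract does not specify ContEst's actual statistical model, and this is not it.

```python
# Illustrative sketch only: a toy likelihood model, not ContEst's method.
import math


def alt_read_probability(c, pop_af, error):
    """Probability that one read shows the alternate allele at a
    homozygous-reference site, for contamination fraction c."""
    from_contaminant = c * pop_af  # read is foreign and carries alt
    return from_contaminant * (1 - error) + (1 - from_contaminant) * error


def estimate_contamination(sites, error=0.001, step=0.001, max_c=0.5):
    """Grid-search maximum-likelihood estimate of the contamination fraction.

    `sites` is a list of (alt_reads, ref_reads, population_alt_frequency).
    `error` must be greater than 0 so that log-probabilities stay finite.
    """
    best_c, best_ll = 0.0, -math.inf
    n_steps = int(round(max_c / step))
    for k in range(n_steps + 1):
        c = k * step
        ll = 0.0
        for alt, ref, pop_af in sites:
            p = alt_read_probability(c, pop_af, error)
            ll += alt * math.log(p) + ref * math.log(1 - p)
        if ll > best_ll:
            best_c, best_ll = c, ll
    return best_c
```

With two sites at roughly 1000× depth, `estimate_contamination([(51, 949, 0.5), (21, 979, 0.2)])` returns a value close to 0.1 under this toy model.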
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (∼4×) 1000 Genomes Project datasets.