Embryonic stem (ES) cells can undergo many aspects of mammalian embryogenesis in vitro, but their developmental potential is substantially extended by interactions with extraembryonic stem cells, including trophoblast stem (TS) cells, extraembryonic endoderm stem (XEN) cells and inducible XEN (iXEN) cells. Here we assembled stem cell-derived embryos in vitro from mouse ES cells, TS cells and iXEN cells and showed that they recapitulate the development of the whole natural mouse embryo in utero up to day 8.5 post-fertilization. Our embryo model displays headfolds with defined forebrain and midbrain regions and develops a beating heart-like structure, a trunk comprising a neural tube and somites, a tail bud containing neuromesodermal progenitors, a gut tube, and primordial germ cells. This complete embryo model develops within an extraembryonic yolk sac that initiates blood island development. Notably, we demonstrate that the neurulating embryo model assembled from Pax6-knockout ES cells aggregated with wild-type TS cells and iXEN cells recapitulates the ventral domain expansion of the neural tube that occurs in natural, ubiquitous Pax6-knockout embryos. Thus, these complete embryoids are a powerful in vitro model for dissecting the roles of diverse cell lineages and genes in development. Our results demonstrate the self-organization ability of ES cells and two types of extraembryonic stem cells to reconstitute mammalian development through and beyond gastrulation to neurulation and early organogenesis.
The underpinnings of cancer metastasis remain poorly understood, in part due to a lack of tools for probing their emergence at high resolution. Here we present macsGESTALT, an inducible CRISPR-Cas9-based lineage recorder with highly efficient single-cell capture of both transcriptional and phylogenetic information. Applying macsGESTALT to a mouse model of metastatic pancreatic cancer, we recover ∼380,000 CRISPR target sites and reconstruct dissemination of ∼28,000 single cells across multiple metastatic sites. We find that cells occupy a continuum of epithelial-to-mesenchymal transition (EMT) states. Metastatic potential peaks in rare, late-hybrid EMT states, which are aggressively selected from a predominantly epithelial ancestral pool. The gene signatures of these late-hybrid EMT states are predictive of reduced survival in both human pancreatic and lung cancer patients, highlighting their relevance to clinical disease progression. Finally, we observe evidence for in vivo propagation of S100 family gene expression across clonally distinct metastatic subpopulations.
• macsGESTALT is an inducible lineage recorder with efficient capture in single cells
• Despite genetic competency, most cancer clones are not metastatic
• Metastatic aggression peaks at specific late-hybrid EMT states
• Expression of S100 genes is propagated across distinct metastatic subpopulations
Simeonov et al. develop an inducible lineage recorder, enabling simultaneous capture of lineages and transcriptomes from single cells. Lineage reconstruction in a metastatic pancreatic cancer model reveals extensive bottlenecking and subpopulation signaling, as well as specific transcriptional states associated with metastatic aggression and predictive of worse outcomes in human cancer.
Molecular inversion probes (MIPs) enable cost-effective multiplex targeted gene resequencing in large cohorts. However, the design of individual MIPs is a critical parameter governing the performance of this technology with respect to capture uniformity and specificity. MIPgen is a user-friendly package that simplifies the process of designing custom MIP assays to arbitrary targets. New logistic and SVM-derived models enable in silico predictions of assay success, and assay redesign exhibits improved coverage uniformity relative to previous methods, which in turn improves the utility of MIPs for cost-effective targeted sequencing for candidate gene validation and for diagnostic sequencing in a clinical setting.
MIPgen is implemented in C++. Source code and accompanying Python scripts are available at http://shendurelab.github.io/MIPGEN/.
Mammalian embryogenesis is characterized by rapid cellular proliferation and diversification. Within a few weeks, a single-cell zygote gives rise to millions of cells expressing a panoply of molecular programs. Although intensively studied, a comprehensive delineation of the major cellular trajectories that comprise mammalian development in vivo remains elusive. Here, we set out to integrate several single-cell RNA-sequencing (scRNA-seq) datasets that collectively span mouse gastrulation and organogenesis, supplemented with new profiling of ~150,000 nuclei from approximately embryonic day 8.5 (E8.5) embryos staged in one-somite increments. Overall, we define cell states at each of 19 successive stages spanning E3.5 to E13.5 and heuristically connect them to their pseudoancestors and pseudodescendants. Although constructed through automated procedures, the resulting directed acyclic graph, TOME (trajectories of mammalian embryogenesis), is largely consistent with our contemporary understanding of mammalian development. We leverage TOME to systematically nominate transcription factors (TFs) as candidate regulators of each cell type's specification, as well as 'cell-type homologs' across vertebrate evolution.
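To make the graph structure described above concrete, the following is a minimal sketch of cell states linked to pseudoancestors by weighted edges, with a helper that traces the strongest path back to the earliest state. The state names, edge weights, and the "follow the heaviest edge" rule are all illustrative assumptions; they are not taken from TOME and do not reproduce the heuristic the authors used.

```python
from collections import defaultdict

# Hypothetical (pseudoancestor, pseudodescendant, weight) edges between
# cell states at successive stages. Names and weights are made up.
edges = [
    ("E6.5:epiblast", "E7.0:primitive streak", 0.8),
    ("E6.5:epiblast", "E7.0:neuroectoderm", 0.7),
    ("E7.0:primitive streak", "E7.5:nascent mesoderm", 0.9),
    ("E7.0:neuroectoderm", "E7.5:nascent mesoderm", 0.1),
    ("E7.0:primitive streak", "E7.5:definitive endoderm", 0.6),
]

# Index incoming edges so each state can look up its pseudoancestors.
parents = defaultdict(list)
for ancestor, descendant, weight in edges:
    parents[descendant].append((ancestor, weight))


def top_lineage(state):
    """Walk back from `state`, following the heaviest incoming edge each step.

    Terminates because every edge points from an earlier stage to a later
    one, so the graph is acyclic.
    """
    path = [state]
    while parents.get(path[-1]):
        best_ancestor, _ = max(parents[path[-1]], key=lambda p: p[1])
        path.append(best_ancestor)
    return path[::-1]  # earliest state first


lineage = top_lineage("E7.5:nascent mesoderm")
print(" -> ".join(lineage))
```

Storing incoming rather than outgoing edges keeps the ancestor lookup cheap; a real analysis would likely keep all edges above a weight threshold instead of only the single strongest one.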
The majority of common variants associated with common diseases, as well as an unknown proportion of causal mutations for rare diseases, fall in noncoding regions of the genome. Although catalogs of noncoding regulatory elements are steadily improving, we have a limited understanding of the functional effects of mutations within them. Here, we perform saturation mutagenesis in conjunction with massively parallel reporter assays on 20 disease-associated gene promoters and enhancers, generating functional measurements for over 30,000 single nucleotide substitutions and deletions. We find that the density of putative transcription factor binding sites varies widely between regulatory elements, as does the extent to which evolutionary conservation or integrative scores predict functional effects. These data provide a powerful resource for interpreting the pathogenicity of clinically observed mutations in these disease-associated regulatory elements, and comprise a rich dataset for the further development of algorithms that aim to predict the regulatory effects of noncoding mutations.
Candidate enhancers can be identified on the basis of chromatin modifications, the binding of chromatin modifiers and transcription factors and cofactors, or chromatin accessibility. However, validating such candidates as bona fide enhancers requires functional characterization, typically achieved through reporter assays that test whether a sequence can increase expression of a transcriptional reporter via a minimal promoter. A longstanding concern is that reporter assays are mainly implemented on episomes, which are thought to lack physiological chromatin. However, the magnitude and determinants of differences in cis-regulation for regulatory sequences residing in episomes versus chromosomes remain almost completely unknown. To address this systematically, we developed and applied a novel lentivirus-based massively parallel reporter assay (lentiMPRA) to directly compare the functional activities of 2236 candidate liver enhancers in an episomal versus a chromosomally integrated context. We find that the activities of chromosomally integrated sequences are substantially different from the activities of the identical sequences assayed on episomes, and furthermore are correlated with different subsets of ENCODE annotations. The results of chromosomally based reporter assays are also more reproducible and more strongly predictable by both ENCODE annotations and sequence-based models. With a linear model that combines chromatin annotations and sequence information, we achieve a Pearson's R² of 0.362 for predicting the results of chromosomally integrated reporter assays. This level of prediction is better than with either chromatin annotations or sequence information alone and also outperforms predictive models of episomal assays. Our results have broad implications for how cis-regulatory elements are identified, prioritized and functionally validated.
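The comparison of a combined linear model against chromatin-only and sequence-only models can be illustrated with a small sketch. Everything here is an assumption for illustration: the data are synthetic, the feature counts and coefficients are arbitrary, and the score is computed in-sample with ordinary least squares, whereas a real evaluation would typically score held-out data. It shows only how a correlation-based score is obtained for each feature set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 500 hypothetical sequences with 3 chromatin
# annotation features and 3 sequence-derived features each.
n = 500
chromatin = rng.normal(size=(n, 3))
sequence = rng.normal(size=(n, 3))
activity = (
    chromatin @ np.array([0.5, 0.2, 0.0])
    + sequence @ np.array([0.3, 0.0, 0.1])
    + rng.normal(size=n)  # measurement noise
)


def fit_and_score(features, target):
    """Fit ordinary least squares with an intercept and return the squared
    Pearson correlation between fitted values and the target (in-sample)."""
    X = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    pred = X @ coef
    r = np.corrcoef(pred, target)[0, 1]
    return r ** 2


r2_chromatin = fit_and_score(chromatin, activity)
r2_sequence = fit_and_score(sequence, activity)
r2_combined = fit_and_score(np.hstack([chromatin, sequence]), activity)

print(f"chromatin only: {r2_chromatin:.3f}")
print(f"sequence only:  {r2_sequence:.3f}")
print(f"combined:       {r2_combined:.3f}")
```

Because the single-feature-set models are nested inside the combined model, the in-sample score of the combined fit cannot be lower than either alone; only evaluation on held-out data shows whether the extra features genuinely help.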
Exome sequencing studies of autism spectrum disorders (ASDs) have identified many de novo mutations but few recurrently disrupted genes. We therefore developed a modified molecular inversion probe method enabling ultra-low-cost candidate gene resequencing in very large cohorts. To demonstrate the power of this approach, we captured and sequenced 44 candidate genes in 2446 ASD probands. We discovered 27 de novo events in 16 genes, 59% of which are predicted to truncate proteins or disrupt splicing. We estimate that recurrent disruptive mutations in six genes (CHD8, DYRK1A, GRIN2B, TBR1, PTEN, and TBL1XR1) may contribute to 1% of sporadic ASDs. Our data support associations between specific genes and reciprocal subphenotypes (CHD8-macrocephaly and DYRK1A-microcephaly) and replicate the importance of a β-catenin–chromatin-remodeling network to ASD etiology.
Single-cell combinatorial indexing RNA sequencing (sci-RNA-seq) is a powerful method for recovering gene expression data from an exponentially scalable number of individual cells or nuclei. However, sci-RNA-seq is a complex protocol that has historically exhibited variable performance on different tissues, as well as lower sensitivity than alternative methods. Here, we report a simplified, optimized version of the sci-RNA-seq protocol with three rounds of split-pool indexing that is faster, more robust and more sensitive and has a higher yield than the original protocol, with reagent costs on the order of 1 cent per cell or less. The total hands-on time from nuclei isolation to final library preparation is 2-3 days, depending on the number of samples sharing the experiment. The improvements also allow RNA profiling from tissues rich in RNases, like older mouse embryos or adult tissues, that were problematic for the original method. We showcase the optimized protocol via whole-organism analysis of an E16.5 mouse embryo, profiling ~380,000 nuclei in a single experiment. Finally, we introduce a 'Tiny-Sci' protocol for experiments in which input material is very limited.
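The "exponentially scalable" property of split-pool indexing follows from simple counting: each round multiplies the number of possible barcode combinations by the number of wells in that round. The sketch below works through that arithmetic. The well counts (384 per round) and the uniform-random-assignment model are assumptions chosen for illustration, not parameters of the published protocol.

```python
import math


def barcode_space(wells_per_round):
    """Number of distinct barcode combinations across all indexing rounds."""
    return math.prod(wells_per_round)


def expected_collision_rate(n_cells, n_barcodes):
    """Expected fraction of cells that share their barcode combination with
    at least one other cell, assuming each cell draws a combination
    uniformly at random and independently of the others."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


# Hypothetical layout: three rounds of indexing, 384 wells each.
wells = (384, 384, 384)
n_barcodes = barcode_space(wells)

# Cell number taken from the abstract's ~380,000 nuclei experiment.
rate = expected_collision_rate(380_000, n_barcodes)

print(f"barcode combinations: {n_barcodes:,}")
print(f"expected collision rate: {rate:.2%}")
```

Adding a round or enlarging a plate grows the barcode space multiplicatively, which is why the number of cells that can be profiled at a tolerable collision rate scales exponentially with the number of rounds.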