The risks associated with re-identification of human genetic data are severely limiting open data sharing in life sciences, even in studies where donor-related genetic variant information is not of primary interest. Here, we developed BAMboozle, a versatile tool to eliminate critical types of sensitive genetic information in human sequence data by reverting aligned reads to the genome reference sequence. Applying BAMboozle to functional genomics data, such as single-cell RNA-seq (scRNA-seq) and scATAC-seq datasets, confirmed the removal of donor-related single nucleotide polymorphisms (SNPs) and indels in a manner that did not disclose the altered positions. Importantly, BAMboozle only removes the genetic sequence variants of the sample (i.e., donor) while preserving other important aspects of the raw sequence data. For example, BAMboozled scRNA-seq data contained accurate cell-type associated gene expression signatures and splice kinetic information, and could be used for methods benchmarking. Altogether, BAMboozle efficiently removes genetic variation in aligned sequence data, which represents a step forward towards open data sharing in many areas of genomics where the genetic variant information is not of primary interest.
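The idea of reverting an aligned read to the reference can be illustrated with a small sketch. This is a toy example, not BAMboozle's actual implementation: the function name `revert_to_reference`, the plain-string reference, and the handling of each CIGAR operation (mismatches replaced, deletions filled in, insertions and clipped bases dropped) are all assumptions made for illustration. Unlike the tool described above, it makes no attempt to hide which positions were altered, for example by preserving read length.

```python
import re
from typing import List, Tuple

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def revert_to_reference(reference: str, pos: int, cigar: str) -> Tuple[str, str]:
    """Rewrite an aligned read so that it carries only reference bases.

    `pos` is the 0-based leftmost reference coordinate of the alignment.
    Returns the new read sequence and its simplified CIGAR string.
    Hypothetical helper for illustration only.
    """
    ref_cursor = pos
    seq_parts: List[str] = []
    ops: List[Tuple[int, str]] = []

    for length_text, op in CIGAR_RE.findall(cigar):
        length = int(length_text)
        if op in "M=XD":
            # Aligned or deleted bases: copy the reference, which erases
            # mismatches (SNPs) and fills in deletions.
            seq_parts.append(reference[ref_cursor:ref_cursor + length])
            ref_cursor += length
            new_op = "M"
        elif op == "N":
            # Skipped region (e.g. an intron): keep the gap, emit no bases.
            ref_cursor += length
            new_op = "N"
        else:
            # Insertions, clips and padding have no reference counterpart
            # in this toy model, so they are dropped (an assumption).
            continue

        # Merge with the previous element when the operation is the same.
        if ops and ops[-1][1] == new_op:
            ops[-1] = (ops[-1][0] + length, new_op)
        else:
            ops.append((length, new_op))

    new_cigar = "".join(f"{n}{op}" for n, op in ops)
    return "".join(seq_parts), new_cigar


if __name__ == "__main__":
    ref = "ACGTACGTACGT"
    # A read with one insertion and a two-base deletion relative to `ref`.
    print(revert_to_reference(ref, 2, "3M1I2M2D3M"))
```

In this sketch the spliced structure of an RNA-seq read (the `N` operations) survives, while every base that could differ from the reference is overwritten or removed.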
Large-scale sequencing of RNA from individual cells can reveal patterns of gene, isoform and allelic expression across cell types and states. However, current short-read single-cell RNA-sequencing methods have limited ability to count RNAs at allele and isoform resolution, and long-read sequencing techniques lack the depth required for large-scale applications across cells. Here we introduce Smart-seq3, which combines full-length transcriptome coverage with a 5' unique molecular identifier RNA counting strategy that enables in silico reconstruction of thousands of RNA molecules per cell. Of the counted and reconstructed molecules, 60% could be directly assigned to allelic origin and 30-50% to specific isoforms, and we identified substantial differences in isoform usage in different mouse strains and human cell types. Smart-seq3 greatly increased sensitivity compared to Smart-seq2, typically detecting thousands more transcripts per cell. We expect that Smart-seq3 will enable large-scale characterization of cell types and states across tissues and organisms.
Random monoallelic expression (RME) of genes represents a striking example of how stochastic molecular processes can result in cellular heterogeneity. Recent transcriptome-wide studies have revealed both mitotically stable and cell-to-cell dynamic forms of autosomal RME, with the latter presumably resulting from burst-like stochastic transcription. Here, we discuss the distinguishing features of these two forms of RME and revisit literature on their nature, pervasiveness and regulation. Finally, we explore how RME may contribute to phenotypic variation, including the incomplete penetrance and variable expressivity often seen in genetic disease.
Expression from both alleles is generally observed in analyses of diploid cell populations, but studies addressing allelic expression patterns genome-wide in single cells are lacking. Here, we present global analyses of allelic expression across individual cells of mouse preimplantation embryos of mixed background (CAST/EiJ × C57BL/6J). We discovered abundant (12 to 24%) monoallelic expression of autosomal genes and that expression of the two alleles occurs independently. The monoallelic expression appeared random and dynamic because there was considerable variation among closely related embryonic cells. Similar patterns of monoallelic expression were observed in mature cells. Our allelic expression analysis also demonstrates the de novo inactivation of the paternal X chromosome. We conclude that independent and stochastic allelic transcription generates abundant random monoallelic expression in the mammalian cell.
Notch signaling is evolutionarily conserved and operates in many cell types and at various stages during development. Notch signaling must therefore be able to generate appropriate signaling outputs in a variety of cellular contexts. This need for versatility in Notch signaling is in apparent contrast to the simple molecular design of the core pathway. Here, we review recent studies in nematodes, Drosophila and vertebrate systems that begin to shed light on how versatility in Notch signaling output is generated, how signal strength is modulated, and how cross-talk between the Notch pathway and other intracellular signaling systems, such as the Wnt, hypoxia and BMP pathways, contributes to signaling diversity.
Genome-wide transcriptome analyses are routinely used to monitor tissue-, disease- and cell type–specific gene expression, but it has been technically challenging to generate expression profiles from single cells. Here we describe a robust mRNA-Seq protocol (Smart-Seq) that is applicable down to single cell levels. Compared with existing methods, Smart-Seq has improved read coverage across transcripts, which enhances detailed analyses of alternative transcript isoforms and identification of single-nucleotide polymorphisms. We determined the sensitivity and quantitative accuracy of Smart-Seq for single-cell transcriptomics by evaluating it on total RNA dilution series. We found that although gene expression estimates from single cells have increased noise, hundreds of differentially expressed genes could be identified using few cells per cell type. Applying Smart-Seq to circulating tumor cells from melanomas, we identified distinct gene expression patterns, including candidate biomarkers for melanoma circulating tumor cells. Our protocol will be useful for addressing fundamental biological problems requiring genome-wide transcriptome profiling in rare cells.
Skin homeostasis is orchestrated by dozens of cell types that together direct stem cell renewal, lineage commitment, and differentiation. Here, we use single-cell RNA sequencing and single-molecule RNA FISH to provide a systematic molecular atlas of full-thickness skin, determining gene expression profiles and spatial locations that define 56 cell types and states during hair growth and rest. These findings reveal how the outer root sheath (ORS) and inner hair follicle layers coordinate hair production. We found that the ORS is composed of two intermingling but transcriptionally distinct cell types with differing capacities for interactions with stromal cell types. Inner layer cells branch from transcriptionally uncommitted progenitors, and each lineage differentiation passes through an intermediate state. We also provide an online tool to explore this comprehensive skin cell atlas, including epithelial and stromal cells such as fibroblasts, vascular, and immune cells, to spur further discoveries in skin biology.
• Deconstruction of full-thickness skin by single-cell RNA-seq and in situ RNA staining
• Basal ORS, suprabasal ORS, companion layer, and LPC cells constitute outer HF layers
• Inner HF layers form from unfated progenitors and mature via intermediate states
• Fibroblasts spatiotemporally separate into three major subtypes and one cell state
Joost et al. use single-cell RNA-seq and in situ mRNA staining to characterize mouse skin during hair growth and rest. 56 identified epithelial and stromal cell populations reveal unprecedented molecular details of cell types and states coordinating hair growth, underlying progenitor commitment and lineage differentiation, spatiotemporal fibroblast heterogeneity, and potential epithelial-stromal interactions.
Mouse studies have been instrumental in forming our current understanding of early cell-lineage decisions; however, similar insights into early human development are severely limited. Here, we present a comprehensive transcriptional map of human embryo development, including the sequenced transcriptomes of 1,529 individual cells from 88 human preimplantation embryos. These data show that cells undergo an intermediate state of co-expression of lineage-specific genes, followed by a concurrent establishment of the trophectoderm, epiblast, and primitive endoderm lineages, which coincides with blastocyst formation. Female cells of all three lineages achieve dosage compensation of X chromosome RNA levels prior to implantation. However, in contrast to the mouse, XIST is transcribed from both alleles throughout the progression of this expression dampening, and X chromosome genes maintain biallelic expression while dosage compensation proceeds. We envision broad utility of this transcriptional atlas in future studies on human development as well as in stem cell research.
• Transcriptomes of 1,529 individual cells from 88 human preimplantation embryos
• Lineage segregation of trophectoderm, primitive endoderm, and pluripotent epiblast
• X chromosome dosage compensation in the human blastocyst
A comprehensive transcriptional map of human preimplantation development reveals a concurrent establishment of trophectoderm, epiblast, and primitive endoderm lineages and unique features of X chromosome dosage compensation in human.
• We examine outcome additionality of prestigious early-stage government subsidies.
• We use a novel matching approach when comparing approved and rejected applications.
• Subsidized new ventures attract more human and financial resources than others.
• These resources in turn have long-term and substantial influence on performance.
• The result is explained by signaling effects.
This paper examines the outcome additionality of prestigious early-stage government subsidies. Drawing on arguments from the liabilities of newness and certification literatures, we develop a mediated model that unpacks the outcome additionality of the subsidy. We hypothesize that subsidized new ventures attract more human and financial capital than their non-subsidized counterparts because the association with a prestigious government organization signals legitimacy of the new venture. Such legitimacy is crucial for attracting qualified employees and financiers. Access to human and financial capital, in turn, has a long-term and substantial influence on performance, whereas the effect of the subsidy itself is marginal and short-lived. Applying a novel matching approach, we compare 130 approved applicants of a prestigious government subsidy with a control group of 154 applications rejected at the very last stage, thereby overcoming some of the selection and endogeneity biases associated with similar studies. The hypothesized model receives strong support from the data. These findings have several implications for government support of new ventures as well as for scholars in the field.
Massively parallel DNA sequencing of thousands of samples in a single machine-run is now possible, but the preparation of the individual sequencing libraries is expensive and time-consuming. Tagmentation-based library construction, using the Tn5 transposase, is efficient for generating sequencing libraries but currently relies on undisclosed reagents, which severely limits development of novel applications and the execution of large-scale projects. Here, we present simple and robust procedures for Tn5 transposase production and optimized reaction conditions for tagmentation-based sequencing library construction. We further show how molecular crowding agents both modulate library lengths and enable efficient tagmentation from subpicogram amounts of cDNA. The comparison of single-cell RNA-sequencing libraries generated using produced and commercial Tn5 demonstrated equal performances in terms of gene detection and library characteristics. Finally, because naked Tn5 can be annealed to any oligonucleotide of choice, for example, molecular barcodes in single-cell assays or methylated oligonucleotides for bisulfite sequencing, custom Tn5 production and tagmentation enable innovation in sequencing-based applications.