New normal linear modeling strategies are presented for analyzing read counts from RNA-seq experiments. The voom method estimates the mean-variance relationship of the log-counts, generates a precision weight for each observation and enters these into the limma empirical Bayes analysis pipeline. This opens access for RNA-seq analysts to a large body of methodology developed for microarrays. Simulation studies show that voom performs as well as or better than count-based RNA-seq methods even when the data are generated according to the assumptions of the earlier methods. Two case studies illustrate the use of linear modeling and gene set testing methods.
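The weighting idea described above can be illustrated with a minimal sketch. It is not the limma implementation: it assumes a simple group-means design, and it substitutes a straight-line trend for the smooth curve that voom fits to the mean-variance relationship. The function names `log_cpm` and `voom_like_weights` are hypothetical and chosen only for this illustration.

```python
import math
from statistics import mean


def log_cpm(counts):
    """Return log2 counts per million and library sizes.

    counts: list of genes, each a list of per-sample read counts.
    Offsets of 0.5 and 1 keep the logarithm finite for zero counts.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(gene[j] for gene in counts) for j in range(n_samples)]
    values = [
        [math.log2((gene[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
         for j in range(n_samples)]
        for gene in counts
    ]
    return values, lib_sizes


def voom_like_weights(counts, groups):
    """Sketch of voom-style precision weights (assumptions noted below).

    Assumption: the design is one mean per group label in `groups`.
    Assumption: a straight line stands in for the smooth trend.
    """
    values, lib_sizes = log_cpm(counts)
    labels = sorted(set(groups))
    resid_df = len(groups) - len(labels)
    # Converts a log-CPM value into a log2 count for each sample.
    log_lib = [math.log2(size + 1.0) - math.log2(1e6) for size in lib_sizes]

    fitted, xs, ys = [], [], []
    for row in values:
        group_mean = {
            g: mean(v for v, lab in zip(row, groups) if lab == g)
            for g in labels
        }
        fit = [group_mean[lab] for lab in groups]
        rss = sum((v - f) ** 2 for v, f in zip(row, fit))
        fitted.append(fit)
        xs.append(mean(row) + mean(log_lib))          # average log2 count
        ys.append(math.sqrt(math.sqrt(rss / resid_df)))  # sqrt of residual sd

    # Least-squares straight line through (average log2 count, sqrt sd).
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    floor = max(min(ys), 1e-3)  # guard against non-positive predictions

    weights = []
    for fit in fitted:
        row_weights = []
        for f, offset in zip(fit, log_lib):
            predicted = max(intercept + slope * (f + offset), floor)
            row_weights.append(1.0 / predicted ** 4)  # inverse variance
        weights.append(row_weights)
    return values, weights
```

The trend predicts the square root of the standard deviation, so raising it to the fourth power and inverting gives an inverse-variance weight for each observation. These weights would then accompany the log-CPM values into a weighted linear model fit.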
RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4-10 samples) can be <1 h, with computation time <1 d using a standard desktop PC.
To examine global changes in breast heterogeneity across different states, we determined the single‐cell transcriptomes of > 340,000 cells encompassing normal breast, preneoplastic BRCA1+/– tissue, the major breast cancer subtypes, and pairs of tumors and involved lymph nodes. Elucidation of the normal breast microenvironment revealed striking changes in the stroma of post‐menopausal women. Single‐cell profiling of 34 treatment‐naive primary tumors, including estrogen receptor (ER)+, HER2+, and triple‐negative breast cancers, revealed comparable diversity among cancer cells and a discrete subset of cycling cells. The transcriptomes of preneoplastic BRCA1+/– tissue versus tumors highlighted global changes in the immune microenvironment. Within the tumor immune landscape, proliferative CD8+ T cells characterized triple‐negative and HER2+ cancers but not ER+ tumors, while all subtypes comprised cycling tumor‐associated macrophages, thus invoking potentially different immunotherapy targets. Copy number analysis of paired ER+ tumors and lymph nodes indicated seeding by genetically distinct clones or mass migration of primary tumor cells into axillary lymph nodes. This large‐scale integration of patient samples provides a high‐resolution map of cell diversity in normal and cancerous human breast.
Synopsis
To examine global changes in breast heterogeneity across different states, this gene expression resource integrates large‐scale patient samples from diverse tissue states and breast cancer subtypes, offering a refined high‐resolution map of cell diversity in the normal and cancerous human mammary gland.
Single‐cell transcriptome analyses profile > 340,000 cells encompassing normal breast, preneoplastic BRCA1+/– tissue, the major breast cancer subtypes, and metastatic lymph nodes.
Pre‐ to post‐menopause transition is associated with marked stromal changes, with decreased PDGFRb and matrix‐associated genes in fibroblasts.
Progression from preneoplasia to tumors correlates with increased immune infiltration in BRCA1 mutation carriers.
Tumor epithelial compartments show comparable diversity in different breast cancer subtypes.
Cycling CD8+ T‐cells are reduced in estrogen receptor (ER)+ tumors, suggesting different immunoregulatory patterns.
Both clonal selection and mass migration contribute to lymph node metastases in patients with ER+ cancer.
A large‐scale gene expression resource integrates diverse tissue samples and reveals unexpected heterogeneity of breast cancer subtypes.
Bone marrow is a preferred metastatic site for multiple solid tumours and is associated with poor prognosis and significant morbidity. Accumulating evidence indicates that cancer cells colonise specialised niches within the bone marrow to support their long-term propagation, but the precise location and mechanisms that mediate niche interactions are unknown. Using breast cancer as a model of solid tumour metastasis to the bone marrow, we applied large-scale quantitative three-dimensional imaging to characterise temporal changes in the bone marrow microenvironment during disease progression. We show that mouse mammary tumour cells preferentially home to a pre-existing metaphyseal domain enriched for type H vessels. Metastatic lesion outgrowth rapidly remodelled the local vasculature through extensive sprouting to establish a tumour-supportive microenvironment. The evolution of this tumour microenvironment reflects direct remodelling of the vascular endothelium through tumour-derived granulocyte-colony stimulating factor (G-CSF) in a hematopoietic cell-independent manner. Therapeutic targeting of the metastatic niche by blocking G-CSF receptor inhibited pathological blood vessel remodelling and reduced bone metastasis burden. These findings elucidate a mechanism of 'host' microenvironment hijacking by mammary tumour cells to subvert the local microvasculature to form a specialised, pro-tumorigenic niche.
In recent years, RNA sequencing (RNA-seq) has become a very widely used technology for profiling gene expression. One of the most common aims of RNA-seq profiling is to identify genes or molecular pathways that are differentially expressed (DE) between two or more biological conditions. This article demonstrates a computational workflow for the detection of DE genes and pathways from RNA-seq data by providing a complete analysis of an RNA-seq experiment profiling epithelial cell subsets in the mouse mammary gland. The workflow uses R software packages from the open-source Bioconductor project and covers all steps of the analysis pipeline, including alignment of read sequences, data exploration, differential expression analysis, visualization and pathway analysis. Read alignment and count quantification are conducted using the Rsubread package and the statistical analyses are performed using the edgeR package. The differential expression analysis uses the quasi-likelihood functionality of edgeR.
The mammary epithelium comprises two primary cellular lineages, but the degree of heterogeneity within these compartments and their lineage relationships during development remain an open question. Here we report single-cell RNA profiling of mouse mammary epithelial cells spanning four developmental stages in the post-natal gland. Notably, the epithelium undergoes a large-scale shift in gene expression from a relatively homogeneous basal-like program in pre-puberty to distinct lineage-restricted programs in puberty. Interrogation of single-cell transcriptomes reveals different levels of diversity within the luminal and basal compartments, and identifies an early progenitor subset marked by CD55. Moreover, we uncover a luminal transit population and a rare mixed-lineage cluster amongst basal cells in the adult mammary gland. Together these findings point to a developmental hierarchy in which a basal-like gene expression program prevails in the early post-natal gland prior to the specification of distinct lineage signatures, and the presence of cellular intermediates that may serve as transit or lineage-primed cells.
Breast tumors are inherently heterogeneous, but the evolving cellular organization through neoplastic progression is poorly understood. Here we report a rapid, large-scale single-cell resolution 3D imaging protocol based on a one-step clearing agent that allows visualization of normal tissue architecture and entire tumors at cellular resolution. Imaging of multicolor lineage-tracing models of breast cancer targeted to either basal or luminal progenitor cells revealed profound clonal restriction during progression. Expression profiling of clones arising in Pten/Trp53-deficient tumors identified distinct molecular signatures. Strikingly, most clones harbored cells that had undergone an epithelial-to-mesenchymal transition, indicating widespread, inherent plasticity. Hence, an integrative pipeline that combines lineage tracing, 3D imaging, and clonal RNA sequencing technologies offers a comprehensive path for studying mechanisms underlying heterogeneity in whole tumors.
• A single-step, non-toxic clearing agent for 3D imaging of whole organs and tumors
• Derivation of an integrative platform to interrogate intratumoral heterogeneity
• Profound clonal restriction occurs during neoplastic progression
• The epithelial-mesenchymal transition occurs clonally as a frequent event
Rios et al. develop a rapid, large-scale single-cell resolution 3D imaging protocol and use the protocol together with RNA sequencing to explore the cellular dynamics of mammary tumorigenesis and show that a molecular epithelial-to-mesenchymal transition is a prominent feature of tumor clones.
Breast cancer is a common and highly heterogeneous disease. Understanding cellular diversity in the mammary gland and its surrounding micro-environment across different states can provide insight into cancer development in the human breast. Recently, we published a large-scale single-cell RNA expression atlas of the human breast spanning normal, preneoplastic and tumorigenic states. Single-cell expression profiles of nearly 430,000 cells were obtained from 69 distinct surgical tissue specimens from 55 patients. This article extends the study by providing quality filtering thresholds, downstream processed R data objects, complete cell annotation and R code to reproduce all the analyses. Data quality assessment measures are presented and details are provided for all the bioinformatic analyses that produced results described in the study.
Background
Single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed in recent years. The droplet-based single cell platforms enable the profiling of gene expression in tens of thousands of cells per sample. The goal of a typical scRNA-seq analysis is to identify different cell subpopulations and their respective marker genes. Additionally, trajectory analysis can be used to infer the developmental or differentiation trajectories of cells.
Methods
This article demonstrates a comprehensive workflow for performing trajectory inference and time course analysis on a multi-sample single-cell RNA-seq experiment of the mouse mammary gland. The workflow uses open-source R software packages and covers all steps of the analysis pipeline, including quality control, doublet prediction, normalization, integration, dimension reduction, cell clustering, trajectory inference, and pseudo-bulk time course analysis. Sample integration and cell clustering follow the Seurat pipeline while the trajectory inference is conducted using the monocle3 package. The pseudo-bulk time course analysis uses the quasi-likelihood framework of edgeR.
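The pseudo-bulk step named above can be illustrated with a short sketch. It assumes that pseudo-bulk profiles are formed by summing raw counts over all cells that share a sample and a cluster label, which is a common construction but is not spelled out here. The function name `pseudo_bulk` and the toy data are hypothetical, and the sketch is written in Python rather than the R packages the workflow uses.

```python
from collections import defaultdict


def pseudo_bulk(cell_counts, cell_samples, cell_clusters):
    """Sum per-cell gene counts within each (sample, cluster) pair.

    cell_counts:   list of cells, each a list of counts (one per gene).
    cell_samples:  sample label for each cell.
    cell_clusters: cluster label for each cell.
    Returns a dict mapping (sample, cluster) to a summed count vector.
    """
    n_genes = len(cell_counts[0])
    totals = defaultdict(lambda: [0] * n_genes)
    for counts, sample, cluster in zip(cell_counts, cell_samples, cell_clusters):
        profile = totals[(sample, cluster)]
        for gene_index, count in enumerate(counts):
            profile[gene_index] += count
    return dict(totals)


# Toy example: four cells, three genes, two samples, two clusters.
cells = [[1, 0, 3], [2, 1, 0], [0, 4, 1], [5, 0, 0]]
samples = ["s1", "s1", "s2", "s2"]
clusters = ["basal", "basal", "basal", "luminal"]
bulk = pseudo_bulk(cells, samples, clusters)
```

Each summed profile can then be treated like one library in a bulk RNA-seq count matrix, which is what allows a count-based framework to be applied to single-cell data.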
Results
Cells were ordered and positioned along a pseudotime trajectory that represents a biological process of cell differentiation and development. The study successfully identified genes that were significantly associated with pseudotime in the mouse mammary gland.
Conclusions
The demonstrated workflow provides a valuable resource for researchers conducting scRNA-seq analysis using open-source software packages. The study successfully demonstrated the usefulness of trajectory analysis for understanding the developmental or differentiation trajectories of cells. This analysis can be applied to various biological processes such as cell development or disease progression, and can help identify potential biomarkers or therapeutic targets.
Mutations in genes encoding general transcription factors cause neurological disorders. Despite clinical prominence, the consequences of defects in the basal transcription machinery during brain development are unclear. We found that loss of the TATA-box binding protein-associated factor TAF8, a component of the general transcription factor TFIID, in the developing central nervous system affected the expression of many, but notably not all genes. Taf8 deletion caused apoptosis, unexpectedly restricted to forebrain regions. Nuclear levels of the transcription factor p53 were elevated in the absence of TAF8, as were the mRNAs of the pro-apoptotic p53 target genes Noxa, Puma and Bax. The cell death in Taf8 forebrain regions was completely rescued by additional loss of p53, but Taf8 and p53 brains failed to initiate a neuronal expression program. Taf8 deletion caused aberrant transcription of promoter regions and splicing anomalies. We propose that TAF8 supports the directionality of transcription and co-transcriptional splicing, and that failure of these processes causes p53-induced apoptosis of neuronal cells in the developing mouse embryo.