Gene set scoring provides a useful approach for quantifying concordance between sample transcriptomes and selected molecular signatures. Most methods use information from all samples to score an individual sample, leading to unstable scores in small data sets and introducing biases from sample composition (e.g. varying numbers of samples for different cancer subtypes). To address these issues, we have developed a truly single sample scoring method and an associated R/Bioconductor package, singscore (https://bioconductor.org/packages/singscore).
We use multiple cancer data sets to compare singscore against widely-used methods, including GSVA, z-score, PLAGE, and ssGSEA. Our approach does not depend upon background samples and scores are thus stable regardless of the composition and number of samples being scored. In contrast, scores obtained by GSVA, z-score, PLAGE and ssGSEA can be unstable when fewer samples are available (N < 25). The singscore method performs as well as the best performing methods in terms of power, recall, false positive rate and computational time, and provides consistently high and balanced performance across all these criteria. To enhance the impact and utility of our method, we have also included a set of functions implementing visual analysis and diagnostics to support the exploration of molecular phenotypes in single samples and across populations of data.
The singscore method described here functions independently of sample composition in gene expression data and thus provides stable scores, which are particularly useful for small data sets or data integration. Singscore performs well across all performance criteria, and includes a suite of powerful visualization functions to assist in the interpretation of results. This method performs as well as or better than other scoring approaches in terms of its power to distinguish samples with distinct biology and its ability to call true differential gene sets between two conditions. These scores can be used for dimensional reduction of transcriptomic data, and the phenotypic landscapes obtained by scoring samples against multiple molecular signatures may provide insights for sample stratification.
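To illustrate why a score computed from one sample alone cannot change with cohort size or composition, the following is a minimal sketch of a rank-based single-sample score. It is not the singscore implementation: the function name `rank_score`, the choice of a mean normalised rank, and the centring at 0.5 are assumptions made for illustration only.

```python
def rank_score(expression, gene_set):
    """Score one sample against a gene set using only that sample's own ranks.

    `expression` maps gene name -> abundance for a single sample.
    Hypothetical helper for illustration; ties are broken by sort order.
    """
    # Rank genes within the sample, lowest abundance first.
    ordered = sorted(expression, key=expression.get)
    n = len(ordered)
    # Normalise ranks to (0, 1] so the score does not depend on gene count.
    norm_rank = {gene: (i + 1) / n for i, gene in enumerate(ordered)}
    members = [gene for gene in gene_set if gene in norm_rank]
    if not members:
        raise ValueError("no gene-set members measured in this sample")
    # Mean normalised rank, centred so a mid-ranked gene set scores near 0.
    return sum(norm_rank[g] for g in members) / len(members) - 0.5


sample = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
high = rank_score(sample, {"C", "D"})  # gene set at the top of the ranking
low = rank_score(sample, {"A", "B"})   # gene set at the bottom of the ranking
```

Because no other sample enters the calculation, scoring the same sample alongside 5 or 500 others gives the same value.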
Despite increasing recognition of the importance of GM-CSF in autoimmune disease, it remains unclear how GM-CSF is regulated at sites of tissue inflammation. Using GM-CSF fate reporter mice, we show that synovial NK cells produce GM-CSF in autoantibody-mediated inflammatory arthritis. Synovial NK cells promote a neutrophilic inflammatory cell infiltrate, and persistent arthritis, via GM-CSF production, as deletion of NK cells, or specific ablation of GM-CSF production in NK cells, abrogated disease. Synovial NK cell production of GM-CSF is IL-18-dependent. Furthermore, we show that cytokine-inducible SH2-containing protein (CIS) is crucial in limiting GM-CSF signaling not only during inflammatory arthritis but also in experimental allergic encephalomyelitis (EAE), a murine model of multiple sclerosis. Thus, a cellular cascade of synovial macrophages, NK cells, and neutrophils mediates persistent joint inflammation via production of IL-18 and GM-CSF. Endogenous CIS provides a key brake on signaling through the GM-CSF receptor. These findings shed new light on GM-CSF biology in sterile tissue inflammation and identify several potential therapeutic targets.
Natural killer (NK) cell activity is essential for initiating antitumor responses and may be linked to immunotherapy success. NK cells and other innate immune components could be exploitable for cancer treatment, which drives the need for tools and methods that identify therapeutic avenues. Here, we extend our gene-set scoring method to investigate NK cell infiltration by applying RNA-seq analysis to samples from bulk tumors. Computational methods have been developed for the deconvolution of immune cell types within solid tumors. We have taken the NK cell gene signatures from several such tools, then curated the gene list using a comparative analysis of tumors and immune cell types. Using a gene-set scoring method to investigate RNA-seq data from The Cancer Genome Atlas (TCGA), we show that patients with metastatic cutaneous melanoma have an improved survival rate if their tumor shows evidence of NK cell infiltration. Furthermore, these survival effects are enhanced in tumors that show higher expression of genes that encode NK cell stimuli such as the cytokine
Using this signature, we then examine transcriptomic data to identify tumor and stromal components that may influence the penetrance of NK cells into solid tumors. Our results provide evidence that NK cells play a role in the regulation of human tumors and highlight potential survival effects associated with increased NK cell activity. Our computational analysis identifies putative gene targets that may be of therapeutic value for boosting NK cell antitumor immunity.
Abstract
Gene expression signatures have been critical in defining the molecular phenotypes of cells, tissues, and patient samples. Their most notable and widespread clinical application is stratification of breast cancer patients into molecular (PAM50) subtypes. The cost and relatively large amounts of fresh starting material required for whole-transcriptome sequencing have limited clinical application of thousands of existing gene signatures captured in repositories such as the Molecular Signature Database. We identified genes with stable expression across a range of abundances, and with a preserved relative ordering across thousands of samples, allowing signature scoring and supporting general data normalisation for transcriptomic data. Our new method, stingscore, quantifies and summarises relative expression levels of signature genes from individual samples through the inclusion of these ‘stably-expressed genes’. We show that our list of stable genes has better stability across cancer and normal tissue data than previously proposed gene sets. Additionally, we show that signature scores computed from targeted transcript measurements using stingscore can predict docetaxel response in breast cancer patients. This new approach to gene expression signature analysis will facilitate the development of panel-type tests for gene expression signatures, thus supporting clinical translation of the powerful insights gained from cancer transcriptomic studies.
The mammary epithelium comprises two primary cellular lineages, but the degree of heterogeneity within these compartments and their lineage relationships during development remain an open question. Here we report single-cell RNA profiling of mouse mammary epithelial cells spanning four developmental stages in the post-natal gland. Notably, the epithelium undergoes a large-scale shift in gene expression from a relatively homogeneous basal-like program in pre-puberty to distinct lineage-restricted programs in puberty. Interrogation of single-cell transcriptomes reveals different levels of diversity within the luminal and basal compartments, and identifies an early progenitor subset marked by CD55. Moreover, we uncover a luminal transit population and a rare mixed-lineage cluster amongst basal cells in the adult mammary gland. Together these findings point to a developmental hierarchy in which a basal-like gene expression program prevails in the early post-natal gland prior to the specification of distinct lineage signatures, and the presence of cellular intermediates that may serve as transit or lineage-primed cells.
Caveolin proteins drive formation of caveolae, specialized cell-surface microdomains that influence cell signaling. Signaling proteins are proposed to use conserved caveolin-binding motifs (CBMs) to associate with caveolae via the caveolin scaffolding domain (CSD). However, structural and bioinformatic analyses argue against such direct physical interactions: in the majority of signaling proteins, the CBM is buried and inaccessible. Putative CBMs do not form a common structure for caveolin recognition, are not enriched among caveolin-binding proteins, and are even more common in yeast, which lack caveolae. We propose that CBM/CSD-dependent interactions are unlikely to mediate caveolar signaling, and the basis for signaling effects should therefore be reassessed.
Elucidation of regulatory networks, including identification of regulatory mechanisms specific to a given biological context, is a key aim in systems biology. This has motivated the move from co-expression to differential co-expression analysis and numerous methods have been developed subsequently to address this task; however, evaluation of methods and interpretation of the resulting networks has been hindered by the lack of known context-specific regulatory interactions.
In this study, we develop a simulator based on dynamical systems modelling capable of simulating differential co-expression patterns. With the simulator and an evaluation framework, we benchmark and characterise the performance of inference methods. Defining three different levels of "true" networks for each simulation, we show that accurate inference of causation is difficult for all methods, compared to inference of associations. We show that a z-score-based method has the best general performance. Further, analysis of simulation parameters reveals five network and simulation properties that explain the performance of methods. The evaluation framework and inference methods used in this study are available in the dcanr R/Bioconductor package.
Our analysis of networks inferred from simulated data shows that hub nodes are more likely to be differentially regulated targets than transcription factors. Based on this observation, we propose an interpretation of the inferred differential network that can reconstruct a putative causal network.
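As a concrete illustration of what a z-score for differential co-expression can look like, the sketch below compares the correlation of one gene pair between two conditions using the Fisher transformation. This is one common formulation and is not taken from the dcanr package; whether it matches the z-score-based method benchmarked in the study is an assumption, and the function names `pearson` and `diff_coexp_z` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def diff_coexp_z(x1, y1, x2, y2):
    """z-score for the change in correlation of a gene pair between conditions.

    (x1, y1) are the two genes' expression values in condition 1 and
    (x2, y2) in condition 2. Each condition needs more than 3 samples,
    and correlations of exactly +/-1 are not handled by this sketch.
    """
    n1, n2 = len(x1), len(x2)
    # Fisher transformation makes correlations approximately normal.
    z1 = math.atanh(pearson(x1, y1))
    z2 = math.atanh(pearson(x2, y2))
    # Standard error of the difference between the transformed correlations.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se


gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b_cond1 = [1.1, 2.0, 2.9, 4.2, 5.1]  # positively correlated with gene_a
gene_b_cond2 = [5.0, 3.9, 3.1, 2.2, 0.8]  # negatively correlated with gene_a
z = diff_coexp_z(gene_a, gene_b_cond1, gene_a, gene_b_cond2)
```

A large absolute z suggests the association between the two genes differs between conditions; as the abstract notes, such an association does not by itself establish the direction of regulation.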
To gain a better understanding of the complexity of gene expression in normal and diseased tissues, it is important to account for the spatial context and identity of cells in situ. State-of-the-art spatial profiling technologies, such as the Nanostring GeoMx Digital Spatial Profiler (DSP), now allow quantitative spatially resolved measurement of the transcriptome in tissues. However, the bioinformatics pipelines currently used to analyse GeoMx data often fail to account for the technical variability within the data and the complexity of experimental designs, thus limiting the accuracy and reliability of the subsequent analysis. Carefully designed quality control workflows that include in-depth experiment-specific investigations into technical variation, and appropriate adjustment for such variation, can address this issue. Here, we present standR, an R/Bioconductor package that enables an end-to-end analysis of GeoMx DSP data. With four case studies from previously published experiments, we demonstrate how the standR workflow can enhance the statistical power of GeoMx DSP data analysis and how the application of standR enables scientists to develop in-depth insights into the biology of interest.
Mass spectrometry (MS) enables high-throughput identification and quantification of proteins in complex biological samples and can provide insights into the global function of biological systems. Label-free quantification is cost-effective and suitable for the analysis of human samples. Despite rapid developments in label-free data acquisition workflows, the number of proteins quantified across samples can be limited by technical and biological variability. This variation can result in missing values which can in turn challenge downstream data analysis tasks. General purpose or gene expression-specific imputation algorithms are widely used to improve data completeness. Here, we propose an imputation algorithm designated for label-free MS data that is aware of the type of missingness affecting data. On published datasets acquired by data-dependent and data-independent acquisition workflows with variable degrees of biological complexity, we demonstrate that the proposed missing value estimation procedure by barycenter computation competes closely with the state-of-the-art imputation algorithms in differential abundance tasks while outperforming them in the accuracy of variance estimates of the peptide abundance measurements, and better controls the false discovery rate in label-free MS experiments. The barycenter estimation procedure is implemented in the msImpute software package and is available from the Bioconductor repository.
• msImpute provides imputation that is aware of the type of missingness in data
• More-accurate estimates of variance and better control of the false discovery rate
• The msImpute software package is available from the Bioconductor repository
The number of proteins quantified across samples by label-free mass-spectrometry (MS) is limited by technical and biological variability resulting in missing values that challenge downstream analysis. We present an imputation algorithm for label-free MS data that is aware of the type of missingness affecting data. Missing value estimation by msImpute outperforms state-of-the-art imputation methods in the accuracy of variance estimates for peptide abundance and better controls the false discovery rate in MS experiments. msImpute is available from the Bioconductor repository.
Small interfering RNA (siRNA)-based drugs require chemical modifications or formulation to promote stability, minimize innate immunity, and enable delivery to target tissues. Partially modified siRNAs (up to 70% of the nucleotides) provide significant stabilization in vitro and are commercially available; thus, they are commonly used to evaluate the efficacy of bio-conjugates for in vivo delivery. In contrast, most clinically-advanced non-formulated compounds, using conjugation as a delivery strategy, are fully chemically modified (100% of nucleotides). Here, we compare partially and fully chemically modified siRNAs in conjugate-mediated delivery. We show that fully modified siRNAs are retained at 100x greater levels in various tissues, independently of the nature of the conjugate or siRNA sequence, and support productive mRNA silencing. Thus, fully chemically stabilized siRNAs may provide a better platform to identify novel moieties (peptides, aptamers, small molecules) for targeted RNAi delivery.