Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach to reliably model the cell-specific dropout rates and amplification bias by use of external RNA spike-ins. TASC incorporates the technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and detect differentially expressed genes. More importantly, TASC is able to adjust for covariates to further eliminate confounding that may originate from cell size and cell cycle differences. In simulation and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is programmed to be computationally efficient, taking advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq.
Poly(urea‐urethane) thermosets containing the 1‐tert‐butylethylurea (TBEU) structure feature a reversible dissociation/association process of their covalent linkages under mild conditions. Unlike conventional thermosets, TBEU‐based poly(urea‐urethane) thermosets maintain their malleability after curing. Under high temperature (100 °C) and applied pressure (300 kPa), ground TBEU thermoset powder can be remolded to bulk after 20 min.
In this article, we propose a nonparametric graphical test based on optimal matching, for assessing the equality of multiple unknown multivariate probability distributions. Our procedure pools the data from the different classes to create a graph based on the minimum non-bipartite matching, and then utilizes the number of edges connecting data points from different classes to examine the closeness between the distributions. The proposed test is exactly distribution-free (the null distribution does not depend on the distribution of the data) and can be efficiently applied to multivariate as well as non-Euclidean data, whenever the inter-point distances are well-defined. We show that the test is universally consistent, and prove a distributional limit theorem for the test statistic under general alternatives. Through simulation studies, we demonstrate its superior performance against other common and well-known multisample tests. The method is applied to single cell transcriptomics data obtained from the peripheral blood, cancer tissue, and tumor-adjacent normal tissue of human subjects with hepatocellular carcinoma and non-small-cell lung cancer. Our method unveils patterns in how biochemical metabolic pathways are altered across immune cells in a cancer setting, depending on the tissue location. All of the methods described herein are implemented in the R package multicross.
Supplementary materials for this article are available online.
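The matching step described in the abstract above can be illustrated with a small sketch. It pools labeled points, finds a minimum-weight perfect (non-bipartite) matching, and counts the matched pairs whose endpoints carry different class labels. This is only an illustration under stated assumptions: the function names are hypothetical, the matching is found by an exact bitmask search that is practical only for a handful of points, Euclidean distance stands in for any well-defined inter-point distance, and a single total cross-class count is reported rather than the full test statistic or its null distribution. It is not the multicross implementation.

```python
import math
from functools import lru_cache


def min_weight_perfect_matching(points):
    """Exact minimum-weight perfect matching by bitmask search (small n only).

    Hypothetical helper for illustration; Euclidean distance is an assumption.
    """
    n = len(points)
    if n % 2:
        raise ValueError("an even number of pooled points is required")
    dist = [[math.dist(p, q) for q in points] for p in points]

    @lru_cache(maxsize=None)
    def best(mask):
        # `mask` has bit i set when point i is still unmatched.
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest unmatched index
        rest = mask & ~(1 << i)
        result = (math.inf, ())
        for j in range(i + 1, n):
            if (rest >> j) & 1:
                cost, pairs = best(rest & ~(1 << j))
                total = cost + dist[i][j]
                if total < result[0]:
                    result = (total, ((i, j),) + pairs)
        return result

    return best((1 << n) - 1)[1]


def cross_match_count(points, labels):
    """Number of matched pairs whose two endpoints belong to different classes."""
    pairs = min_weight_perfect_matching(points)
    return sum(labels[i] != labels[j] for i, j in pairs)


# Two well-separated clusters: few cross-class edges when classes follow the clusters.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
separated = cross_match_count(pts, ["a", "a", "b", "b"])
mixed = cross_match_count(pts, ["a", "b", "a", "b"])
```

Intuitively, few cross-class edges suggest that the classes occupy different regions, while many cross-class edges are what one expects when the distributions coincide.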
Abstract
Summary
Copy number variation is an important and abundant source of variation in the human genome, which has been associated with a number of diseases, especially cancer. Massively parallel next-generation sequencing allows copy number profiling with fine resolution. Such efforts, however, have met with mixed success, with setbacks arising partly from the lack of reliable analytical methods to meet the diverse and unique challenges arising from the myriad experimental designs and study goals in genetic studies. In cancer genomics, detection of somatic copy number changes and profiling of allele-specific copy number (ASCN) are complicated by experimental biases and artifacts as well as normal cell contamination and cancer subclone admixture. Furthermore, careful statistical modeling is warranted to reconstruct tumor phylogeny by both somatic ASCN changes and single nucleotide variants. Here we describe a flexible computational pipeline, MARATHON, which integrates multiple related statistical software tools for copy number profiling and downstream analyses in disease genetic studies.
Availability and implementation
MARATHON is publicly available at https://github.com/yuchaojiang/MARATHON.
Supplementary information
Supplementary data are available at Bioinformatics online.
Understanding the molecular parameters that regulate cross-species transmission and host adaptation of potential pathogens is crucial to control emerging infectious disease. Although microbial pathotype diversity is conventionally associated with gene gain or loss, the role of pathoadaptive nonsynonymous single-nucleotide polymorphisms (nsSNPs) has not been systematically evaluated. Here, our genome-wide analysis of core genes within Salmonella enterica serovar Typhimurium genomes reveals a high degree of allelic variation in surface-exposed molecules, including adhesins that promote host colonization. Subsequent multinomial logistic regression, MultiPhen and Random Forest analyses of known/suspected adhesins from 580 independent Typhimurium isolates identify distinct host-specific nsSNP signatures. Moreover, population and functional analyses of host-associated nsSNPs for FimH, the type 1 fimbrial adhesin, highlight the role of key allelic residues in host-specific adherence in vitro. Together, our data provide the first concrete evidence that functional differences between allelic variants of bacterial proteins likely contribute to pathoadaption to diverse hosts.
We consider the problem of variable selection in regression modeling in high-dimensional spaces where there is known structure among the covariates. This is an unconventional variable selection problem for two reasons: (1) the dimension of the covariate space is comparable to, and often much larger than, the number of subjects in the study, and (2) the covariate space is highly structured, and in some cases it is desirable to incorporate this structural information into the model building process. We approach this problem through the Bayesian variable selection framework, where we assume that the covariates lie on an undirected graph and formulate an Ising prior on the model space for incorporating structural information. Certain computational and statistical problems arise that are unique to such high-dimensional, structured settings, the most interesting being the phenomenon of phase transitions. We propose theoretical and computational schemes to mitigate these problems. We illustrate our methods on two different graph structures: the linear chain and the regular graph of degree k. Finally, we use our methods to study a specific application in genomics: the modeling of transcription factor binding sites in DNA sequences.
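To make the idea of an Ising prior on the model space concrete, the sketch below evaluates an unnormalized log prior for a binary inclusion vector on a linear chain. The parameterization is an assumption chosen for illustration, not necessarily the one used in the article: each covariate i has an indicator gamma_i in {0, 1}, a sparsity parameter `a` multiplies the number of included covariates, and an interaction parameter `b` rewards pairs of graph neighbors that are both included. The function names are hypothetical.

```python
def linear_chain(p):
    """Edges of a linear chain on p covariates: (0, 1), (1, 2), ..., (p - 2, p - 1)."""
    return [(i, i + 1) for i in range(p - 1)]


def ising_log_prior(gamma, edges, a, b):
    """Unnormalized log prior under an assumed Ising form.

    log p(gamma) = a * sum_i gamma_i + b * sum_{(i, j) in edges} gamma_i * gamma_j + const
    """
    sparsity = a * sum(gamma)
    smoothness = b * sum(gamma[i] * gamma[j] for i, j in edges)
    return sparsity + smoothness


edges = linear_chain(5)
# Same number of selected covariates, arranged differently on the chain.
clustered = ising_log_prior([1, 1, 0, 0, 1], edges, a=-2.0, b=1.5)
scattered = ising_log_prior([1, 0, 1, 0, 1], edges, a=-2.0, b=1.5)
```

With a positive `b`, configurations in which neighboring covariates are selected together receive more prior mass than scattered selections of the same size, which is how the graph structure enters the prior. Large values of `b` push the prior toward all-in or all-out configurations, which is one informal way to picture the phase-transition behavior the abstract mentions.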
Dinoflagellates are important components of marine ecosystems and essential coral symbionts, yet little is known about their genomes. We report here on the analysis of a high-quality assembly from the 1180-megabase genome of Symbiodinium kawagutii. We annotated protein-coding genes and identified Symbiodinium-specific gene families. No whole-genome duplication was observed, but instead we found active (retro)transposition and gene family expansion, especially in processes important for successful symbiosis with corals. We also documented genes potentially governing sexual reproduction and cyst formation, novel promoter elements, and a microRNA system potentially regulating gene expression in both symbiont and coral. We found biochemical complementarity between genomes of S. kawagutii and the anthozoan Acropora, indicative of host-symbiont coevolution, providing a resource for studying the molecular basis and evolution of coral symbiosis.
Complement dysregulation is a feature of many retinal diseases, yet mechanistic understanding at the cellular level is limited. Given this knowledge gap about which retinal cells express complement, we performed single-cell RNA sequencing on ∼92,000 mouse retinal cells and validated our results in five major purified retinal cell types. We found evidence for a distributed cell-type-specific complement expression across 11 cell types. Notably, Müller cells are the major contributor of complement activators c1s, c3, c4, and cfb. Retinal pigment epithelium (RPE) mainly expresses cfh and the terminal complement components, whereas cfi and cfp transcripts are most abundant in neurons. Aging enhances c1s, cfb, cfp, and cfi expression, while cfh expression decreases. Transient retinal ischemia increases complement expression in microglia, Müller cells, and RPE. In summary, we report a unique complement expression signature for murine retinal cell types suggesting a well-orchestrated regulation of local complement expression in the retinal microenvironment.
• Each retinal cell type expresses a specific signature of complement components
• Müller and RPE cells are the main source of retinal complement transcripts
• Components of the alternative and classical activating pathways were detected
• The cell-type-specific complement signature changes with aging and degeneration
Overshooting complement activity contributes to retinal degeneration. Pauly et al. demonstrate a distinct complement expression profile of retinal cell types that changes with aging and during retinal degeneration. This prompts the intriguing concept of a local retinal complement activation possibly independent of the systemic components typically produced by the liver.
The epigenetic control of gene expression is highly cell-type and context specific. Yet, despite its complexity, gene regulatory logic can be broken down into modular components consisting of a transcription factor (TF) activating or repressing the target gene expression through its binding to a cis-regulatory region. We propose a nonparametric approach, TRIPOD, to detect and characterize the three-way relationships between a TF, its target gene, and the accessibility of the TF’s binding site using single-cell RNA and ATAC multiomic data. We apply TRIPOD to interrogate the cell-type-specific regulatory logic in peripheral blood mononuclear cells and contrast our results to detections from enhancer databases, cis-eQTL studies, ChIP-seq experiments, and TF knockdown/knockout studies. We then apply TRIPOD to mouse embryonic brain data and identify regulatory relationships, validated by ChIP-seq and PLAC-seq. Finally, we demonstrate TRIPOD on the SHARE-seq data of differentiating mouse hair follicle cells and identify lineage-specific regulation supported by histone marks and super-enhancer annotations. A record of this paper’s transparent peer review process is included in the supplemental information.
• TRIPOD interrogates gene regulation by scRNA and ATAC multiomic data
• TRIPOD is a nonparametric approach to identify and characterize TF-gene-peak trios
• TRIPOD identifies cell-type- and cell-state-specific transcription regulation
• Trios identified by TRIPOD corroborate and complement the existing data
Jiang et al. propose TRIPOD, a nonparametric approach to interrogate transcriptional regulation using single-cell multiomic RNA and chromatin accessibility data. They demonstrate how to harness single-cell multiomic technologies in the study of gene regulation and how the data from these technologies corroborate and complement the existing omics data.