The ability to quantify cellular heterogeneity is a major advantage of single-cell technologies. However, statistical methods often treat cellular heterogeneity as a nuisance. We present a novel method to characterize differences in expression in the presence of distinct expression states within and among biological conditions. We demonstrate that this framework can detect differential expression patterns under a wide range of settings. Compared to existing approaches, this method has higher power to detect subtle differences in gene expression distributions that are more complex than a mean shift, and can characterize those differences. The freely available R package scDD implements the approach.
In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology.
Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses.
Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.
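As a point of reference for what a classic, p-value-only FDR procedure does, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. The abstract does not name the classic methods it benchmarks, so the choice of Benjamini-Hochberg here is an assumption made for illustration, and the function name and example p values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p value is <= (k / m) * alpha, then reject the k smallest p values.
    """
    m = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest p value.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest rank whose p value falls under its threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p values, for illustration only.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042,
                   0.06, 0.074, 0.205, 0.212, 0.216]
example_rejected = benjamini_hochberg(example_pvalues, alpha=0.05)
```

The covariate-aware methods discussed above differ from this sketch in that they weight or group hypotheses using side information; the sketch uses p values alone.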
Androgen receptor (AR) in prostate cancer (PCa) can drive transcriptional repression of multiple genes including MYC, and supraphysiological androgen is effective in some patients. Here, we show that this repression is independent of AR chromatin binding and driven by coactivator redistribution, and through chromatin conformation capture methods show disruption of the interaction between the MYC super-enhancer within the PCAT1 gene and the MYC promoter. Conversely, androgen deprivation in vitro and in vivo increases MYC expression. In parallel, global AR activity is suppressed by MYC overexpression, consistent with coactivator redistribution. These suppressive effects of AR and MYC are mitigated at shared AR/MYC binding sites, which also have markedly higher levels of H3K27 acetylation, indicating enrichment for functional enhancers. These findings demonstrate an intricate balance between AR and MYC, and indicate that increased MYC in response to androgen deprivation contributes to castration-resistant PCa, while decreased MYC may contribute to responses to supraphysiological androgen therapy.
With recent advances in sequencing technology, it is now feasible to measure DNA methylation at tens of millions of sites across the entire genome. In most applications, biologists are interested in detecting differentially methylated regions, composed of multiple sites with differing methylation levels among populations. However, current computational approaches for detecting such regions do not provide accurate statistical inference. A major challenge in reporting uncertainty is that a genome-wide scan is involved in detecting these regions, which needs to be accounted for. A further challenge is that sample sizes are limited due to the costs associated with the technology. We have developed a new approach that overcomes these challenges and assesses uncertainty for differentially methylated regions in a rigorous manner. Region-level statistics are obtained by fitting a generalized least squares regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions. We develop an inferential approach, based on a pooled null distribution, that can be implemented even when as few as two samples per population are available. Here, we demonstrate the advantages of our method using both experimental data and Monte Carlo simulation. We find that the new method improves the specificity and sensitivity of lists of regions and accurately controls the false discovery rate.
Identifying and prioritizing somatic mutations is an important and challenging area of cancer research that can provide new insights into gene function as well as new targets for drug development. Most methods for prioritizing mutations rely primarily on frequency-based criteria, where a gene is identified as having a driver mutation if it is altered in significantly more samples than expected according to a background model. Although useful, frequency-based methods are limited in that all mutations are treated equally. It is well known, however, that some mutations have no functional consequence, while others may have a major deleterious impact. The spatial pattern of mutations within a gene provides further insight into their functional consequence. Properly accounting for these factors improves both the power and accuracy of inference. Also important is an accurate background model.
Here, we develop a Model-based Approach for identifying Driver Genes in Cancer (termed MADGiC) that incorporates both frequency and functional impact criteria and accommodates a number of factors to improve the background model. Simulation studies demonstrate advantages of the approach, including a substantial increase in power over competing methods. Further advantages are illustrated in an analysis of ovarian and lung cancer data from The Cancer Genome Atlas (TCGA) project.
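The frequency-based criterion described above (not MADGiC itself) can be sketched as a one-sided binomial test: given a background probability that a gene is altered in any one sample, how surprising is the observed number of altered samples? The function name, the single fixed background rate, and the example numbers below are assumptions for illustration; real background models vary by gene, sample, and mutation type.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    k: number of samples in which the gene is altered
    n: total number of samples
    p: assumed background probability of alteration per sample
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: gene altered in 12 of 300 samples,
# with an assumed background alteration probability of 1% per sample.
example_pvalue = binomial_upper_tail(12, 300, 0.01)
```

A small tail probability flags the gene as altered more often than the background predicts. Note that this treats all mutations equally, which is the limitation the abstract points out.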
Improving early cancer detection has the potential to substantially reduce cancer-related mortality. Cell-free methylated DNA immunoprecipitation and high-throughput sequencing (cfMeDIP-seq) is a highly sensitive assay capable of detecting early-stage tumors. We report accurate classification of patients across all stages of renal cell carcinoma (RCC) in plasma (area under the receiver operating characteristic (AUROC) curve of 0.99) and demonstrate the validity of this assay to identify patients with RCC using urine cell-free DNA (cfDNA; AUROC of 0.86).
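For readers unfamiliar with the AUROC values quoted above, the metric equals the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the scores are hypothetical and are not data from the study.

```python
def auroc(case_scores, control_scores):
    """Return the area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case
    scores higher than the control; tied pairs contribute 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical classifier scores, for illustration only.
example_auroc = auroc([0.9, 0.4], [0.5, 0.1])
```

An AUROC of 1.0 means every case outranks every control, while 0.5 corresponds to chance-level ranking.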
Increased androgen receptor (AR) activity drives therapeutic resistance in advanced prostate cancer. The most common resistance mechanism is amplification of this locus presumably targeting the AR gene. Here, we identify and characterize a somatically acquired AR enhancer located 650 kb centromeric to the AR. Systematic perturbation of this enhancer using genome editing decreased proliferation by suppressing AR levels. Insertion of an additional copy of this region sufficed to increase proliferation under low androgen conditions and to decrease sensitivity to enzalutamide. Epigenetic data generated in localized prostate tumors and benign specimens support the notion that this region is a developmental enhancer. Collectively, these observations underscore the importance of epigenomic profiling in primary specimens and the value of deploying genome editing to functionally characterize noncoding elements. More broadly, this work identifies a therapeutic vulnerability for targeting the AR and emphasizes the importance of regulatory elements as highly recurrent oncogenic drivers.
• An AR enhancer becomes activated in castrate-resistant prostate cancer (CRPC)
• The AR enhancer is frequently amplified in CRPC
• The enhancer amplification desensitizes cancer cells to hormone deprivation treatment
• The AR enhancer is likely to be a developmental enhancer that is reactivated in CRPC
Activation and amplification of an enhancer upstream of the androgen receptor locus drives progression of metastatic castration-resistant prostate cancer.
In the post‐genomic era, thousands of putative noncoding regulatory regions have been identified, such as enhancers, promoters, long noncoding RNAs (lncRNAs), and a cadre of small peptides. These ever‐growing catalogs require high‐throughput assays to test their functionality at scale. Massively parallel reporter assays have greatly enhanced the understanding of noncoding DNA elements en masse. Here, we present a massively parallel RNA assay (MPRNA) that can assay 10,000 or more RNA segments for RNA‐based functionality. We applied MPRNA to identify RNA‐based nuclear localization domains harbored in lncRNAs. We examined a pool of 11,969 oligos densely tiling 38 human lncRNAs that were fused to a cytosolic transcript. After cell fractionation and barcode sequencing, we identified 109 unique RNA regions that significantly enriched this cytosolic transcript in the nucleus including a cytosine‐rich motif. These nuclear enrichment sequences are highly conserved and over‐represented in global nuclear fractionation sequencing. Importantly, many of these regions were independently validated by single‐molecule RNA fluorescence in situ hybridization. Overall, we demonstrate the utility of MPRNA for future investigation of RNA‐based functionalities.
Synopsis
A new tiling‐based method maps functional domains in RNAs in a high‐throughput manner, allowing the identification of sequence motifs that contribute to the nuclear enrichment of long non‐coding RNAs.
Massively Parallel RNA Assay (MPRNA) is a universally applicable method to survey RNA‐based functionalities in a high‐throughput manner.
MPRNA identifies 109 nuclear enrichment sequences across 29 of 38 lncRNAs tested.
A C‐rich motif is generally enriched in nuclear versus cytoplasmic transcripts.
RNA‐FISH reveals that large sequence domains are sufficient to localize otherwise cytoplasmic RNA to the nucleus.
MPRNA shows that nuclear lncRNAs have unique and large (~500 bp) nuclear localization domains.
A new tiling‐based method maps functional domains across a panel of long non‐coding RNAs, identifying sequence motifs that retain these RNAs in the nucleus.
Lineage plasticity, the ability of a cell to alter its identity, is an increasingly common mechanism of adaptive resistance to targeted therapy in cancer. An archetypal example is the development of neuroendocrine prostate cancer (NEPC) after treatment of prostate adenocarcinoma (PRAD) with inhibitors of androgen signaling. NEPC is an aggressive variant of prostate cancer that aberrantly expresses genes characteristic of neuroendocrine (NE) tissues and no longer depends on androgens. Here, we investigate the epigenomic basis of this resistance mechanism by profiling histone modifications in NEPC and PRAD patient-derived xenografts (PDXs) using chromatin immunoprecipitation and sequencing (ChIP-seq). We identify a vast network of cis-regulatory elements (N~15,000) that are recurrently activated in NEPC. The FOXA1 transcription factor (TF), which pioneers androgen receptor (AR) chromatin binding in the prostate epithelium, is reprogrammed to NE-specific regulatory elements in NEPC. Despite loss of dependence upon AR, NEPC maintains FOXA1 expression and requires FOXA1 for proliferation and expression of NE lineage-defining genes. Ectopic expression of the NE lineage TFs ASCL1 and NKX2-1 in PRAD cells reprograms FOXA1 to bind to NE regulatory elements and induces enhancer activity as evidenced by histone modifications at these sites. Our data establish the importance of FOXA1 in NEPC and provide a principled approach to identifying cancer dependencies through epigenomic profiling.
Plasma cell-free DNA (cfDNA) variant analysis is commonly used in many cancer subtypes. Cell-free methylated DNA immunoprecipitation sequencing (cfMeDIP-seq) has shown high sensitivity for cancer detection. To date, studies have not compared the sensitivity of both methods in a single cancer subtype.
cfDNA from 40 metastatic RCC (mRCC) patients was subjected to targeted panel variant analysis. For 34 of the 40 patients, cfMeDIP-seq was also performed. A separate cohort of 38 mRCC patients was used in cfMeDIP-seq analysis to train an RCC classifier.
cfDNA variant analysis detected 21 candidate variants in 11 of 40 mRCC patients (28%), after exclusion of 2 germline variants and 6 variants reflecting clonal hematopoiesis. Among 23 patients with parallel tumor sequencing, cfDNA analysis alone identified variants in 9 patients (39%), while cfDNA analysis focused on tumor sequencing variant findings improved the sensitivity to 52%. In 34 mRCC patients undergoing cfMeDIP-seq, cfDNA variant analysis identified variants in 7 (21%), while cfMeDIP-seq detected all mRCC cases (100% sensitivity) with 88% specificity in 34 control subjects. In 5 patients with cfDNA variants and serial samples, variant frequency correlated with response to therapy.
cfMeDIP-seq is significantly more sensitive for mRCC detection than cfDNA variant analysis. However, cfDNA variant analysis may be useful for monitoring response to therapy.
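The detection rates reported above follow from simple count ratios, and the arithmetic can be checked with a short sketch. The helper name is hypothetical. The counts 11/40, 9/23, and 7/34 are stated in the abstract; the count behind the 52% figure is not stated, so the last line only infers which count out of 23 is consistent with it.

```python
def percent_half_up(count, total):
    """Return count/total as a whole-number percentage, rounding halves up.

    Integer arithmetic avoids floating-point surprises at exact .5 cases
    such as 11/40 = 27.5%.
    """
    return (200 * count + total) // (2 * total)


# Counts stated in the abstract.
variant_all = percent_half_up(11, 40)       # cfDNA variants, all 40 patients
variant_tumor_seq = percent_half_up(9, 23)  # patients with parallel tumor sequencing
variant_medip = percent_half_up(7, 34)      # patients who also had cfMeDIP-seq

# Inferred, not stated: which patient counts out of 23 round to 52%.
consistent_with_52 = [k for k in range(24) if percent_half_up(k, 23) == 52]
```

The three stated ratios reproduce the reported 28%, 39%, and 21%, and only one count out of 23 is consistent with the reported 52%.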