Abstract
We report a new class of artifacts in DNA methylation measurements from Illumina HumanMethylation450 and MethylationEPIC arrays. These artifacts reflect failed hybridization to target DNA, often due to germline or somatic deletions, and manifest as incorrectly reported intermediate methylation. The artifacts often survive existing preprocessing pipelines, masquerade as epigenetic alterations and can confound discoveries in epigenome-wide association studies and studies of methylation-quantitative trait loci. We implement a solution, P-value with out-of-band (OOB) array hybridization (pOOBAH), in the R package SeSAMe. Our method effectively masks deleted and hyperpolymorphic regions, reducing or eliminating spurious reports of epigenetic silencing at oft-deleted tumor suppressor genes such as CDKN2A and RB1 in cases with somatic deletions. Furthermore, our method substantially decreases technical variation whilst retaining biological variation, both within and across HM450 and EPIC platform measurements. SeSAMe provides a light-weight, modular DNA methylation data analysis suite, with a performant implementation suitable for efficient analysis of thousands of samples.
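The idea behind an out-of-band detection p-value can be illustrated with a minimal sketch. It assumes that the out-of-band intensities serve as an empirical background distribution and that a probe is masked when its signal is not clearly above that background. The function names and the 0.05 threshold are illustrative assumptions, not the SeSAMe API.

```python
from bisect import bisect_left


def oob_detection_pvalues(signal, oob_background):
    """Empirical p-value per probe: fraction of background values >= the signal."""
    bg = sorted(oob_background)
    n = len(bg)
    # bisect_left counts background values strictly below s,
    # so n minus that count is the number of values >= s.
    return [(n - bisect_left(bg, s)) / n for s in signal]


def mask_probes(betas, pvalues, threshold=0.05):
    """Replace beta values with None where the p-value exceeds the threshold."""
    return [b if p <= threshold else None for b, p in zip(betas, pvalues)]


# Hypothetical intensities: the first probe is far above background,
# the second is indistinguishable from it.
background = [100, 120, 150, 200, 300]
pvals = oob_detection_pvalues([5000, 110], background)
masked = mask_probes([0.9, 0.5], pvals)
```

In this toy input the second probe reports an intermediate beta value of 0.5 but is masked, which is the kind of spurious intermediate methylation the abstract describes.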
The minfi package is widely used for analyzing Illumina DNA methylation array data. Here we describe modifications to the minfi package required to support the HumanMethylationEPIC ('EPIC') array from Illumina. We discuss methods for the joint analysis and normalization of data from the HumanMethylation450 ('450k') and EPIC platforms. We introduce the single-sample Noob (ssNoob) method, a normalization procedure suitable for incremental preprocessing of individual methylation arrays and conclude that this method should be used when integrating data from multiple generations of Infinium methylation arrays. We show how to use reference 450k datasets to estimate cell type composition of samples on EPIC arrays. The cumulative effect of these updates is to ensure that minfi provides the tools to best integrate existing and forthcoming Illumina methylation array data.
The minfi package version 1.19.12 or higher is available for all platforms from the Bioconductor project.
Contact: khansen@jhsph.edu.
Supplementary data are available at Bioinformatics online.
We propose a novel approach to background correction for Infinium HumanMethylation data to account for technical variation in background fluorescence signal. Our approach capitalizes on a new use for the Infinium I design bead types to measure non-specific fluorescence in the colour channel opposite of their design (Cy3/Cy5). This provides tens of thousands of features for measuring background instead of the much smaller number of negative control probes on the platforms (n = 32 for HumanMethylation27 and n = 614 for HumanMethylation450, respectively). We compare the performance of our methods with existing approaches, using technical replicates of both mixture samples and biological samples, and demonstrate that within- and between-platform artefacts can be substantially reduced, with concomitant improvement in sensitivity, by the proposed methods.
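How out-of-band intensities can be gathered from Infinium I probes is sketched below under simplifying assumptions. Each probe is assumed to record its design channel plus a pair of intensities in both channels; the dictionary layout and function names are hypothetical. The background step here is plain median subtraction with a positive floor, swapped in for illustration only, and is not the statistical background model of the published method.

```python
from statistics import median


def split_out_of_band(probes):
    """Collect intensities measured in the channel opposite each probe's design."""
    oob = {"green": [], "red": []}
    for probe in probes:
        # A red-designed probe contributes its green readings, and vice versa.
        other = "green" if probe["design"] == "red" else "red"
        oob[other].extend(probe[other])
    return oob


def subtract_background(intensities, oob_values, floor=1.0):
    """Subtract the median out-of-band intensity, keeping values positive."""
    background = median(oob_values)
    return [max(x - background, floor) for x in intensities]


# Hypothetical Infinium I probes with (methylated, unmethylated) intensities.
probes = [
    {"design": "red", "red": (4000, 300), "green": (90, 110)},
    {"design": "green", "green": (2500, 200), "red": (150, 170)},
]
oob = split_out_of_band(probes)
corrected_red = subtract_background([4000, 300, 100], oob["red"])
```

Because every Infinium I probe contributes readings to the opposite channel, the background sample grows with the number of such probes rather than being limited to the negative controls.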
Abstract
Data from both bulk and single-cell whole-genome DNA methylation experiments are under-utilized in many ways. This is attributable to inefficient mapping of methylation sequencing reads, routinely discarded genetic information, and neglected read-level epigenetic and genetic linkage information. We introduce the BISulfite-seq Command line User Interface Toolkit (BISCUIT) and its companion R/Bioconductor package, biscuiteer, for simultaneous extraction of genetic and epigenetic information from bulk and single-cell DNA methylation sequencing. BISCUIT’s performance, flexibility and standards-compliant output allow large, complex experimental designs to be characterized on clinical timescales. BISCUIT is particularly suited for processing data from single-cell DNA methylation assays, with its excellent scalability, efficiency, and ability to greatly enhance mappability, a key challenge for single-cell studies. We also introduce the epiBED format for single-molecule analysis of coupled epigenetic and genetic information, facilitating the study of cellular and tissue heterogeneity from DNA methylation sequencing.
Abstract
Deconvolution methods infer quantitative cell type estimates from bulk measurement of mixed samples including blood and tissue. DNA methylation sequencing measures multiple CpGs per read, but few existing deconvolution methods leverage this within-read information. We develop CelFiE-ISH, which extends an existing method (CelFiE) to use within-read haplotype information. CelFiE-ISH outperforms CelFiE and other existing methods, achieving 30% better accuracy and more sensitive detection of rare cell types. We also demonstrate the importance of marker selection and of tailoring markers for haplotype-aware methods. While here we use gold-standard short-read sequencing data, haplotype-aware methods will be well-suited for long-read sequencing.
Optimized strategies for risk classification are essential to tailor therapy for patients with biologically distinctive disease. Risk classification in pediatric acute myeloid leukemia (pAML) relies on detection of translocations and gene mutations. Long noncoding RNA (lncRNA) transcripts have been shown to associate with and mediate malignant phenotypes in acute myeloid leukemia (AML) but have not been comprehensively evaluated in pAML.
To identify lncRNA transcripts associated with outcomes, we evaluated the annotated lncRNA landscape by transcript sequencing of 1,298 pediatric and 96 adult AML specimens. Upregulated lncRNAs identified in the pAML training set were used to establish a regularized Cox regression model of event-free survival (EFS), yielding a 37-lncRNA signature (lncScore). Discretized lncScores were correlated with initial and postinduction treatment outcomes using Cox proportional hazards models in validation sets. Predictive model performance was compared with standard stratification methods by concordance analysis.
Training set cases with positive lncScores had 5-year EFS and overall survival rates of 26.7% and 42.7%, respectively, compared with 56.9% and 76.3% with negative lncScores (hazard ratio, 2.48 and 3.16; P < .001). Pediatric validation cohorts and an adult AML group yielded comparable results in magnitude and significance. lncScore remained independently prognostic in multivariable models, including key factors used in preinduction and postinduction risk stratification. Subgroup analysis suggested that lncScores provide additional outcome information in heterogeneous subgroups currently classified as indeterminate risk. Concordance analysis showed that lncScore adds to overall classification accuracy with at least comparable predictive performance to current stratification methods that rely on multiple assays.
Inclusion of the lncScore enhances predictive power of traditional cytogenetic and mutation-defined stratification in pAML with potential, as a single assay, to replace these complex stratification schemes with comparable predictive accuracy.
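As a rough illustration of how a signature score and a concordance analysis fit together, the sketch below assumes the score is a weighted sum of expression values (the linear predictor of a Cox-type model) and evaluates it with Harrell's concordance index. The gene names, coefficients and patient data are hypothetical and are not taken from the study.

```python
def lnc_score(expression, coefficients):
    """Weighted sum of expression values over the genes in the signature."""
    return sum(coef * expression.get(gene, 0.0)
               for gene, coef in coefficients.items())


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical two-gene signature and four patients.
coefficients = {"LNC_A": 0.5, "LNC_B": -1.0}
score = lnc_score({"LNC_A": 2.0, "LNC_B": 0.5}, coefficients)
is_positive = score > 0  # discretized score: positive versus negative

times = [5, 10, 15, 20]
events = [1, 1, 0, 1]  # 0 marks a censored observation
risks = [2.0, 1.0, 0.5, -1.0]
c_index = concordance_index(times, events, risks)
```

A concordance index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so comparing this value between two stratification schemes is one way to express the comparison the abstract reports.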
Cell-to-cell communication through secreted Wnt ligands that bind to members of the Frizzled (Fzd) family of transmembrane receptors is critical for development and homeostasis. Wnt9a signals through Fzd9b, the co-receptor LRP5 or LRP6 (LRP5/6), and the epidermal growth factor receptor (EGFR) to promote early proliferation of zebrafish and human hematopoietic stem cells during development. Here, we developed fluorescently labeled, biologically active Wnt9a and Fzd9b fusion proteins to demonstrate that EGFR-dependent endocytosis of the ligand-receptor complex was required for signaling. In human cells, the Wnt9a-Fzd9b complex was rapidly endocytosed and trafficked through early and late endosomes, lysosomes, and the endoplasmic reticulum. Using small-molecule inhibitors and genetic and knockdown approaches, we found that Wnt9a-Fzd9b endocytosis required EGFR-mediated phosphorylation of the Fzd9b tail, caveolin, and the scaffolding protein EGFR protein substrate 15 (EPS15). LRP5/6 and the downstream signaling component AXIN were required for Wnt9a-Fzd9b signaling but not for endocytosis. Knockdown or loss of EPS15 impaired hematopoietic stem cell development in zebrafish. Other Wnt ligands do not require endocytosis for signaling activity, implying that specific modes of endocytosis and trafficking may represent a method by which Wnt-Fzd specificity is established.
Infant acute myeloid leukemia (AML) is a poorly addressed, heterogeneous malignancy distinguished by surprisingly few mutations per patient but accompanied by myriad age-specific translocations. These characteristics make treatment of infant AML challenging. While infant AML is a relatively rare disease, it has an enormous impact on families and in terms of life-years lost and life-limiting morbidities. To better understand the mechanisms that drive infant AML, we performed integrative analyses of genome-wide mRNA, miRNA, and DNA methylation data in diagnosis-stage patient samples. Here, we report the activation of an onco-fetal B-cell developmental gene regulatory network in infant AML. AML in infants is genomically distinct from AML in older children and adults in that it has more structural genomic aberrations and fewer mutations. Differential expression analysis of ~1500 pediatric AML samples revealed a large number of infant-specific genes, many of which are associated with B-cell development and function. Eighteen of these genes form a well-studied B-cell gene regulatory network that includes the epigenetic regulators BRD4 and POU2AF1, and their onco-fetal targets LIN28B and IGF2BP3. All four genes are hypomethylated in infant AML. Moreover, microRNA Let7a-2 is expressed in a mutually exclusive manner with its target and regulator LIN28B. These findings suggest infant AML may respond to bromodomain inhibitors and immune therapies targeting CD19, CD20, CD22, and CD79A.
Little is known about the spectrum of mitochondrial DNA (mtDNA) mutations across pediatric malignancies. In this study, we analyzed matched tumor and normal whole genome sequencing data from 616 pediatric patients with hematopoietic malignancies, solid tumors, and brain tumors. We identified 391 mtDNA mutations in 284 tumors including 45 loss-of-function mutations, which clustered at four statistically significant hotspots in
,
, and
, and at a mutation hotspot in
. A skewed ratio (4.83) of nonsynonymous versus synonymous (dN/dS) mtDNA mutations with high statistical significance was identified on the basis of Monte Carlo simulations in the tumors. In comparison, opposite ratios of 0.44 and 0.93 were observed in 616 matched normal tissues and in 249 blood samples from children without cancer, respectively. mtDNA mutations varied by cancer type and mtDNA haplogroup. Collectively, these results suggest that deleterious mtDNA mutations play a role in the development and progression of pediatric cancers. SIGNIFICANCE: This pan-cancer mtDNA study establishes the landscape of germline and tumor mtDNA mutations and identifies hotspots of tumor mtDNA mutations to pinpoint key mitochondrial functions in pediatric malignancies.
Ten quick tips for deep learning in biology
Lee, Benjamin D; Gitter, Anthony; Greene, Casey S ...
PLOS Computational Biology, 03/2022, Volume 18, Issue 3
Journal Article, Peer reviewed, Open access
Each layer receives input from previous layers (the first of which represents the input data), and then transmits a transformed version of its own weighted output that serves as input into subsequent layers of the network. ...the process of “training” a neural network is the tuning of the layers’ weights to minimize a cost or loss function that serves as a surrogate of the prediction error. In many circumstances, deep learning can learn more complex relationships and make more accurate predictions than other methods. ...deep learning has become its own subfield of machine learning. While large amounts of high-quality data may be available in the areas of biology where data collection is thoroughly automated, such as DNA sequencing, areas of biology that rely on manual data collection may not possess enough data to train and apply deep learning models effectively. ...to the large-scale computational demands of deep learning, traditional machine learning models can often be trained on laptops (or even on a $5 computer [31]) in seconds to minutes. ...due to this enormous disparity in resource demand alone, traditional machine learning approaches may be desirable in various biological applications.
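The two ideas in this passage, layers that feed their weighted output forward and training as the tuning of weights to minimize a loss, can be illustrated with a minimal standard-library sketch. The layer sizes, the ReLU activation, the learning rate and the toy data are all assumptions chosen for illustration; practical deep learning work would use a dedicated framework.

```python
def forward(x, layers):
    """Pass an input vector through fully connected layers with ReLU activations."""
    for weights, biases in layers:
        # Each layer's output becomes the next layer's input.
        x = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


def train_linear(xs, ys, lr=0.05, steps=500):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# A hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-1.0]),
]
output = forward([2.0, 1.0], layers)

# Toy data generated from y = 2x + 1; training should recover w and b.
w, b = train_linear([0, 1, 2, 3], [1, 3, 5, 7])
```

The single-neuron training loop runs in well under a second on any machine, which echoes the passage's point that simple models have very modest computational demands.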