Abstract
Motivation
Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios.
Results
To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory element prediction and demonstrated its ease of use, accuracy and efficiency. We show that a single pre-trained transformer model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites after easy fine-tuning on small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences, for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks.
Availability and implementation
The source code and the pre-trained and fine-tuned models for DNABERT are available on GitHub (https://github.com/jerryji1993/DNABERT).
Supplementary information
Supplementary data are available at Bioinformatics online.
Adenosine deaminases acting on RNA (ADARs) are involved in RNA editing that converts adenosine residues to inosine specifically in double-stranded RNAs. In this study, we investigated the interaction of the RNA editing mechanism with the RNA interference (RNAi) machinery and found that ADAR1 forms a complex with Dicer through direct protein-protein interaction. Most importantly, ADAR1 increases the maximum rate (Vmax) of pre-microRNA (miRNA) cleavage by Dicer and facilitates loading of miRNA onto RNA-induced silencing complexes, identifying a new role of ADAR1 in miRNA processing and RNAi mechanisms. ADAR1 differentiates its functions in RNA editing and RNAi by the formation of either ADAR1/ADAR1 homodimer or Dicer/ADAR1 heterodimer complexes, respectively. As expected, the expression of miRNAs is globally inhibited in ADAR1−/− mouse embryos, which, in turn, alters the expression of their target genes and might contribute to their embryonic lethal phenotype.
► ADAR1 forms a complex with Dicer and enhances cleavage of miRNA and siRNA
► The ADAR1-Dicer heterodimer facilitates RISC loading and RNA silencing
► Global expression of miRNAs and target genes depends on ADAR1 in developing embryos
► Although ADAR1 homodimers antagonize RNAi, ADAR1-Dicer heterodimers stimulate it
The RNA-editing enzyme ADAR1 also plays a completely different role as an RNAi regulator. When ADAR1 binds Dicer, it acts as an RNA silencer by promoting miRNA processing, RISC loading, and RNAi efficacy.
In recent years, the notion of "one gene makes one protein that functions in one signaling pathway" in mammalian cells has been shown to be overly simplistic. Recent genome-wide studies suggest that at least half of the human genes, including many therapeutic target genes, produce multiple protein isoforms through alternative splicing and alternative usage of transcription initiation and/or termination. For example, alternative splicing of the vascular endothelial growth factor gene (VEGFA) produces multiple protein isoforms, which display either pro-angiogenic or anti-angiogenic activities. Similarly, for the majority of human genes, the inclusion or exclusion of exonic sequences enhances the generation of transcript variants and/or protein isoforms that can vary in structure and functional properties. Many of the isoforms produced in this manner are tightly regulated during normal development but are misregulated in cancer cells. Altered expression of transcript variants and protein isoforms for numerous genes is linked with disease and its prognosis, and cancer cells manipulate regulatory mechanisms to express specific isoforms that confer drug resistance and survival advantages. Emerging insights indicate that modulating the expression of transcript and protein isoforms of a gene may hold the key to impeding tumor growth and act as a model for efficient targeting of disease-associated genes at the isoform level. This review highlights the role and regulation of alternative transcription and splicing mechanisms in generating the transcriptome, and the misuse and diagnostic/prognostic potential of alternative transcription and splicing in cancer.
N6-Methyladenosine (m6A) is the most abundant modification of mammalian mRNAs. RNA methylation fine-tunes RNA stability and translation, altering cell fate. The fat mass- and obesity-associated protein (FTO) is an m6A demethylase with oncogenic properties in leukemia. Here, we show that FTO expression is suppressed in ovarian tumors and cancer stem cells (CSC). FTO inhibited the self-renewal of ovarian CSC and suppressed tumorigenesis, both of which required FTO demethylase activity. Integrative RNA sequencing and m6A mapping analysis revealed significant transcriptomic changes associated with FTO overexpression and m6A loss involving stem cell signaling, RNA transcription, and mRNA splicing pathways. By reducing m6A levels at the 3'UTR and the mRNA stability of two phosphodiesterase genes, FTO augmented second messenger 3',5'-cyclic adenosine monophosphate (cAMP) signaling and suppressed stemness features of ovarian cancer cells. Our results reveal a previously unappreciated tumor suppressor function of FTO in ovarian CSC mediated through inhibition of cAMP signaling. SIGNIFICANCE: A new tumor suppressor function of the RNA demethylase FTO implicates m6A RNA modifications in the regulation of cyclic AMP signaling involved in stemness and tumor initiation.
Patients with metastatic cancer experience a severe loss of skeletal muscle mass and function known as cachexia. Cachexia is associated with poor prognosis and accelerated death in patients with cancer, yet its underlying mechanisms remain poorly understood. Here, we identify the metal-ion transporter ZRT- and IRT-like protein 14 (ZIP14) as a critical mediator of cancer-induced cachexia. ZIP14 is upregulated in cachectic muscles of mice and in patients with metastatic cancer and can be induced by TNF-α and TGF-β cytokines. Strikingly, germline ablation or muscle-specific depletion of Zip14 markedly reduces muscle atrophy in metastatic cancer models. We find that ZIP14-mediated zinc uptake in muscle progenitor cells represses the expression of MyoD and Mef2c and blocks muscle-cell differentiation. Importantly, ZIP14-mediated zinc accumulation in differentiated muscle cells induces myosin heavy chain loss. These results highlight a previously unrecognized role for altered zinc homeostasis in metastatic cancer-induced muscle wasting and implicate ZIP14 as a therapeutic target for its treatment.
Background
Immune checkpoint inhibitors (ICIs) have modest activity in ovarian cancer (OC). To augment their activity, we used priming with the hypomethylating agent guadecitabine in a phase II study.
Methods
Eligible patients had platinum-resistant OC, normal organ function, measurable disease, and received up to 5 prior regimens. The treatment included guadecitabine (30 mg/m²) on days 1-4, and pembrolizumab (200 mg i.v.) on day 5, every 21 days. The primary endpoint was the response rate. Tumor biopsies, plasma, and PBMCs were obtained at baseline and after treatment.
Results
Among 35 evaluable patients, 3 patients had partial responses (8.6%), and 8 (22.9%) patients had stable disease, resulting in a clinical benefit rate of 31.4% (95% CI: 16.9%-49.3%). The median duration of clinical benefit was 6.8 months. Long-interspersed element 1 (LINE1) was hypomethylated in post-treatment PBMCs, and methylomic and transcriptomic analyses showed activation of antitumor immunity in post-treatment biopsies. High-dimensional immune profiling of PBMCs showed a higher frequency of naive and/or central memory CD4+ T cells and of classical monocytes in patients with a durable clinical benefit or response (CBR). A higher baseline density of CD8+ T cells and CD20+ B cells and the presence of tertiary lymphoid structures in tumors were associated with a durable CBR.
Conclusion
Epigenetic priming using a hypomethylating agent with an ICI was feasible and resulted in a durable clinical benefit associated with immune responses in selected patients with recurrent OC.
Trial registration
ClinicalTrials.gov NCT02901899.
Funding
US Army Medical Research and Material Command/Congressionally Directed Medical Research Programs (USAMRMC/CDMRP) grant W81XWH-17-0141; the Diana Princess of Wales Endowed Professorship and LCCTRAC funds from the Robert H. Lurie Comprehensive Cancer Center; Walter S. and Lucienne Driskill Immunotherapy Research funds; Astex Pharmaceuticals; Merck & Co.; National Cancer Institute (NCI), NIH grants CCSG P30 CA060553, CCSG P30 CA060553, and CA060553.
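The response and clinical benefit rates reported in the Results of the guadecitabine plus pembrolizumab study above follow directly from the patient counts (3 partial responses and 8 with stable disease among 35 evaluable patients). The sketch below reproduces that arithmetic and computes an exact (Clopper-Pearson) binomial confidence interval for comparison. The abstract does not state which interval method the authors used, so the choice of Clopper-Pearson is an assumption, and the function names are illustrative only.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x, n, alpha=0.05, tol=1e-10):
    """Clopper-Pearson interval for x successes in n trials (assumed method)."""

    def solve(f):
        # Bisection for the root of f on [0, 1]; f must be increasing in p.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2
    )
    # Upper bound: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve(
        lambda p: alpha / 2 - binom_cdf(x, n, p)
    )
    return lower, upper


# Counts taken from the abstract.
partial_responses = 3
stable_disease = 8
evaluable = 35

response_rate = partial_responses / evaluable
benefit = partial_responses + stable_disease
cbr = benefit / evaluable
ci_low, ci_high = exact_ci(benefit, evaluable)

print(f"Response rate: {response_rate:.1%}")
print(f"Clinical benefit rate: {cbr:.1%}")
print(f"Exact 95% CI: {ci_low:.1%} to {ci_high:.1%}")
```

With only 35 evaluable patients the interval is wide, which is why the clinical benefit rate is best read together with its confidence bounds rather than as a point estimate.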
Abstract
Motivation
Access to large-scale genomics and transcriptomics data from various tissues and cell lines has allowed the discovery of widespread alternative splicing events and alternative promoter usage in mammals. Between human and mouse, gene-level orthology is currently present for nearly 16k protein-coding genes spanning a diverse repertoire of over 200k total transcript isoforms.
Results
Here, we describe a novel method, ExTraMapper, which leverages sequence conservation between exons of a pair of organisms and identifies a fine-scale orthology mapping at the exon and then transcript level. ExTraMapper identifies more than 350k exon mappings, as well as 30k transcript mappings, between human and mouse using only sequence and gene annotation information. We demonstrate that ExTraMapper identifies a larger number of exon and transcript mappings than previous methods. Further, it identifies exon fusions, splits and losses due to splice site mutations, and finds mappings between microexons that were previously missed. By reanalysis of RNA-seq data from 13 matched human and mouse tissues, we show that ExTraMapper improves the correlation of transcript-specific expression levels, suggesting a more accurate mapping of human and mouse transcripts. We also applied the method to detect conserved exon and transcript pairs between the human and rhesus macaque genomes to show that ExTraMapper is applicable to any pair of organisms that have orthologous gene pairs.
Availability and implementation
The source code and the results are available at https://github.com/ay-lab/ExTraMapper and http://ay-lab-tools.lji.org/extramapper.
Supplementary information
Supplementary data are available at Bioinformatics online.
MRI characteristics of brain gliomas have been used to predict clinical outcome and molecular tumor characteristics. However, previously reported imaging biomarkers have not been sufficiently accurate or reproducible to enter routine clinical practice and often rely on relatively simple MRI measures. The current study leverages advanced image analysis and machine learning algorithms to identify complex and reproducible imaging patterns predictive of overall survival and molecular subtype in glioblastoma (GB).
Approximately 60 diverse features were first extracted from the preoperative multiparametric MRIs of 105 patients with GB. These imaging features were used by a machine learning algorithm to derive imaging predictors of patient survival and molecular subtype. Cross-validation was used to assess the generalizability of these predictors to new patients. Subsequently, the predictors were evaluated in a prospective cohort of 29 new patients.
Survival curves yielded a hazard ratio of 10.64 for predicted long versus short survivors. The overall, 3-way (long/medium/short survival) accuracy in the prospective cohort approached 80%. Classification of patients into the 4 molecular subtypes of GB achieved 76% accuracy.
By employing machine learning techniques, we were able to demonstrate that imaging patterns are highly predictive of patient survival. Additionally, we found that GB subtypes have distinctive imaging phenotypes. These results reveal that when imaging markers related to infiltration, cell density, microvascularity, and blood-brain barrier compromise are integrated via advanced pattern analysis methods, they form very accurate predictive biomarkers. These predictive markers used solely preoperative images, hence they can significantly augment diagnosis and treatment of GB patients.
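The cross-validation step in the glioblastoma imaging study above can be illustrated with a minimal sketch. The abstract names neither the learning algorithm nor the fold count, so the nearest-centroid classifier, the 5 folds, and the synthetic data below are stand-in assumptions chosen only to mirror the stated scale (105 patients, about 60 features). None of this reproduces the authors' pipeline or results.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose training-set mean vector is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def cross_validated_accuracy(X, y, k=5):
    """Hold out each fold once; fit on the rest and score the held-out fold."""
    correct = 0
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        correct += sum(
            nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
            for i in held_out
        )
    return correct / len(X)


# Synthetic stand-in data (hypothetical): 105 "patients", 60 "features",
# two classes whose feature means differ by one standard deviation.
rng = random.Random(42)
y = [i % 2 for i in range(105)]
X = [[rng.gauss(label, 1.0) for _ in range(60)] for label in y]

accuracy = cross_validated_accuracy(X, y)
print(f"Cross-validated accuracy on synthetic data: {accuracy:.2f}")
```

Because every patient is scored only by a model that never saw that patient during fitting, the resulting accuracy estimates performance on new patients. The study went one step further by also testing its predictors on a separate prospective cohort.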
Glioblastoma (GBM) is one of the most difficult cancers to effectively treat, in part because of the lack of precision therapies and limited therapeutic access to intracranial tumor sites due to the presence of the blood-brain and blood-tumor barriers. We have developed a precision medicine approach for GBM treatment that involves the use of brain-penetrant RNA interference-based spherical nucleic acids (SNAs), which consist of gold nanoparticle cores covalently conjugated with radially oriented and densely packed small interfering RNA (siRNA) oligonucleotides. On the basis of previous preclinical evaluation, we conducted toxicology and toxicokinetic studies in nonhuman primates and a single-arm, open-label phase 0 first-in-human trial (NCT03020017) to determine safety, pharmacokinetics, intratumoral accumulation and gene-suppressive activity of systemically administered SNAs carrying siRNA specific for the GBM oncogene Bcl2Like12 (Bcl2L12). Patients with recurrent GBM were treated with intravenous administration of siBcl2L12-SNAs (drug moniker: NU-0129), at a dose corresponding to 1/50th of the no-observed-adverse-event level, followed by tumor resection. Safety assessment revealed no grade 4 or 5 treatment-related toxicities. Inductively coupled plasma mass spectrometry, x-ray fluorescence microscopy, and silver staining of resected GBM tissue demonstrated that intravenously administered SNAs reached patient tumors, with gold enrichment observed in the tumor-associated endothelium, macrophages, and tumor cells. NU-0129 uptake into glioma cells correlated with a reduction in tumor-associated Bcl2L12 protein expression, as indicated by comparison of matched primary tumor and NU-0129-treated recurrent tumor. Our results establish SNA nanoconjugates as a potential brain-penetrant precision medicine approach for the systemic treatment of GBM.
Context:
In genome-wide association studies, the risk allele A of SNP rs965513 predisposes strongly to papillary thyroid carcinoma (PTC). The SNP is located in a gene-poor region of 9q22, some 60 kb from the FOXE1 gene. The underlying mechanisms remain to be discovered.
Objective:
Our objective was to identify novel transcripts in the 9q22 locus and correlate gene expression levels with the genotypes of rs965513.
Design:
We performed 3′ and 5′ rapid amplification of cDNA ends and RT-PCR to detect novel transcripts. One novel transcript was forcibly expressed in a cell line followed by gene expression array analysis. We genotyped rs965513 from PTC patients and measured gene expression levels by real-time RT-PCR in unaffected thyroid tissue and matched tumor.
Setting:
This was a laboratory-based study using cells from clinical tissue samples and a cancer cell line.
Main Outcome Measures:
We detected previously uncharacterized transcripts and evaluated the gene expression levels and the correlation with the risk allele of rs965513, age, gender, chronic lymphocytic thyroiditis (CLT), and TSH levels.
Results:
We found a novel long intergenic noncoding RNA gene and named it papillary thyroid cancer susceptibility candidate 2 (PTCSC2). Transcripts of PTCSC2 are down-regulated in PTC tumors. The risk allele A of rs965513 was significantly associated with low expression of unspliced PTCSC2, FOXE1, and TSHR in unaffected thyroid tissue. We also observed a significant association of age and CLT with PTCSC2 unspliced transcript levels. The correlation between the rs965513 genotype and the PTCSC2 unspliced transcript levels remained significant after adjusting for age, gender, and CLT. Forced expression of PTCSC2 in the BCPAP cell line affected the expression of a subset of noncoding and coding transcripts with enrichment of genes functionally involved in cell cycle and cancer.
Conclusions:
Our data suggest a role for PTCSC2, FOXE1, and TSHR in the predisposition to PTC.