Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.
Mitochondria are organelles with their own genomes, but they rely on the import of nuclear-encoded proteins that are translated by cytosolic ribosomes. Therefore, it is important to understand whether failures in the mitochondrial uptake of these nuclear-encoded proteins can cause proteotoxic stress and identify response mechanisms that may counteract it. Here, we report that upon impairments in mitochondrial protein import, high-risk precursor and immature forms of mitochondrial proteins form aberrant deposits in the cytosol. These deposits then cause further cytosolic accumulation and consequently aggregation of other mitochondrial proteins and disease-related proteins, including α-synuclein and amyloid β. This aggregation triggers a cytosolic protein homeostasis imbalance that is accompanied by specific molecular chaperone responses at both the transcriptomic and protein levels. Altogether, our results provide evidence that mitochondrial dysfunction, specifically protein import defects, contributes to impairments in protein homeostasis, thus revealing a possible molecular mechanism by which mitochondria are involved in neurodegenerative diseases.
Abstract
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high-quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference-quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature, to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Metazoan genomes produce thousands of long non-coding RNAs (lncRNAs), of which just a small fraction have been well characterized. Understanding their biological functions requires accurate annotations, or maps of the precise location and structure of genes and transcripts in the genome. Current lncRNA annotations are limited by compromises between quality and size, with many gene models being fragmentary or uncatalogued. To overcome this, the GENCODE consortium has developed RNA capture long-read sequencing (CLS), an approach combining targeted RNA capture with third-generation long-read sequencing. CLS provides accurate annotations at high throughput. It eliminates the need for noisy transcriptome assembly from short reads, and requires minimal manual curation. The full-length transcript models produced are of quality comparable to present-day manually curated annotations. Here we describe a detailed CLS protocol, from probe design through long-read sequencing to creation of final annotations.
Long-read RNA sequencing is essential to produce accurate and exhaustive annotation of eukaryotic genomes. Despite advancements in throughput and accuracy, achieving reliable end-to-end identification of RNA transcripts remains a challenge for long-read sequencing methods. To address this limitation, we develop CapTrap-seq, a cDNA library preparation method, which combines the Cap-trapping strategy with oligo(dT) priming to detect 5’ capped, full-length transcripts. In our study, we evaluate the performance of CapTrap-seq alongside other widely used RNA-seq library preparation protocols in human and mouse tissues, employing both ONT and PacBio sequencing technologies. To explore the quantitative capabilities of CapTrap-seq and its accuracy in reconstructing full-length RNA molecules, we implement a capping strategy for synthetic RNA spike-in sequences that mimics the natural 5’ cap formation. Our benchmarks, incorporating the Long-read RNA-seq Genome Annotation Assessment Project (LRGASP) data, demonstrate that CapTrap-seq is a competitive, platform-agnostic RNA library preparation method for generating full-length transcript sequences.
• LINC01087 is significantly downregulated in TNBCs and upregulated in the luminal BC subtypes.
• Variation in the level of LINC01087 across the different BC subtypes coincided with modulated expression of factors (mostly secreted) regulating genome integrity and expression, cell survival, cell proliferation, adhesion, invasion, inflammation and drug sensitivity.
• Deregulation of LINC01087 allowed luminal and TNBC specimens to be accurately distinguished, independently of the clinicopathological parameters, and of the histological and TP53 or BRCA1/2 mutational status.
• The expression of LINC01087 seemed to exhibit tumor suppressive properties, allowing prediction of better survival in BC subtypes.
• LINC01087 is a highly specific biomarker for diagnosing and prognosing luminal and TNBC BC subtypes and for predicting the response to pharmacological interventions.
The molecular complexity of human breast cancer (BC) renders the clinical management of the disease challenging. Long non-coding RNAs (lncRNAs) are promising biomarkers for BC patient stratification, early detection, and disease monitoring. Here, we identified the involvement of the long intergenic non-coding RNA 01087 (LINC01087) in breast oncogenesis. LINC01087 appeared significantly downregulated in triple-negative BCs (TNBCs) and upregulated in the luminal BC subtypes in comparison to mammary samples from cancer-free women and matched normal cancer pairs. Interestingly, deregulation of LINC01087 allowed luminal and TNBC specimens to be accurately distinguished, independently of the clinicopathological parameters, and of the histological and TP53 or BRCA1/2 mutational status. Moreover, increased expression of LINC01087 predicted a better prognosis in luminal BCs, while TNBC tumors that harbored lower levels of LINC01087 were associated with reduced relapse-free survival. Furthermore, bioinformatics analyses performed on TNBC and luminal BC samples suggested that the putative tumor suppressor activity of LINC01087 may rely on interference with pathways involved in cell survival, proliferation, adhesion, invasion, inflammation and drug sensitivity. Altogether, these data suggest that the assessment of LINC01087 deregulation could represent a novel, specific and promising biomarker not only for the diagnosis and prognosis of luminal BC subtypes and TNBCs, but also for predicting the response to pharmacological interventions.
Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended at either the 5' or 3' end, often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques.