Pseudogenes are mutated copies of protein-coding genes that cannot be translated into proteins, but a small subset of pseudogenes has been detected at the protein level. Although ubiquitin pseudogenes represent one of the most abundant pseudogene families in many organisms, little is known about their expression and signaling potential. By re-analyzing public RNA-sequencing and proteomics datasets, we here provide evidence for the expression of several ubiquitin pseudogenes including UBB pseudogene 4 (UBBP4), which encodes UbKEKS (Q2K, K33E, Q49K, N60S). The functional consequences of UbKEKS conjugation appear to differ from canonical ubiquitylation. Quantitative proteomics shows that UbKEKS modifies specific proteins including lamins. Knockout of UBBP4 results in slower cell division and accumulation of lamin A within the nucleolus. Our work suggests that a subset of proteins reported as ubiquitin targets may instead be modified by ubiquitin variants that are the products of wrongly annotated pseudogenes and induce different functional effects.
Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein-coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins.
Advances in proteomics and sequencing have highlighted many non-annotated open reading frames (ORFs) in eukaryotic genomes. Genome annotations, cornerstones of today's research, mostly rely on protein prior knowledge and on ab initio prediction algorithms. Such algorithms notably enforce an arbitrary criterion of one coding sequence (CDS) per transcript, leading to a substantial underestimation of the coding potential of eukaryotes. Here, we present OpenProt, the first database fully endorsing a polycistronic model of eukaryotic genomes to date. OpenProt contains all possible ORFs longer than 30 codons across 10 species, and cumulates supporting evidence such as protein conservation, translation and expression. OpenProt annotates all known proteins (RefProts), novel predicted isoforms (Isoforms) and novel predicted proteins from alternative ORFs (AltProts). It incorporates cutting-edge algorithms to evaluate protein orthology and re-interrogate publicly available ribosome profiling and mass spectrometry datasets, supporting the annotation of thousands of predicted ORFs. The constantly growing database currently cumulates evidence from 87 ribosome profiling and 114 mass spectrometry studies from several species, tissues and cell lines. All data is freely available and downloadable from a web platform (www.openprot.org) supporting a genome browser and advanced queries for each species. Thus, OpenProt enables a more comprehensive landscape of eukaryotic genomes’ coding potential.
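The idea of listing every ORF above a length threshold on a transcript, rather than keeping a single CDS, can be illustrated with a minimal sketch. This is not OpenProt's implementation: the function name `find_orfs`, the restriction to ATG start codons on the forward strand, and the choice to exclude the stop codon from the codon count are all assumptions made for illustration.

```python
# Minimal sketch (hypothetical, not OpenProt code): enumerate ATG-to-stop ORFs
# longer than a codon threshold in the three forward frames of one transcript.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(transcript, min_codons=30):
    """Return sorted (start, end, frame) tuples for ORFs with > min_codons codons.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    The codon count excludes the stop codon (an assumption of this sketch).
    """
    seq = transcript.upper()
    orfs = []
    for frame in range(3):
        start = None  # first in-frame ATG seen since the last stop codon
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos
            elif start is not None and codon in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons > min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Toy transcript carrying two ORFs in different frames (made-up sequence).
toy = "CC" + "ATG" + "GCT" * 40 + "TAA" + "G" + "ATG" + "AAA" * 35 + "TGA"
for start, end, frame in find_orfs(toy):
    print(start, end, frame)
```

Under these assumptions a single transcript can yield several ORFs, which is the point of a polycistronic annotation model; ORFs without an in-frame stop codon are ignored by this sketch.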
The ovarian follicle reserve, formed pre- or perinatally, comprises all oocytes for lifetime reproduction. Depletion of this reserve results in infertility. Steroidogenic factor 1 (SF-1; NR5A1) and liver receptor homolog 1 (LRH-1; NR5A2) are two orphan nuclear receptors that regulate adult endocrine function, but their role in follicle formation is unknown. We developed models of conditional depletion of SF-1 or LRH-1 from prenatal ovaries. Depletion of SF-1, but not LRH-1, resulted in dramatically smaller ovaries and fewer primordial follicles. This was mediated by increased oocyte death, resulting from increased ovarian inflammation and increased Notch signaling. Major dysregulated genes were Iroquois homeobox 3 and 5 and their downstream targets involved in the establishment of the ovarian laminin matrix and oocyte-granulosa cell gap junctions. Disruptions of these pathways resulted in follicles with impaired basement membrane formation and compromised oocyte-granulosa communication networks, believed to render them more prone to atresia. This study identifies SF-1 as a key regulator of the formation of the ovarian reserve.
Proteogenomics and ribosome profiling concurrently show that genes may code for both a large and one or more small proteins translated from annotated coding sequences (CDSs) and unannotated alternative open reading frames (named alternative ORFs or altORFs), respectively, but the stoichiometry between large and small proteins translated from the same gene is unknown. MIEF1, a gene recently identified as a dual-coding gene, harbors a CDS and a newly annotated and actively translated altORF located in the 5′UTR. Here, we use absolute quantification with stable isotope-labeled peptides and parallel reaction monitoring to determine levels of both proteins in two human cell lines and in human colon. We report that the main MIEF1 translational product is not the canonical 463 amino acid MiD51 protein but the small 70 amino acid alternative MiD51 protein (altMiD51). These results demonstrate the inadequacy of the single CDS concept and provide a strong argument for incorporating altORFs and small proteins in functional annotations.
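The arithmetic behind absolute quantification with a spiked-in heavy peptide can be sketched as follows. This is the generic isotope-dilution calculation, not the authors' analysis pipeline: the function names are hypothetical, the peak areas and spike amount are made-up illustrative numbers rather than measurements, and the sketch assumes one quantified peptide per protein, complete digestion, and equal response of light and heavy forms.

```python
# Minimal sketch (hypothetical names, illustrative numbers only):
# isotope-dilution quantification and the resulting protein stoichiometry.

def endogenous_fmol(light_area, heavy_area, heavy_spike_fmol):
    """Endogenous amount = (light peak area / heavy peak area) * spiked heavy amount."""
    if heavy_area <= 0:
        raise ValueError("heavy peptide signal must be positive")
    return light_area / heavy_area * heavy_spike_fmol


def stoichiometry(alt_fmol, ref_fmol):
    """Molar ratio of the alternative protein to the reference protein."""
    if ref_fmol <= 0:
        raise ValueError("reference amount must be positive")
    return alt_fmol / ref_fmol


# Made-up peak areas for one peptide per protein, each spiked with 50 fmol heavy peptide.
alt = endogenous_fmol(light_area=2.0e6, heavy_area=1.0e6, heavy_spike_fmol=50.0)
ref = endogenous_fmol(light_area=5.0e5, heavy_area=1.0e6, heavy_spike_fmol=50.0)
print(alt, ref, stoichiometry(alt, ref))
```

A ratio above 1 from such a calculation would mean that the small protein is the more abundant product of the gene in that sample.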
Recent proteogenomic approaches have led to the discovery that regions of the transcriptome previously annotated as non-coding, i.e., untranslated regions (UTRs), open reading frames overlapping annotated coding sequences in a different reading frame, and non-coding RNAs, frequently encode proteins (termed alternative proteins). This suggests that previously identified protein–protein interaction networks are partially incomplete because alternative proteins are not present in conventional protein databases. Here we used the proteogenomic resource OpenProt and a combined spectrum- and peptide-centric analysis for the re-analysis of a high-throughput human network proteomics dataset, thereby revealing the presence of 261 alternative proteins in the network. We found 19 genes encoding both an annotated (reference) and an alternative protein interacting with each other. Of the 117 alternative proteins encoded by pseudogenes, 38 are direct interactors of reference proteins encoded by their respective parental gene. Finally, we experimentally validate several interactions involving alternative proteins. These data improve the blueprints of the human protein–protein interaction network and suggest functional roles for hundreds of alternative proteins.
The analysis of genomic data such as ChIP-Seq usually involves representing the signal intensity level over genes or other genetic features. This is often illustrated as a curve (representing the aggregate profile of a group of genes) or as a heatmap (representing individual genes). However, no specific resource dedicated to easily generating such profiles is currently available. We therefore built the versatile aggregate profiler (VAP), designed to be used by experimental and computational biologists to generate profiles of genomic datasets over groups of regions of interest, using either an absolute or a relative method. Graphical representation of the results is automatically generated, and subgrouping can be performed easily, based on the orientation of the flanking annotations. The outputs include statistical measures to facilitate comparisons between groups or datasets. We show that, through its intuitive design and flexibility, VAP can help avoid misinterpretations of genomics data. VAP is highly efficient and designed to run on laptop computers by using a memory footprint control, but can also be easily compiled and run on servers. VAP is accessible at http://lab-jacques.recherche.usherbrooke.ca/vap/.
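The difference between individual profiles (heatmap rows) and an aggregate profile (curve) can be shown with a small sketch. It is not VAP code and does not reproduce VAP's options: the function names are hypothetical, and the reading of "absolute" as fixed-width windows around a reference point and "relative" as regions rescaled to a common number of bins is an assumption made for illustration.

```python
# Minimal sketch (hypothetical, not VAP code): individual and aggregate profiles
# of a per-base signal over regions of interest, oriented 5'->3' by strand.

def _column_means(rows):
    """Average the rows position by position to get the aggregate curve."""
    return [sum(col) / len(col) for col in zip(*rows)]


def absolute_profile(signal, anchors, flank=2):
    """Fixed-width windows (anchor +/- flank bases) around (position, strand) anchors."""
    rows = []
    for pos, strand in anchors:
        lo, hi = pos - flank, pos + flank + 1
        if lo < 0 or hi > len(signal):
            continue  # skip windows that run off the signal
        window = signal[lo:hi]
        rows.append(window[::-1] if strand == "-" else window)
    return rows, _column_means(rows)


def relative_profile(signal, regions, n_bins=10):
    """Rescale each non-empty (start, end, strand) region to n_bins bin averages."""
    rows = []
    for start, end, strand in regions:
        values = signal[start:end]
        if strand == "-":
            values = values[::-1]
        length = len(values)
        row = []
        for b in range(n_bins):
            lo = b * length // n_bins
            hi = max((b + 1) * length // n_bins, lo + 1)  # at least one base per bin
            chunk = values[lo:hi]
            row.append(sum(chunk) / len(chunk))
        rows.append(row)
    return rows, _column_means(rows)


signal = [float(i) for i in range(20)]  # toy signal, not real data
rows, curve = relative_profile(signal, [(0, 4, "+"), (10, 18, "-")], n_bins=2)
print(rows, curve)
```

Reversing minus-strand regions before averaging keeps all features in the same orientation; averaging without this step is one way aggregate profiles can be misread.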
Because of its profound influence on DNA accessibility for protein binding and thus on the regulation of diverse biological processes, nucleosome positioning has been studied for many years. In the past decade, high-throughput sequencing technologies have opened new perspectives in this research field by allowing the study of nucleosome positioning and occupancy on a genome-wide scale, thereby providing insight into important aspects of chromatin packaging, as well as into various chromatin-templated processes such as transcription. In this chapter, we provide the MNase sequencing protocol for the genome-wide mapping of nucleosomes, using MNase to generate mononucleosomal DNA fragments and next-generation sequencing technology to identify their individual locations.
In the analysis of experimental data corresponding to the signal enrichment of chromatin features such as histone modifications throughout the genome, it is often useful to represent the signal over known regions of interest, such as genes, using aggregate or individual profiles. In the present chapter, we describe and explain the best practices on how to generate such profiles as well as other usages of the versatile aggregate profiler (VAP) tool (Coulombe et al., Nucleic Acids Res 42:W485-W493, 2014), with a particular focus on the new functionalities introduced in version 1.1.0 of VAP.
Tumor characteristics are decisive in the determination of treatment strategy for patients with breast cancer. Patients with estrogen receptor α (ERα)-positive breast cancer can benefit from long-term hormonal treatment. Nonetheless, the majority of patients will develop resistance to these therapies. Here, we investigated the role of the nuclear receptor liver receptor homolog-1 (LRH-1, NR5A2) in antiestrogen-sensitive and -resistant breast cancer cells. We identified genome-wide LRH-1-binding sites using ChIP-seq (chromatin immunoprecipitation sequencing), uncovering preferential binding to regions distal to transcriptional start sites. We further characterized these LRH-1-binding sites by integrating overlapping layers of specific chromatin marks, revealing that many LRH-1-binding sites are active and could be involved in long-range enhancer-promoter looping. Combined with transcriptome analysis of LRH-1-depleted cells, these results show that LRH-1 regulates specific subsets of genes involved in cell proliferation in antiestrogen-sensitive and antiestrogen-resistant breast cancer cells. Furthermore, the LRH-1 transcriptional program is highly associated with a signature of poor outcome and high-grade breast cancer tumors in vivo. Herein, we report the genome-wide location and molecular function of LRH-1 in breast cancer cells and reveal its therapeutic potential for the treatment of breast cancers, notably for tumors resistant to treatments currently used in therapies.