A fundamental goal of genomics is to identify the complete set of expressed proteins. Automated annotation strategies rely on assumptions about protein-coding sequences (CDSs), e.g., they are ...conserved, do not overlap, and exceed a minimum length. However, an increasing number of newly discovered proteins violate these rules. Here we present an experimental and analytical framework, based on ribosome profiling and linear regression, for systematic identification and quantification of translation. Application of this approach to lipopolysaccharide-stimulated mouse dendritic cells and HCMV-infected human fibroblasts identifies thousands of novel CDSs, including micropeptides and variants of known proteins, that bear the hallmarks of canonical translation and exhibit translation levels and dynamics comparable to those of annotated CDSs. Remarkably, many translation events are identified in both mouse and human cells even when the peptide sequence is not conserved. Our work thus reveals an unexpected complexity to mammalian translation suited to provide both conserved regulatory and protein-based functions.
• ORF-RATER robustly identifies and quantifies translation from ribosome profiling data
• ORF-RATER reveals thousands of novel micropeptides and variants of mammalian proteins
• Hundreds of novel CDSs show evidence of protein-coding conservation among mammals
• Many ORFs are translated in both mice and humans but lack protein-coding conservation
Fields et al. describe a ribosome profiling-based approach for empirical annotation of protein-coding regions of the genome. Of the thousands of previously unknown translated ORFs they identify in mouse and human, many are conserved or dynamically regulated. Surprisingly, a considerable subset is translated in both species despite weak sequence conservation.
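The regression idea mentioned in the abstract can be illustrated with a toy example. The sketch below is not the ORF-RATER implementation; it assumes an idealized footprint profile (one unit of density at the first nucleotide of each codon) and uses ordinary least squares from NumPy to apportion observed read density between two overlapping candidate ORFs. The function names `orf_profile` and `fit_orf_levels` and all numbers are hypothetical.

```python
import numpy as np


def orf_profile(length, start, stop):
    """Idealized expected footprint profile for one candidate ORF.

    Assumption for illustration only: one unit of density at the first
    nucleotide of every codon in the half-open interval [start, stop).
    """
    profile = np.zeros(length)
    profile[start:stop:3] = 1.0
    return profile


def fit_orf_levels(observed, profiles):
    """Least-squares estimate of how much each candidate ORF contributes
    to the observed per-nucleotide read density."""
    design = np.column_stack(profiles)  # one column per candidate ORF
    coef, *_ = np.linalg.lstsq(design, observed, rcond=None)
    return coef


# Toy transcript with two overlapping ORFs in different reading frames.
transcript_length = 30
orf_a = orf_profile(transcript_length, 0, 30)  # frame 0
orf_b = orf_profile(transcript_length, 7, 22)  # frame 1, overlaps orf_a

# Simulated noise-free observation: ORF A translated at level 5, ORF B at 2.
observed = 5.0 * orf_a + 2.0 * orf_b

levels = fit_orf_levels(observed, [orf_a, orf_b])
print(levels)
```

Because the two ORFs occupy different reading frames, their idealized profiles do not share positions, so the regression separates their contributions even though the ORFs overlap on the transcript. Real footprint data are noisy and profiles are estimated empirically, so a production method would need more than this sketch.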
Ribosome Profiling: Global Views of Translation
Ingolia, Nicholas T; Hussmann, Jeffrey A; Weissman, Jonathan S
Cold Spring Harbor Perspectives in Biology, 05/2019, Volume 11, Issue 5
Journal Article
Peer reviewed
Open access
The translation of messenger RNA (mRNA) into protein and the folding of the resulting protein into an active form are prerequisites for virtually every cellular process and represent the single ...largest investment of energy by cells. Ribosome profiling-based approaches have revolutionized our ability to monitor every step of protein synthesis in vivo, allowing one to measure the rate of protein synthesis across the proteome, annotate the protein coding capacity of genomes, monitor localized protein synthesis, and explore cotranslational folding and targeting. The rich and quantitative nature of ribosome profiling data provides an unprecedented opportunity to explore and model complex cellular processes. New analytical techniques and improved experimental protocols will provide a deeper understanding of the factors controlling translation speed and its impact on protein function and cell physiology as well as the role of ribosomal RNA and mRNA modifications in regulating translation.
Abstract
The translation initiation machinery and the ribosome orchestrate a highly dynamic scanning process to distinguish proper start codons from surrounding nucleotide sequences. Here, we ...performed genome-wide CRISPRi screens in human K562 cells to systematically identify modulators of the frequency of translation initiation at near-cognate start codons. We observed that depletion of any eIF3 core subunit promoted near-cognate start codon usage, though sensitivity thresholds of each subunit to sgRNA-mediated depletion varied considerably. Double sgRNA depletion experiments suggested that enhanced near-cognate usage in eIF3D depleted cells required canonical eIF4E cap-binding and was not driven by eIF2A or eIF2D-dependent leucine tRNA initiation. We further characterized the effects of eIF3D depletion and found that the N-terminus of eIF3D was strictly required for accurate start codon selection, whereas disruption of the cap-binding properties of eIF3D had no effect. Lastly, depletion of eIF3D activated TNFα signaling via NF-κB and the interferon gamma response. Similar transcriptional profiles were observed upon knockdown of eIF1A and eIF4G2, which also promoted near-cognate start codon usage, suggesting that enhanced near-cognate usage could potentially contribute to NF-κB activation. Our study thus provides new avenues to study the mechanisms and consequences of alternative start codon usage.
Graphical Abstract
Genome-wide CRISPRi screening reveals modulators of CUG near-cognate start codon usage in mammalian cells.
How cellular and organismal complexity emerges from combinatorial expression of genes is a central question in biology. High-content phenotyping approaches such as Perturb-seq (single-cell ...RNA-sequencing pooled CRISPR screens) present an opportunity for exploring such genetic interactions (GIs) at scale. Here, we present an analytical framework for interpreting high-dimensional landscapes of cell states (manifolds) constructed from transcriptional phenotypes. We applied this approach to Perturb-seq profiling of strong GIs mined from a growth-based, gain-of-function GI map. Exploration of this manifold enabled ordering of regulatory pathways, principled classification of GIs (e.g., identifying suppressors), and mechanistic elucidation of synergistic interactions, including an unexpected synergy between CBL and CNN1 driving erythroid differentiation. Finally, we applied recommender system machine learning to predict interactions, facilitating exploration of vastly larger GI manifolds.
Meiosis is a complex developmental process that generates haploid cells from diploid progenitors. We measured messenger RNA (mRNA) abundance and protein production through the yeast meiotic ...sporulation program and found strong, stage-specific expression for most genes, achieved through control of both mRNA levels and translational efficiency. Monitoring of protein production timing revealed uncharacterized recombination factors and extensive organellar remodeling. Meiotic translation is also shifted toward noncanonical sites, including short open reading frames (ORFs) on unannotated transcripts and upstream regions of known transcripts (uORFs). Ribosome occupancy at near-cognate uORFs was associated with more efficient ORF translation; by contrast, some AUG uORFs, often exposed by regulated 5' leader extensions, acted competitively. This work reveals pervasive translational control in meiosis and helps to illuminate the molecular basis of the broad restructuring of meiotic cells.
Localized protein synthesis is a fundamental mechanism for creating distinct subcellular environments. Here we developed a generalizable proximity-specific ribosome profiling strategy that enables ...global analysis of translation in defined subcellular locations. We applied this approach to the endoplasmic reticulum (ER) in yeast and mammals. We observed the large majority of secretory proteins to be cotranslationally translocated, including substrates capable of posttranslational insertion in vitro. Distinct translocon complexes engaged nascent chains at different points during synthesis. Whereas most proteins engaged the ER immediately after or even before signal sequence (SS) emergence, a class of Sec66-dependent proteins entered with a looped SS conformation. Finally, we observed rapid ribosome exchange into the cytosol after translation termination. These data provide insights into how distinct translocation mechanisms act in concert to promote efficient cotranslational recruitment.
Recent studies highlight the importance of translational control in determining protein abundance, underscoring the value of measuring gene expression at the level of translation. We present a ...protocol for genome-wide, quantitative analysis of in vivo translation by deep sequencing. This ribosome profiling approach maps the exact positions of ribosomes on transcripts by nuclease footprinting. The nuclease-protected mRNA fragments are converted into a DNA library suitable for deep sequencing using a strategy that minimizes bias. The abundance of different footprint fragments in deep sequencing data reports on the amount of translation of a gene. In addition, footprints reveal the exact regions of the transcriptome that are translated. To better define translated reading frames, we describe an adaptation that reveals the sites of translation initiation by pretreating cells with harringtonine to immobilize initiating ribosomes. The protocol we describe requires 5-7 days to generate a completed ribosome profiling sequencing library. Sequencing and data analysis require a further 4-5 days.
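As a rough illustration of how footprint abundance can report on translation, the sketch below normalizes raw counts to reads per kilobase per million mapped reads (RPKM) and takes the ratio of footprint density to mRNA density as a translational efficiency. This is a commonly used normalization, not necessarily the exact analysis in the protocol; the function names and the example counts are hypothetical.

```python
def rpkm(read_count, region_length_nt, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (region_length_nt * total_mapped_reads)


def translational_efficiency(footprint_rpkm, mrna_rpkm):
    """Footprint density divided by mRNA density for the same gene."""
    if mrna_rpkm <= 0:
        raise ValueError("mRNA density must be positive")
    return footprint_rpkm / mrna_rpkm


# Hypothetical gene with a 1,000-nt coding region.
fp_density = rpkm(read_count=500, region_length_nt=1000, total_mapped_reads=1_000_000)
mrna_density = rpkm(read_count=1000, region_length_nt=1000, total_mapped_reads=4_000_000)
te = translational_efficiency(fp_density, mrna_density)
print(fp_density, mrna_density, te)
```

Normalizing both libraries separately before taking the ratio keeps differences in sequencing depth between the footprint and mRNA samples from being mistaken for differences in translation.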
Cells repair DNA double-strand breaks (DSBs) through a complex set of pathways critical for maintaining genomic integrity. To systematically map these pathways, we developed a high-throughput ...screening approach called Repair-seq that measures the effects of thousands of genetic perturbations on mutations introduced at targeted DNA lesions. Using Repair-seq, we profiled DSB repair products induced by two programmable nucleases (Cas9 and Cas12a) in the presence or absence of oligonucleotides for homology-directed repair (HDR) after knockdown of 476 genes involved in DSB repair or associated processes. The resulting data enabled principled, data-driven inference of DSB end joining and HDR pathways. Systematic interrogation of this data uncovered unexpected relationships among DSB repair genes and demonstrated that repair outcomes with superficially similar sequence architectures can have markedly different genetic dependencies. This work provides a foundation for mapping DNA repair pathways and for optimizing genome editing across diverse modalities.
• Repair-seq maps the genetic dependencies of DNA repair outcomes
• High-resolution signatures of gene function identify unexpected gene relationships
• DSB-induced mutations with similar sequences can result from distinct mechanisms
• Repair-seq can be adapted to study a broad range of genome editing tools
Measuring the effects of many genetic perturbations on the spectrum of mutations produced at targeted DNA breaks allows systematic mapping of DNA repair pathways.
Communication between organelles is an important feature of all eukaryotic cells. To uncover components involved in mitochondria/endoplasmic reticulum (ER) junctions, we screened for mutants that ...could be complemented by a synthetic protein designed to artificially tether the two organelles. We identified the Mmm1/Mdm10/Mdm12/Mdm34 complex as a molecular tether between ER and mitochondria. The tethering complex was composed of proteins resident of both ER and mitochondria. With the use of genome-wide mapping of genetic interactions, we showed that the components of the tethering complex were functionally connected to phospholipid biosynthesis and calcium-signaling genes. In mutant cells, phospholipid biosynthesis was impaired. The tethering complex localized to discrete foci, suggesting that discrete sites of close apposition between ER and mitochondria facilitate interorganelle calcium and phospholipid exchange.
Programmable C•G-to-G•C base editors (CGBEs) have broad scientific and therapeutic potential, but their editing outcomes have proved difficult to predict and their editing efficiency and product ...purity are often low. We describe a suite of engineered CGBEs paired with machine learning models to enable efficient, high-purity C•G-to-G•C base editing. We performed a CRISPR interference (CRISPRi) screen targeting DNA repair genes to identify factors that affect C•G-to-G•C editing outcomes and used these insights to develop CGBEs with diverse editing profiles. We characterized ten promising CGBEs on a library of 10,638 genomically integrated target sites in mammalian cells and trained machine learning models that accurately predict the purity and yield of editing outcomes (R = 0.90) using these data. These CGBEs enable correction to the wild-type coding sequence of 546 disease-related transversion single-nucleotide variants (SNVs) with >90% precision (mean 96%) and up to 70% efficiency (mean 14%). Computational prediction of optimal CGBE-single-guide RNA pairs enables high-purity transversion base editing at over fourfold more target sites than achieved using any single CGBE variant.