Mutagenesis provides insight into proteins, but only recently have assays that couple genotype to phenotype been used to assess the activities of as many as 1 million mutant versions of a protein in a single experiment. This approach, 'deep mutational scanning', yields large-scale data sets that can reveal intrinsic protein properties, protein behavior within cells and the consequences of human genetic variation. Deep mutational scanning is transforming the study of proteins, but many challenges must be tackled for it to fulfill its promise.
Dimensionality reduction is often used to visualize complex expression profiling data. Here, we use the Uniform Manifold Approximation and Projection (UMAP) method on published transcript profiles of 1484 single gene deletions of Saccharomyces cerevisiae. Proximity in low-dimensional UMAP space identifies groups of genes that correspond to protein complexes and pathways, and finds novel protein interactions, even within well-characterized complexes. This approach is more sensitive than previous methods and should be broadly useful as additional transcriptome datasets become available for other organisms.
Single cell RNA sequencing can yield high-resolution cell-type-specific expression signatures that reveal new cell types and the developmental trajectories of cell lineages. Here, we apply this approach to Arabidopsis root cells to capture gene expression in 3,121 root cells. We analyze these data with Monocle 3, which orders single cell transcriptomes in an unsupervised manner and uses machine learning to reconstruct single cell developmental trajectories along pseudotime. We identify hundreds of genes with cell-type-specific expression, with pseudotime analysis of several cell lineages revealing both known and novel genes that are expressed along a developmental trajectory. We identify transcription factor motifs that are enriched in early and late cells, together with the corresponding candidate transcription factors that likely drive the observed expression patterns. We assess and interpret changes in total RNA expression along developmental trajectories and show that trajectory branch points mark developmental decisions. Finally, by applying heat stress to whole seedlings, we address the longstanding question of possible heterogeneity among cell types in the response to an abiotic stress. Although the response of canonical heat-shock genes dominates expression across cell types, subtle but significant differences in other genes can be detected among cell types. Taken together, our results demonstrate that single cell transcriptomics holds promise for studying plant development and plant physiology with unprecedented resolution.
The initial yeast two‐hybrid experiment – published in 1989 – described an approach to detecting protein–protein interactions that has flourished over the last two decades, leading to the assembly of large‐scale data sets of these interactions. Yet the yeast assay originated because of the laboratory's interests in technology development, not because of its need to identify partners of any protein then under study. In addition to such motivating forces, other features of the process of originating a technology can be revealed by considering the lessons of the two‐hybrid approach. These include the value of timeliness in a method's development, the willingness of an investigator to try experimental approaches that prove fruitless, the ability of biological macromolecules to display surprising attributes, the benefits of a community expending efforts to expand the uses of a technology platform, and the role of scientific training of those who work in technology.
Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5' untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5' UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5' UTRs and native S. cerevisiae 5' UTRs. The model additionally was used to computationally evolve highly active 5' UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.
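As a rough illustration of one sequence feature named in this abstract, the sketch below scans a 5' UTR for upstream start codons and their first in-frame stop codon. It is a minimal sketch, not the method used in the study: the function name `find_uorfs`, the restriction to ATG starts, and the output format are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(utr):
    """Return (start, end, has_stop) for each upstream ATG in a 5' UTR.

    Hypothetical helper for illustration only.
    start:    0-based index of the ATG.
    end:      index just past the first in-frame stop codon, or len(utr)
              when no in-frame stop occurs inside the UTR.
    has_stop: True when an in-frame stop codon was found inside the UTR.
    """
    utr = utr.upper().replace("U", "T")  # accept RNA or DNA alphabets
    uorfs = []
    for start in range(len(utr) - 2):
        if utr[start:start + 3] != "ATG":
            continue
        end, has_stop = len(utr), False
        # Walk codon by codon in the frame set by this ATG.
        for pos in range(start + 3, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                end, has_stop = pos + 3, True
                break
        uorfs.append((start, end, has_stop))
    return uorfs
```

A start codon with no in-frame stop inside the UTR is reported with `has_stop=False`, because its reading frame continues into the downstream sequence, which this sketch does not see.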
Translation elongation efficiency is largely thought of as the sum of decoding efficiencies for individual codons. Here, we find that adjacent codon pairs modulate translation efficiency. Deploying an approach in Saccharomyces cerevisiae that scored the expression of over 35,000 GFP variants in which three adjacent codons were randomized, we have identified 17 pairs of adjacent codons associated with reduced expression. For many pairs, codon order is obligatory for inhibition, implying a more complex interaction than a simple additive effect. Inhibition mediated by adjacent codons occurs during translation itself as GFP expression is restored by increased tRNA levels or by non-native tRNAs with exact-matching anticodons. Inhibition operates in endogenous genes, based on analysis of ribosome profiling data. Our findings suggest translation efficiency is modulated by an interplay between tRNAs at adjacent sites in the ribosome and that this concerted effect needs to be considered in predicting the functional consequences of codon choice.
• 17 codon pairs in yeast mediate strong inhibition of translation
• Inhibition by codon pairs is distinct from dipeptide and individual codon effects
• Inhibitory pairs slow the ribosome on native mRNAs and involve wobble decoding
• Codon order is key to inhibition, implying distinct roles for each position
Rather than protein synthesis relying solely on readout of individual codons, pairs of codons dictate translational efficiency, suggesting unexpected coupling between tRNA binding sites within the ribosome.
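The order dependence described above can be made concrete with a small sketch that scans an open reading frame for ordered, in-frame adjacent codon pairs. This is an illustrative assumption, not the analysis from the study: `find_codon_pairs` is a hypothetical helper, and the pair set must be supplied by the caller (the abstract does not list the 17 pairs, so none are hard-coded here).

```python
def find_codon_pairs(orf, pairs):
    """Return (codon_index, first, second) for each adjacent codon pair in `pairs`.

    Hypothetical helper for illustration only.
    orf:   coding sequence read in frame from its first nucleotide.
    pairs: set of ordered (first_codon, second_codon) tuples; order matters,
           so ("AAA", "CCC") does not match the codons CCC then AAA.
    """
    orf = orf.upper().replace("U", "T")  # accept RNA or DNA alphabets
    usable = len(orf) - len(orf) % 3     # ignore a trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    hits = []
    for i in range(len(codons) - 1):
        pair = (codons[i], codons[i + 1])
        if pair in pairs:
            hits.append((i, pair[0], pair[1]))
    return hits
```

For example, with the placeholder set `{("CGA", "CCG")}` (chosen only to exercise the code), `find_codon_pairs("ATGCGACCGTAA", {("CGA", "CCG")})` reports one hit at codon index 1, while the same two codons in the opposite order produce no hit.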
Cross-talk between different types of post-translational modifications on the same protein molecule adds specificity and combinatorial logic to signal processing, but it has not been characterized on a large-scale basis. We developed two methods to identify protein isoforms that are both phosphorylated and ubiquitylated in the yeast Saccharomyces cerevisiae, identifying 466 proteins with 2,100 phosphorylation sites co-occurring with 2,189 ubiquitylation sites. We applied these methods quantitatively to identify phosphorylation sites that regulate protein degradation via the ubiquitin-proteasome system. Our results demonstrate that distinct phosphorylation sites are often used in conjunction with ubiquitylation and that these sites are more highly conserved than the entire set of phosphorylation sites. Finally, we investigated how the phosphorylation machinery can be regulated by ubiquitylation. We found evidence for novel regulatory mechanisms of kinases and 14-3-3 scaffold proteins via proteasome-independent ubiquitylation.
The scarcity of accessible sites that are dynamic or cell type-specific in plants may be due in part to tissue heterogeneity in bulk studies. To assess the effects of tissue heterogeneity, we apply single-cell ATAC-seq to Arabidopsis thaliana roots and identify thousands of differentially accessible sites, sufficient to resolve all major cell types of the root. We find that the entirety of a cell's regulatory landscape and its transcriptome independently capture cell type identity. We leverage this shared information on cell identity to integrate accessibility and transcriptome data to characterize developmental progression, endoreduplication and cell division. We further use the combined data to characterize cell type-specific motif enrichments of transcription factor families and link the expression of family members to changing accessibility at specific loci, resolving direct and indirect effects that shape expression. Our approach provides an analytical framework to infer the gene regulatory networks that execute plant development.
The two‐hybrid method detects the interaction of two proteins by their ability to reconstitute the activity of a split transcription factor, thus allowing the use of a simple growth selection in yeast to identify new interactions. Since its introduction about 15 years ago, the assay largely has been applied to single proteins, successfully uncovering thousands of novel protein partners. In the last few years, however, two‐hybrid experiments have been scaled up to focus on the entire complement of proteins found in an organism. Although a single such effort can itself result in thousands of interactions, the validity of these high‐throughput approaches has been questioned as a result of the prevalence of numerous false positives in these large data sets. Such artifacts may not be an obstacle to continued scale‐up of the method, because the classification of true and false positives has proven to be a computational challenge that can be met by a growing number of creative strategies. Two examples are provided of this combination of high‐throughput experimentation and computational analysis, focused on the interaction of Plasmodium falciparum proteins and of Saccharomyces cerevisiae membrane proteins.
Deep mutational scanning marries selection for protein function to high-throughput DNA sequencing in order to quantify the activity of variants of a protein on a massive scale. First, an appropriate selection system for the protein function of interest is identified and validated. Second, a library of variants is created, introduced into the selection system and subjected to selection. Third, library DNA is recovered throughout the selection and deep-sequenced. Finally, a functional score for each variant is calculated on the basis of the change in the frequency of the variant during the selection. This protocol describes the steps that must be carried out to generate a large-scale mutagenesis data set consisting of functional scores for up to hundreds of thousands of variants of a protein of interest. Establishing an assay, generating a library of variants and carrying out a selection and its accompanying sequencing takes on the order of 4-6 weeks; the initial data analysis can be completed in 1 week.
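The final step, scoring each variant by its change in frequency during selection, can be sketched as follows. This is one common formulation, assumed here for illustration: a log2 ratio of post-selection to pre-selection frequency, normalized to the wild-type sequence. The abstract does not specify the exact formula, and the function name, the pseudocount, and the wild-type normalization are assumptions.

```python
import math


def functional_scores(pre_counts, post_counts, wild_type, pseudocount=0.5):
    """Score variants by log2 frequency change relative to wild type.

    Hypothetical helper for illustration only.
    pre_counts / post_counts: dicts mapping variant name -> read count
        before and after selection.
    wild_type: key of the wild-type sequence in both dicts.
    pseudocount: added to every count so variants that drop to zero reads
        still receive a finite score (an assumed convention).
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log_ratio(variant):
        pre = (pre_counts.get(variant, 0) + pseudocount) / pre_total
        post = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(post / pre)

    wt = log_ratio(wild_type)
    # Score 0 means wild-type-like; positive means enriched by selection.
    return {v: log_ratio(v) - wt for v in pre_counts if v != wild_type}
```

Under this convention, a variant whose frequency doubles relative to wild type scores 1, and one whose relative frequency halves scores -1.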