Deep mutational scanning marries selection for protein function to high-throughput DNA sequencing in order to quantify the activity of variants of a protein on a massive scale. First, an appropriate selection system for the protein function of interest is identified and validated. Second, a library of variants is created, introduced into the selection system and subjected to selection. Third, library DNA is recovered throughout the selection and deep-sequenced. Finally, a functional score for each variant is calculated on the basis of the change in the frequency of the variant during the selection. This protocol describes the steps that must be carried out to generate a large-scale mutagenesis data set consisting of functional scores for up to hundreds of thousands of variants of a protein of interest. Establishing an assay, generating a library of variants and carrying out a selection and its accompanying sequencing takes on the order of 4-6 weeks; the initial data analysis can be completed in 1 week.
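The final step above, scoring each variant by its change in frequency during selection, can be illustrated with a minimal sketch. It assumes one common convention (a log2 enrichment ratio normalized to wild type, with a small pseudocount); the protocol summary does not specify the exact formula, and the function name, variant labels and read counts below are hypothetical.

```python
import math


def functional_scores(pre_counts, post_counts, wild_type, pseudocount=0.5):
    """Return a log2 enrichment score per variant, relative to wild type.

    pre_counts / post_counts map variant name -> read count before and
    after selection. The pseudocount (an assumption of this sketch) keeps
    variants with zero reads from producing log(0).
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())

    def log_ratio(variant):
        # Frequency of the variant in each sequenced population.
        pre_freq = (pre_counts.get(variant, 0) + pseudocount) / pre_total
        post_freq = (post_counts.get(variant, 0) + pseudocount) / post_total
        return math.log2(post_freq / pre_freq)

    wt_ratio = log_ratio(wild_type)
    # Subtracting the wild-type ratio places wild type at a score of 0.
    return {variant: log_ratio(variant) - wt_ratio for variant in pre_counts}


# Hypothetical read counts, for illustration only.
pre = {"WT": 1000, "A5G": 100, "L10P": 100}
post = {"WT": 2000, "A5G": 200, "L10P": 10}
scores = functional_scores(pre, post, "WT")
for variant, score in scores.items():
    print(f"{variant}\t{score:+.2f}")
```

Under this convention a score near zero indicates wild-type-like behavior, and a negative score indicates depletion during selection.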
We present a large-scale approach to investigate the functional consequences of sequence variation in a protein. The approach entails the display of hundreds of thousands of protein variants, moderate selection for activity and high-throughput DNA sequencing to quantify the performance of each variant. Using this strategy, we tracked the performance of >600,000 variants of a human WW domain after three and six rounds of selection by phage display for binding to its peptide ligand. Binding properties of these variants defined a high-resolution map of mutational preference across the WW domain; each position had unique features that could not be captured by a few representative mutations. Our approach could be applied to many in vitro or in vivo protein assays, providing a general means for understanding how protein function relates to sequence.
Sequencing-based, massively parallel genetic assays have revolutionized our ability to quantify the relationship between many genotypes and a phenotype of interest. Unfortunately, variant library expression platforms in mammalian cells are far from ideal, hindering the study of human gene variants in their physiologically relevant cellular contexts. Here, we describe a platform for phenotyping variant libraries in transfectable mammalian cell lines in two steps. First, a landing pad cell line with a genomically integrated, Tet-inducible cassette containing a Bxb1 recombination site is created. Second, a single variant from a library of transfected, promoter-less plasmids is recombined into the landing pad in each cell. Thus, every cell in the recombined pool expresses a single variant, allowing for parallel, sequencing-based assessment of variant effect. We describe a method for incorporating a single landing pad into a defined site of a cell line of interest, and show that our approach can be used to generate more than 20 000 recombinant cells in a single experiment. Finally, we use our platform in combination with a sequencing-based assay to explore the N-end rule by simultaneously measuring the effects of all possible N-terminal amino acids on protein expression.
Abstract
Multiplex genetic assays can simultaneously test thousands of genetic variants for a property of interest. However, limitations of existing multiplex assay methods in cultured mammalian cells hinder the breadth, speed and scale of these experiments. Here, we describe a series of improvements that greatly enhance the capabilities of a Bxb1 recombinase-based landing pad system for conducting different types of multiplex genetic assays in various mammalian cell lines. We incorporate the landing pad into a lentiviral vector, easing the process of generating new landing pad cell lines. We also develop several new landing pad versions, including one where the Bxb1 recombinase is expressed from the landing pad itself, improving recombination efficiency more than 2-fold and permitting rapid prototyping of transgenic constructs. Other versions incorporate positive and negative selection markers that enable drug-based enrichment of recombinant cells, enabling the use of larger libraries and reducing costs. A version with dual convergent promoters allows enrichment of recombinant cells independent of transgene expression, permitting the assessment of libraries of transgenes that perturb cell growth and survival. Lastly, we demonstrate these improvements by assessing the effects of a combinatorial library of oncogenes and tumor suppressors on cell growth. Collectively, these advancements make multiplex genetic assays in diverse cultured cell lines easier, cheaper and more effective, facilitating future studies probing how proteins impact cell function, using transgenic variant libraries tested individually or in combination.
Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is applicable to other genes, highlighting its generalizability.
CRISPR-Cas9 nucleases are powerful genome engineering tools, but unwanted cleavage at off-target and previously edited sites remains a major concern. Numerous strategies to reduce unwanted cleavage have been devised, but all are imperfect. Here, we report that off-target sites can be shielded from the active Cas9•single guide RNA (sgRNA) complex through the co-administration of dead-RNAs (dRNAs), truncated guide RNAs that direct Cas9 binding but not cleavage. dRNAs can effectively suppress a wide range of off-targets with minimal optimization while preserving on-target editing, and they can be multiplexed to suppress several off-targets simultaneously. dRNAs can be combined with high-specificity Cas9 variants, which often do not eliminate all unwanted editing. Moreover, dRNAs can prevent cleavage of homology-directed repair (HDR)-corrected sites, facilitating scarless editing by eliminating the need for blocking mutations. Thus, we enable precise genome editing by establishing a flexible approach for suppressing unwanted editing of both off-targets and HDR-corrected sites.
Microscopy is a powerful tool for characterizing complex cellular phenotypes, but linking these phenotypes to genotype or RNA expression at scale remains challenging. Here, we present Visual Cell Sorting, a method that physically separates hundreds of thousands of live cells based on their visual phenotype. Automated imaging and phenotypic analysis directs selective illumination of Dendra2, a photoconvertible fluorescent protein expressed in live cells; these photoactivated cells are then isolated using fluorescence‐activated cell sorting. First, we use Visual Cell Sorting to assess hundreds of nuclear localization sequence variants in a pooled format, identifying variants that improve nuclear localization and enabling annotation of nuclear localization sequences in thousands of human proteins. Second, we recover cells that retain normal nuclear morphologies after paclitaxel treatment, and then derive their single‐cell transcriptomes to identify pathways associated with paclitaxel resistance in cancers. Unlike alternative methods, Visual Cell Sorting depends on inexpensive reagents and commercially available hardware. As such, it can be readily deployed to uncover the relationships between visual cellular phenotypes and internal states, including genotypes and gene expression programs.
Synopsis
This study describes an imaging‐based approach for pooled genetic screening and morphology‐based transcriptomics that uses high‐throughput photoactivation followed by FACS to separate differentially labeled cells.
Expression of the photoactivatable fluorescent protein Dendra2 permits selective, irreversible, and high‐throughput labeling of cells exhibiting different visual phenotypes. These labeled cell subpopulations can be sorted and thus subject to diverse downstream genomics assays.
Photoactivation using a digital micromirror device affixed to a 405 nm laser is accurate, non‐toxic, and can be tuned to produce four discrete levels of fluorescence.
Human cells expressing sequence variant libraries can be sorted according to a visual phenotype followed by sequencing, which provides sequence‐function maps for phenotypes such as protein subcellular localization.
Cell populations that respond in a visually heterogeneous fashion to drug treatment can be sorted and subject to transcriptomic analyses, revealing the molecular states associated with complex drug responses.
Vitamin K epoxide reductase (VKOR) drives the vitamin K cycle, activating vitamin K-dependent blood clotting factors. VKOR is also the target of the widely used anticoagulant drug, warfarin. Despite VKOR's pivotal role in coagulation, its structure and active site remain poorly understood. In addition, VKOR variants can cause vitamin K-dependent clotting factor deficiency or alter warfarin response. Here, we used multiplexed, sequencing-based assays to measure the effects of 2,695 VKOR missense variants on abundance and 697 variants on activity in cultured human cells. The large-scale functional data, along with an evolutionary coupling analysis, support a four transmembrane domain topology, with variants in transmembrane domains exhibiting strongly deleterious effects on abundance and activity. Functionally constrained regions of the protein define the active site, and we find that, of four conserved cysteines putatively critical for function, only three are absolutely required. Finally, 25% of human VKOR missense variants show reduced abundance or activity, possibly conferring warfarin sensitivity or causing disease.
Hsp90 is a molecular chaperone involved in the refolding and activation of numerous protein substrates referred to as clients. While the molecular determinants of Hsp90 client specificity are poorly understood and limited to a handful of client proteins, strong clients are thought to be destabilized and conformationally extended. Here, we measured the phosphotransferase activity of 3929 variants of the tyrosine kinase Src in both the presence and absence of an Hsp90 inhibitor. We identified 84 previously unknown functionally dependent client variants. Unexpectedly, many destabilized or extended variants were not functionally dependent on Hsp90. Instead, functionally dependent client variants were clustered in the αF pocket and β1–β2 strand regions of Src, which have not previously been described as driving Hsp90 dependence. Hsp90 dependence was also strongly correlated with kinase activity. We found that a combination of activation, global extension, and general conformational flexibility, primarily induced by variants at the αF pocket and β1–β2 strands, was necessary to render Src functionally dependent on Hsp90. Moreover, the degree of activation and flexibility required to transform Src into a functionally dependent client varied with variant location, suggesting that a combination of regulatory domain disengagement and catalytic domain flexibility is required for chaperone dependence. Thus, by studying the chaperone dependence of a massive number of variants, we highlight factors driving Hsp90 client specificity and propose a model of chaperone‐kinase interactions.
PTEN is a multi-functional tumor suppressor protein regulating cell growth, immune signaling, neuronal function, and genome stability. Experimental characterization can help guide the clinical interpretation of the thousands of germline or somatic PTEN variants observed in patients. Two large-scale mutational datasets, one for PTEN variant intracellular abundance encompassing 4112 missense variants and one for lipid phosphatase activity encompassing 7244 variants, were recently published. The combined information from these datasets can reveal variant-specific phenotypes that may underlie various clinical presentations, but this has not been comprehensively examined, particularly for somatic PTEN variants observed in cancers.
Here, we add to these efforts by measuring the intracellular abundance of 764 new PTEN variants and refining abundance measurements for 3351 previously studied variants. We use this expanded and refined PTEN abundance dataset to explore the mutational patterns governing PTEN intracellular abundance, and then incorporate the phosphatase activity data to subdivide PTEN variants into four functionally distinct groups.
This analysis revealed a set of highly abundant but lipid phosphatase defective variants that could act in a dominant-negative fashion to suppress PTEN activity. Two of these variants were, indeed, capable of dysregulating Akt signaling in cells harboring a WT PTEN allele. Both variants were observed in multiple breast or uterine tumors, demonstrating the disease relevance of these high abundance, inactive variants.
We show that multidimensional, large-scale variant functional data, when paired with public cancer genomics datasets and follow-up assays, can improve understanding of uncharacterized cancer-associated variants, and provide better insights into how they contribute to oncogenesis.
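The subdivision of PTEN variants into four groups by combining abundance and lipid phosphatase activity data can be sketched as a simple two-threshold classification. This is only an illustration of the idea: the group labels, the 0.5 cutoffs, the assumption that both scores are scaled so that wild type is near 1, and the example variants are all hypothetical and are not taken from the study.

```python
def classify_variant(abundance, activity,
                     abundance_cutoff=0.5, activity_cutoff=0.5):
    """Assign a variant to one of four groups from two functional scores.

    Both scores are assumed to be scaled so that wild type is near 1 and
    a null variant is near 0. The cutoffs are placeholders, not values
    from the study.
    """
    abundant = abundance >= abundance_cutoff
    active = activity >= activity_cutoff
    if abundant and active:
        return "abundant_active"      # wild-type-like
    if abundant and not active:
        return "abundant_inactive"    # candidate dominant negative
    if not abundant and active:
        return "low_abundance_active"
    return "low_abundance_inactive"


# Hypothetical (abundance, activity) score pairs, for illustration only.
example_scores = {
    "variant_1": (0.95, 0.90),
    "variant_2": (0.90, 0.05),
    "variant_3": (0.20, 0.70),
    "variant_4": (0.10, 0.10),
}
groups = {name: classify_variant(abundance, activity)
          for name, (abundance, activity) in example_scores.items()}
for name, group in groups.items():
    print(name, group)
```

In this sketch, the "abundant_inactive" group corresponds to the highly abundant but lipid phosphatase defective variants that the analysis flags as possible dominant negatives.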