Predicting mutation-induced changes in protein thermodynamic stability (ΔΔG) is of great interest in protein engineering, variant interpretation, and protein biophysics. We introduce ThermoNet, a deep, 3D-convolutional neural network (3D-CNN) designed for structure-based prediction of ΔΔGs upon point mutation. To leverage the image-processing power inherent in CNNs, we treat protein structures as if they were multi-channel 3D images. In particular, the inputs to ThermoNet are uniformly constructed as multi-channel voxel grids based on biophysical properties derived from raw atom coordinates. We train and evaluate ThermoNet with a curated data set that accounts for protein homology and is balanced with direct and reverse mutations; this provides a framework for addressing biases that have likely influenced many previous ΔΔG prediction methods. ThermoNet demonstrates performance comparable to the best available methods on the widely used Ssym test set. In addition, ThermoNet accurately predicts the effects of both stabilizing and destabilizing mutations, while most other methods exhibit a strong bias towards predicting destabilization. We further show that homology between Ssym and widely used training sets like S2648 and VariBench has likely led to overestimated performance in previous studies. Finally, we demonstrate the practical utility of ThermoNet in predicting the ΔΔGs for two clinically relevant proteins, p53 and myoglobin, and for pathogenic and benign missense variants from ClinVar. Overall, our results suggest that 3D-CNNs can model the complex, non-linear interactions perturbed by mutations, directly from biophysical properties of atoms.
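As an illustration of treating a structure as a multi-channel 3D image, the following minimal sketch bins atoms around a mutation site into a voxel grid with one channel per property. It is not ThermoNet's actual featurization: the function name `voxelize`, the channel names, the 16 Å box, the 1 Å voxels, and the binary occupancy rule are all assumptions made for the example.

```python
from math import floor


def voxelize(atoms, center, box_size=16.0, voxel_size=1.0,
             channels=("hydrophobic", "aromatic")):
    """Bin atoms into a multi-channel voxel grid centred on `center`.

    atoms: iterable of (x, y, z, props), where props is a set of
           property names (hypothetical labels, chosen for this sketch).
    Returns a dict mapping channel name -> n x n x n nested list of floats.
    """
    n = int(round(box_size / voxel_size))
    half = box_size / 2.0
    grid = {
        ch: [[[0.0] * n for _ in range(n)] for _ in range(n)]
        for ch in channels
    }
    for x, y, z, props in atoms:
        # Shift coordinates so the box spans [0, box_size) on each axis.
        idx = tuple(
            floor((coord - c + half) / voxel_size)
            for coord, c in zip((x, y, z), center)
        )
        if not all(0 <= i < n for i in idx):
            continue  # atom lies outside the box around the site
        i, j, k = idx
        for ch in props:
            if ch in grid:
                grid[ch][i][j][k] = 1.0  # binary occupancy (assumption)
    return grid


# Toy input: one atom at the site, one at the box edge, one far away.
atoms = [
    (0.0, 0.0, 0.0, {"hydrophobic"}),
    (-8.0, 0.0, 0.0, {"aromatic"}),
    (100.0, 0.0, 0.0, {"hydrophobic"}),
]
grid = voxelize(atoms, center=(0.0, 0.0, 0.0))
```

A grid of this shape can be stacked into a tensor and passed to a 3D convolutional layer; smoother occupancy functions could replace the binary rule used here.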
We describe a chemical method to label and purify 4-thiouridine (s4U)-containing RNA. We demonstrate that methanethiosulfonate (MTS) reagents form disulfide bonds with s4U more efficiently than the commonly used HPDP-biotin, leading to higher yields and less biased enrichment. This increase in efficiency allowed us to use s4U labeling to study global microRNA (miRNA) turnover in proliferating cultured human cells without perturbing global miRNA levels or the miRNA processing machinery. This improved chemistry will enhance methods that depend on tracking different populations of RNA, such as 4-thiouridine tagging to study tissue-specific transcription and dynamic transcriptome analysis (DTA) to study RNA turnover.
• Current methods to track s4U-RNA are inefficient, giving low yields and high bias
• MTS chemistry efficiently labels s4U-RNA, which improves methods that rely on s4U
• Increased sensitivity provides greater insight into RNA dynamics and miRNA turnover
Duffy et al. demonstrate that previously used chemistry to enrich 4-thiouridine-containing RNA (s4U-RNA) is inefficient and produces biased results, whereas their MTS chemistry allows for efficient s4U-RNA labeling and enrichment. The authors use this chemistry to improve s4U-based metabolic labeling experiments and study the turnover of microRNAs.
Retroduplications come from reverse transcription of mRNAs and their insertion back into the genome. Here, we performed comprehensive discovery and analysis of retroduplications in a large cohort of 2,535 individuals from 26 human populations, as part of 1000 Genomes Phase 3. We developed an integrated approach to discover novel retroduplications combining high-coverage exome and low-coverage whole-genome sequencing data, utilizing information from both exon-exon junctions and discordant paired-end reads. We found 503 parent genes having novel retroduplications absent from the reference genome. Based solely on retroduplication variation, we built phylogenetic trees of human populations; these represent superpopulation structure well and indicate that variable retroduplications are effective population markers. We further identified 43 retroduplication parent genes differentiating superpopulations. This group contains several interesting insertion events, including a SLMO2 retroduplication and insertion into CAV3, which has a potential disease association. We also found retroduplications to be associated with a variety of genomic features: (1) Insertion sites were correlated with regular nucleosome positioning. (2) They, predictably, tend to avoid conserved functional regions, such as exons, but, somewhat surprisingly, also avoid introns. (3) Retroduplications tend to be co-inserted with young L1 elements, indicating recent retrotranspositional activity, and (4) they have a weak tendency to originate from highly expressed parent genes. Our investigation provides insight into the functional impact and association with genomic elements of retroduplications. We anticipate our approach and analytical methodology to have application in a more clinical context, where exome sequencing data is abundant and the discovery of retroduplications can potentially improve the accuracy of SNP calling.
To date, studies on papillary renal-cell carcinoma (pRCC) have largely focused on coding alterations in traditional drivers, particularly the tyrosine kinase MET. However, for a significant fraction of tumors, researchers have been unable to determine a clear molecular etiology. To address this, we perform the first whole-genome analysis of pRCC. Elaborating on previous results on MET, we find a germline SNP (rs11762213) in this gene predicting prognosis. Surprisingly, we detect no enrichment for small structural variants disrupting MET. Next, we scrutinize noncoding mutations, discovering potentially impactful ones associated with MET. Many of these are in an intron connected to a known, oncogenic alternative-splicing event; moreover, we find methylation dysregulation nearby, leading to a cryptic promoter activation. We also notice an elevation of mutations in the long noncoding RNA NEAT1, and these mutations are associated with increased expression and unfavorable outcome. Finally, to address the origin of pRCC heterogeneity, we carry out whole-genome analyses of mutational processes. First, we investigate genome-wide mutational patterns, finding they are governed mostly by methylation-associated C-to-T transitions. We also observe significantly more mutations in open chromatin and early-replicating regions in tumors with chromatin-modifier alterations. Finally, we reconstruct cancer-evolutionary trees, which have markedly different topologies and suggested evolutionary trajectories for the different subtypes of pRCC.
We performed RNA sequencing on 40,000 cells to create a high-resolution single-cell gene expression atlas of developing human cortex, providing the first single-cell characterization of previously uncharacterized cell types, including human subplate neurons, comparisons with bulk tissue, and systematic analyses of technical factors. These data permit deconvolution of regulatory networks connecting regulatory elements and transcriptional drivers to single-cell gene expression programs, significantly extending our understanding of human neurogenesis, cortical evolution, and the cellular basis of neuropsychiatric disease. We tie cell-cycle progression with early cell fate decisions during neurogenesis, demonstrating that differentiation occurs on a transcriptomic continuum; rather than only expressing a few transcription factors that drive cell fates, differentiating cells express broad, mixed cell-type transcriptomes before telophase. By mapping neuropsychiatric disease genes to cell types, we implicate dysregulation of specific cell types in ASD, ID, and epilepsy. We developed CoDEx, an online portal to facilitate data access and browsing.
• High-resolution transcriptome map of 40,000 cells from developing human brain
• Cell-type-specific transcription factor (TF) expression and TF-gene networks
• Defines intermediate cell transition states during early neurogenesis
• Implicates specific cell types in neuropsychiatric disorders
An extensive single-cell catalog of cell types in the mid-gestation human neocortex extends our understanding of early cortical development, including subplate neuron transcriptomes, cell-type-specific regulatory networks, brain evolution, and the cellular basis of neuropsychiatric disease.
Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue–residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search for signals of positive selection among these residue communities to identify putative driver genes, applying our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict one or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.
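The idea of partitioning a structure into residue communities over correlation-weighted contact edges can be sketched as follows. This is a simplified stand-in, not the framework's actual procedure: the 8 Å contact cutoff, the 0.5 weight threshold, and the use of connected components (via union-find) in place of a proper community-detection algorithm are all assumptions, and the function names are hypothetical.

```python
from math import dist


def weighted_contact_edges(coords, corr, cutoff=8.0):
    """Return {(i, j): weight} for residue pairs within `cutoff` angstroms.

    coords: one representative (x, y, z) point per residue.
    corr:   square matrix of motion correlations (e.g. from a dynamics model).
    """
    edges = {}
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            if dist(coords[i], coords[j]) <= cutoff:
                edges[(i, j)] = abs(corr[i][j])
    return edges


def communities(n, edges, min_weight=0.5):
    """Group residues linked by strongly correlated contacts.

    Uses union-find over edges with weight >= min_weight; this is a
    simplification standing in for real community detection.
    """
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for (i, j), w in edges.items():
        if w >= min_weight:
            parent[find(i)] = find(j)

    groups = {}
    for r in range(n):
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())


# Toy example: four residues on a line, with made-up correlations.
coords = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (30, 0, 0)]
corr = [
    [1.0, 0.9, 0.2, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.2, 0.1, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
edges = weighted_contact_edges(coords, corr)
groups = communities(len(coords), edges)
```

Mutation counts could then be tallied per community and compared against a background expectation to look for hotspot signals; that step is omitted here.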
A systems understanding of nuclear organization and events is critical for determining how cells divide, differentiate, and respond to stimuli and for identifying the causes of diseases. Chromatin remodeling complexes such as SWI/SNF have been implicated in a wide variety of cellular processes including gene expression, nuclear organization, centromere function, and chromosomal stability, and mutations in SWI/SNF components have been linked to several types of cancer. To better understand the biological processes in which chromatin remodeling proteins participate, we globally mapped binding regions for several components of the SWI/SNF complex throughout the human genome using ChIP-Seq. SWI/SNF components were found to lie near regulatory elements integral to transcription (e.g. 5' ends, RNA Polymerases II and III, and enhancers) as well as regions critical for chromosome organization (e.g. CTCF, lamins, and DNA replication origins). Interestingly, we also find that certain configurations of SWI/SNF subunits are associated with transcripts that have higher levels of expression, whereas other configurations of SWI/SNF factors are associated with transcripts that have lower levels of expression. To further elucidate the association of SWI/SNF subunits with each other as well as with other nuclear proteins, we also analyzed SWI/SNF immunoprecipitated complexes by mass spectrometry. Individual SWI/SNF factors are associated with their own family members, as well as with cellular constituents such as nuclear matrix proteins, key transcription factors, and centromere components, implying a ubiquitous role in gene regulation and nuclear function. We find an overrepresentation of both SWI/SNF-associated regions and proteins in cell cycle and chromosome organization.
Taken together, the results from our ChIP and immunoprecipitation experiments suggest that SWI/SNF facilitates gene regulation and genome function more broadly and through a greater diversity of interactions than previously appreciated.
Mature adipocytes store fatty acids and are a common component of tissue stroma. Adipocyte function in regulating bone marrow, skin, muscle, and mammary gland biology is emerging, but the role of adipocyte-derived lipids in tissue homeostasis and repair is poorly understood. Here, we identify an essential role for adipocyte lipolysis in regulating inflammation and repair after injury in skin. Genetic mouse studies revealed that dermal adipocytes are necessary to initiate inflammation after injury and promote subsequent repair. We find through histological, ultrastructural, lipidomic, and genetic experiments in mice that adipocytes adjacent to skin injury initiate lipid release necessary for macrophage inflammation. Tamoxifen-inducible genetic lineage tracing of mature adipocytes and single-cell RNA sequencing revealed that dermal adipocytes alter their fate and generate ECM-producing myofibroblasts within wounds. Thus, adipocytes regulate multiple aspects of repair and may be therapeutic for inflammatory diseases and defective wound healing associated with aging and diabetes.
• Inhibiting dermal adipocyte lipolysis reduces inflammatory wound bed macrophages
• Wound edge adipocytes dedifferentiate within hours after injury
• Adipocyte lipolysis is needed for dedifferentiated adipocytes to populate wound beds
• Dedifferentiated adipocytes generate wound bed myofibroblasts after injury
Using genetic mouse models and transcriptomic profiling, Shook et al. show that skin resident adipocytes undergo lipolysis to promote efficient macrophage inflammation after injury. Lipolysis also allows adipocyte-derived cells to dedifferentiate and generate diverse extracellular matrix-producing myofibroblasts in the wound bed.
Recently, in addition to poly(A)+ long non‐coding RNAs (lncRNAs), many lncRNAs without poly(A) tails have been characterized in mammals. However, the non‐polyA lncRNAs and their conserved motifs, especially those associated with environmental stresses, have not been fully investigated in plant genomes. We performed poly(A)− RNA‐seq for seedlings of Arabidopsis thaliana under four stress conditions, and predicted lncRNA transcripts. We classified the lncRNAs into three confidence levels according to their expression patterns, epigenetic signatures and RNA secondary structures. Then, we further classified the lncRNAs into poly(A)+ and poly(A)− transcripts. Compared with poly(A)+ lncRNAs and coding genes, we found that poly(A)− lncRNAs tend to have shorter transcripts and lower expression levels, and they show significant expression specificity in response to stresses. In addition, their differential expression is significantly enriched in the drought condition and depleted in the heat condition. Overall, we identified 245 poly(A)+ and 58 poly(A)− lncRNAs that are differentially expressed under various stress stimuli. The differential expression was validated by qRT‐PCR, and the signaling pathways involved were supported by specific binding of transcription factors (TFs), phytochrome‐interacting factor 4 (PIF4) and PIF5. Moreover, we found many conserved sequence and structural motifs of lncRNAs from different functional groups (e.g. a UUC motif responding to salt and an AU‐rich stem‐loop responding to cold), indicating that the conserved elements might be responsible for the stress‐responsive functions of lncRNAs.
There is a lack of approaches for identifying pathogenic genomic structural variants (SVs), although they play a crucial role in many diseases. We present a mechanism-agnostic machine learning-based workflow, called SVFX, to assign pathogenicity scores to somatic and germline SVs. In particular, we generate somatic and germline training models, which include genomic, epigenomic, and conservation-based features, for SV call sets in diseased and healthy individuals. We then apply SVFX to SVs in cancer and other diseases; SVFX achieves high accuracy in identifying pathogenic SVs. Predicted pathogenic SVs in cancer cohorts are enriched among known cancer genes and many cancer-related pathways.