The Genotype-Tissue Expression (GTEx) project was established to characterize genetic effects on the transcriptome across human tissues and to link these regulatory mechanisms to trait and disease associations. Here, we present analyses of the version 8 data, examining 15,201 RNA-sequencing samples from 49 tissues of 838 postmortem donors. We comprehensively characterize genetic associations for gene expression and splicing in cis and trans, showing that regulatory associations are found for almost all genes, and describe the underlying molecular mechanisms and their contribution to allelic heterogeneity and pleiotropy of complex traits. Leveraging the large diversity of tissues, we provide insights into the tissue specificity of genetic effects and show that cell type composition is a key factor in understanding gene regulatory mechanisms in human tissues.
Understanding gene function and regulation in homeostasis and disease requires knowledge of the cellular and tissue contexts in which genes are expressed. Here, we applied four single-nucleus RNA sequencing methods to eight diverse, archived, frozen tissue types from 16 donors and 25 samples, generating a cross-tissue atlas of 209,126 nuclei profiles, which we integrated across tissues, donors, and laboratory methods with a conditional variational autoencoder. Using the resulting cross-tissue atlas, we highlight shared and tissue-specific features of tissue-resident cell populations; identify cell types that might contribute to neuromuscular, metabolic, and immune components of monogenic diseases and the biological processes involved in their pathology; and determine cell types and gene modules that might underlie disease mechanisms for complex traits analyzed by genome-wide association studies.
Determining protein levels in each tissue and how they compare with RNA levels is important for understanding human biology and disease as well as regulatory processes that control protein levels. We quantified the relative protein levels from over 12,000 genes across 32 normal human tissues. Tissue-specific or tissue-enriched proteins were identified and compared to transcriptome data. Many ubiquitous transcripts are found to encode tissue-specific proteins. Discordance of RNA and protein enrichment revealed potential sites of synthesis and action of secreted proteins. The tissue-specific distribution of proteins also provides an in-depth view of complex biological events that require the interplay of multiple tissues. Most importantly, our study demonstrated that protein tissue-enrichment information can explain phenotypes of genetic diseases, which cannot be obtained by transcript information alone. Overall, our results demonstrate how understanding protein levels can provide insights into regulation, secretome, metabolism, and human diseases.
• Quantified proteins from more than 12,000 genes across 32 normal human tissues
• Discordance of RNA and protein enrichment provides evidence of protein secretion
• Tissue-specific distribution of enzymes indicates a coordinated control of metabolism
• Tissue-enriched proteins provide insights into phenotypes of genetic diseases
Proteomics analysis across human tissues from the GTEx resource reveals insight into tissue-specific pathways and phenotypes arising from genetic diseases.
Current genomics methods are designed to handle tens to thousands of samples but will need to scale to millions to match the pace of data and hypothesis generation in biomedical science. Here, we show that high efficiency at low cost can be achieved by leveraging general-purpose libraries for computing using graphics processing units (GPUs), such as PyTorch and TensorFlow. We demonstrate >200-fold decreases in runtime and ~5- to 10-fold reductions in cost relative to CPUs. We anticipate that the accessibility of these libraries will lead to a widespread adoption of GPUs in computational genomics.
The Genotype-Tissue Expression (GTEx) project has identified expression and splicing quantitative trait loci in cis (QTLs) for the majority of genes across a wide range of human tissues. However, the functional characterization of these QTLs has been limited by the heterogeneous cellular composition of GTEx tissue samples. We mapped interactions between computational estimates of cell type abundance and genotype to identify cell type-interaction QTLs for seven cell types and show that cell type-interaction expression QTLs (eQTLs) provide finer resolution to tissue specificity than bulk tissue cis-eQTLs. Analyses of genetic associations with 87 complex traits show a contribution from cell type-interaction QTLs and enable the discovery of hundreds of previously unidentified colocalized loci that are masked in bulk tissue.
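The idea of mapping an interaction between genotype and estimated cell type abundance can be illustrated with an ordinary least-squares model that includes a genotype × abundance term. The sketch below is only an illustration on simulated data, not the pipeline used in the study; the function name `interaction_effect`, the simulated effect sizes, and the absence of covariates are all assumptions made for the example.

```python
import numpy as np

def interaction_effect(expression, genotype, cell_fraction):
    """Fit expression ~ 1 + g + c + g*c by OLS.

    Returns the estimated coefficient of the g*c interaction term
    and its t-statistic. (Hypothetical helper for illustration.)
    """
    y = np.asarray(expression, dtype=float)
    g = np.asarray(genotype, dtype=float)
    c = np.asarray(cell_fraction, dtype=float)
    # Design matrix: intercept, genotype, cell fraction, interaction
    X = np.column_stack([np.ones_like(g), g, c, g * c])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stat = beta[3] / np.sqrt(cov[3, 3])
    return beta[3], t_stat

# Simulated data (assumed values): the genotype effect grows with cell fraction
rng = np.random.default_rng(0)
n = 500
genotype = rng.integers(0, 3, size=n)           # 0/1/2 allele dosage
cell_fraction = rng.uniform(0.0, 1.0, size=n)   # estimated abundance
expression = (0.2 * genotype + 0.5 * cell_fraction
              + 1.0 * genotype * cell_fraction
              + rng.normal(0.0, 0.5, size=n))

slope, t_stat = interaction_effect(expression, genotype, cell_fraction)
print(f"interaction slope = {slope:.2f}, t = {t_stat:.1f}")
```

A large interaction t-statistic suggests that the genotype effect on expression depends on the abundance of the cell type, which is the signal an interaction QTL is meant to capture.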
Multiple myeloma is a plasma cell malignancy almost always preceded by precursor conditions, but low tumor burden of these early stages has hindered the study of their molecular programs through bulk sequencing technologies. Here, we generate and analyze single cell RNA-sequencing of plasma cells from 26 patients at varying disease stages and 9 healthy donors. In silico dissection and comparison of normal and transformed plasma cells from the same bone marrow biopsy enables discovery of patient-specific transcriptional changes. Using Non-Negative Matrix Factorization, we discover 15 gene expression signatures which represent transcriptional modules relevant to myeloma biology, and identify a signature that is uniformly lost in abnormal cells across disease stages. Finally, we demonstrate that tumors contain heterogeneous subpopulations expressing distinct transcriptional patterns. Our findings characterize transcriptomic alterations present at the earliest stages of myeloma, providing insight into the molecular underpinnings of disease initiation.
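For readers unfamiliar with non-negative matrix factorization, the following minimal sketch factorizes a non-negative matrix V into a signature matrix W and an activity matrix H using the classic multiplicative update rules for the Frobenius objective. It is a generic illustration on synthetic data, not the procedure or software used in the study; the function name `nmf`, the matrix sizes, and the choice of three signatures are assumptions for the example.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Approximate V (n x m, non-negative) as W @ H with W (n x k), H (k x m).

    Uses multiplicative updates, which keep W and H non-negative.
    (Hypothetical helper for illustration.)
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, size=(n, k))
    H = rng.uniform(0.1, 1.0, size=(k, m))
    for _ in range(n_iter):
        # Update activities, then signatures; eps avoids division by zero
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic "genes x cells" matrix built from 3 hidden signatures (assumed sizes)
rng = np.random.default_rng(1)
true_W = rng.uniform(0.0, 1.0, size=(60, 3))
true_H = rng.uniform(0.0, 1.0, size=(3, 40))
V = true_W @ true_H

W, H = nmf(V, k=3)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error = {rel_error:.4f}")
```

Each column of W can be read as a gene expression signature and each row of H as that signature's activity across cells; in practice the number of signatures is chosen by the analyst rather than known in advance.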
Long non-coding RNA (lncRNA) genes have well-established and important impacts on molecular and cellular functions. However, among the thousands of lncRNA genes, it is still a major challenge to identify the subset with disease or trait relevance. To systematically characterize these lncRNA genes, we used Genotype Tissue Expression (GTEx) project v8 genetic and multi-tissue transcriptomic data to profile the expression, genetic regulation, cellular contexts, and trait associations of 14,100 lncRNA genes across 49 tissues for 101 distinct complex genetic traits. Using these approaches, we identified 1,432 lncRNA gene-trait associations, 800 of which were not explained by stronger effects of neighboring protein-coding genes. This included associations between lncRNA quantitative trait loci and inflammatory bowel disease, type 1 and type 2 diabetes, and coronary artery disease, as well as rare variant associations with body mass index.
• 29% of lncRNA genes with eQTLs show tissue-specific genetic regulation
• Co-expression networks and single-cell data provide annotations for 94% of lncRNAs
• Rare variants near lncRNA expression outliers impact complex traits, like BMI
• We identify 800 lncRNA-trait relationships not explained by protein-coding genes
A systematic analysis of NIH Genotype Tissue Expression (GTEx) project data provides insights into lncRNA expression patterns and functions, explores the impact of genetic variation on lncRNAs, and connects lncRNAs to complex traits and human disease.
Multiple coronaviruses have emerged independently in the past 20 years that cause lethal human diseases. Although vaccine development targeting these viruses has been accelerated substantially, there remain patients requiring treatment who cannot be vaccinated or who experience breakthrough infections. Understanding the common host factors necessary for the life cycles of coronaviruses may reveal conserved therapeutic targets. Here, we used the known substrate specificities of mammalian protein kinases to deconvolute the sequence of phosphorylation events mediated by three host protein kinase families (SRPK, GSK-3, and CK1) that coordinately phosphorylate a cluster of serine and threonine residues in the viral N protein, which is required for viral replication. We also showed that loss or inhibition of SRPK1/2, which we propose initiates the N protein phosphorylation cascade, compromised the viral replication cycle. Because these phosphorylation sites are highly conserved across coronaviruses, inhibitors of these protein kinases not only may have therapeutic potential against COVID-19 but also may be broadly useful against coronavirus-mediated diseases.
Smoldering multiple myeloma (SMM) is a precursor condition of multiple myeloma (MM) with significant heterogeneity in disease progression. Existing clinical models of progression risk do not fully capture this heterogeneity. Here we integrate 42 genetic alterations from 214 SMM patients using unsupervised binary matrix factorization (BMF) clustering and identify six distinct genetic subtypes. These subtypes are differentially associated with established MM-related RNA signatures, oncogenic and immune transcriptional profiles, and evolving clinical biomarkers. Three genetic subtypes are associated with increased risk of progression to active MM in both the primary and validation cohorts, indicating they can be used to better predict high and low-risk patients within the currently used clinical risk stratification models.
Clonal hematopoiesis results from somatic mutations in cancer driver genes in hematopoietic stem cells. We sought to identify novel drivers of clonal expansion using an unbiased analysis of sequencing data from 84,683 persons and identified common mutations in the 5-methylcytosine reader, ZBTB33, as well as in YLPM1, SRCAP, and ZNF318. We also identified these mutations at low frequency in myelodysplastic syndrome patients. Zbtb33-edited mouse hematopoietic stem and progenitor cells exhibited a competitive advantage in vivo and increased genome-wide intron retention. ZBTB33 mutations potentially link DNA methylation and RNA splicing, the two most commonly mutated pathways in clonal hematopoiesis and MDS.