A central goal of genetics is to understand the links between genetic variation and disease. Intuitively, one might expect disease-causing variants to cluster into key pathways that drive disease etiology. But for complex traits, association signals tend to be spread across most of the genome, including near many genes without an obvious connection to disease. We propose that gene regulatory networks are sufficiently interconnected that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes, and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an “omnigenic” model.
Many complex genetic traits arise from large numbers of variants, each with small effects. This Perspective argues that risk is ultimately driven by an even larger number of genes with no direct impact on the phenotype or disease whose effects are propagated through regulatory networks.
Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants.
Early genome-wide association studies (GWASs) led to the surprising discovery that, for typical complex traits, most of the heritability is due to huge numbers of common variants with tiny effect sizes. Previously, we argued that new models are needed to understand these patterns. Here, we provide a formal model in which genetic contributions to complex traits are partitioned into direct effects from core genes and indirect effects from peripheral genes acting in trans. We propose that most heritability is driven by weak trans-eQTL SNPs, whose effects are mediated through peripheral genes to impact the expression of core genes. In particular, if the core genes for a trait tend to be co-regulated, then the effects of peripheral variation can be amplified such that nearly all of the genetic variance is driven by weak trans effects. Thus, our model proposes a framework for understanding key features of the architecture of complex traits.
•We propose a quantitative phenotype model based on core and peripheral genes
•Model is parameterized using data on cis and trans heritability of gene expression
•Analysis implies that heritability explained by trans-acting variants is at least 70%
•Co-regulation of core genes can further amplify the contribution of trans effects
Development of the “omnigenic” model to encompass specific effects on gene expression provides a defined framework for testing how variants in core and peripheral genes contribute to trait heritability.
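The core/peripheral partition above can be illustrated with a toy variance calculation. This is a minimal sketch, not the authors' formal model. It assumes several things: a trait equal to the unweighted sum of core-gene expression levels, cis components that are independent across core genes, trans components that share a single pairwise correlation `rho` (a stand-in for co-regulation), and no cis-trans covariance. The function name `trans_fraction` and the 30%/70% example split are hypothetical.

```python
def trans_fraction(n_core: int, cis_var: float, trans_var: float, rho: float) -> float:
    """Share of trait genetic variance attributable to trans effects.

    Toy model (an assumption, not the paper's exact formulation):
    trait y = sum_j E_j over n_core core genes, with E_j = C_j + T_j.
    C_j: cis genetic component, independent across genes, variance cis_var.
    T_j: trans genetic component, variance trans_var, pairwise correlation rho
    (rho > 0 stands in for co-regulation of core genes).
    Cis and trans components are assumed uncorrelated.
    """
    if n_core < 1:
        raise ValueError("need at least one core gene")
    # Independent cis components add their variances gene by gene.
    cis_total = n_core * cis_var
    # Correlated trans components: n variances plus n(n-1) covariances.
    trans_total = n_core * trans_var * (1 + (n_core - 1) * rho)
    return trans_total / (cis_total + trans_total)


# Hypothetical inputs: per-gene expression heritability split 30% cis / 70% trans.
print(trans_fraction(100, cis_var=0.3, trans_var=0.7, rho=0.0))  # ~0.7
print(trans_fraction(100, cis_var=0.3, trans_var=0.7, rho=0.1))  # ~0.96
```

With `rho = 0` the trait-level trans share simply mirrors the per-gene split. Any positive correlation among the core genes' trans components grows the trans term roughly in proportion to `n_core * rho`, which is the kind of amplification the abstract describes.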
Genome-wide association studies (GWAS) have identified over 41 susceptibility loci associated with Parkinson's Disease (PD), but identifying putative causal genes and the underlying mechanisms remains challenging. Here, we leverage large-scale transcriptomic datasets to prioritize genes that are likely to affect PD by using a transcriptome-wide association study (TWAS) approach. Using this approach, we identify 66 gene associations whose predicted expression or splicing levels in dorsolateral prefrontal cortex (DLPFC) and peripheral monocytes are significantly associated with PD risk. We uncover many novel genes associated with PD, as well as novel mechanisms for known associations such as MAPT, for which we find that variation in exon 3 splicing explains the common genetic association. Genes identified in our analyses belong to the same or related pathways, including lysosomal and innate immune function. Overall, our study provides a strong foundation for further mechanistic studies that will elucidate the molecular drivers of PD.
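The abstract does not say which TWAS implementation was used. A widely used summary-statistic form, as in FUSION-style analyses, tests a gene by combining its expression (or splicing) prediction weights with GWAS z-scores and a reference LD matrix: z = w'Z / sqrt(w'Σw). The sketch below implements only that formula. The function name `twas_z` and the two-SNP example are hypothetical.

```python
import math
from typing import Sequence


def twas_z(weights: Sequence[float], gwas_z: Sequence[float],
           ld: Sequence[Sequence[float]]) -> float:
    """Summary-statistic TWAS z-score, z = w'Z / sqrt(w' Sigma w).

    weights: per-SNP expression (or splicing) prediction weights for one gene.
    gwas_z:  GWAS z-scores for the same SNPs, with the same allele coding.
    ld:      SNP-SNP correlation (LD) matrix from a reference panel.
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, z-scores and LD matrix must align")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Null variance of the weighted score: w' Sigma w.
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0:
        raise ValueError("weights have zero variance under this LD matrix")
    return numerator / math.sqrt(variance)


# Hypothetical locus: two SNPs in strong LD (r = 0.8), equal weights.
print(twas_z([0.5, 0.5], [3.0, 3.0], [[1.0, 0.8], [0.8, 1.0]]))  # sqrt(10), about 3.162
```

If the two SNPs were independent, the same inputs would give sqrt(18), about 4.24. The LD term shrinks the score because correlated SNPs carry partly redundant evidence.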
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
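The parenthetical contiguity figure above is a contig N50-type statistic. A minimal sketch of the computation follows. When half of the total assembly length is the target, the result is the N50. When half of an assumed genome size is the target, it is usually called NG50; the abstract does not say which convention underlies its figure. The function name `n50` and the toy contig lengths are hypothetical.

```python
from typing import Iterable, Optional


def n50(contig_lengths: Iterable[int], reference_size: Optional[int] = None) -> Optional[int]:
    """Smallest contig length L such that contigs of length >= L cover half the target.

    Target = total assembly length (N50), or reference_size if given (NG50).
    Returns None if the contigs never reach half of the target.
    """
    lengths = sorted(contig_lengths, reverse=True)
    target = sum(lengths) if reference_size is None else reference_size
    covered = 0
    for length in lengths:
        covered += length
        if 2 * covered >= target:  # integer comparison avoids halving a float
            return length
    return None


# Toy assembly (hypothetical lengths, arbitrary units).
contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(contigs))                      # 8
print(n50(contigs, reference_size=100))  # 3
```

Sorting from longest to shortest and stopping at the first contig that pushes cumulative coverage past the halfway mark is what makes the statistic a "minimum length needed to cover 50%".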
Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation, from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). About 65% of expression quantitative trait loci (eQTLs) have primary effects on chromatin, whereas the remaining eQTLs are enriched in transcribed regions. Using a novel method, we also detected 2893 splicing QTLs, most of which have little or no effect on gene-level expression. These splicing QTLs are major contributors to complex traits, roughly on a par with variants that affect gene expression levels. Our study provides a comprehensive view of the mechanisms linking genetic variation to variation in human gene regulation.
Integrating host and HBV characteristics, this study aimed to develop models for predicting long‐term cirrhosis and hepatocellular carcinoma (HCC) risk in patients with chronic hepatitis B virus (HBV) infection. This analysis included hepatitis B surface antigen (HBsAg)‐seropositive and anti‐HCV‐seronegative participants from the Risk Evaluation of Viral Load Elevation and Associated Liver Disease/Cancer in HBV (R.E.V.E.A.L.‐HBV) cohort. Newly developed cirrhosis and HCC were ascertained through regular follow‐up ultrasonography, computerized linkage with national health databases, and medical chart reviews. Two‐thirds of the participants were allocated for risk model derivation and the remaining one‐third for model validation. The risk prediction model included age, gender, HBV e antigen (HBeAg) serostatus, serum HBV DNA and alanine aminotransferase (ALT) levels, quantitative serum HBsAg levels, and HBV genotype. Family history was additionally included in the prediction model for HCC. Cox proportional hazards regression coefficients for cirrhosis and HCC predictors were converted into risk scores. The areas under the receiver operating characteristic curve (AUROCs) were used to evaluate the performance of the risk models. Older age, male sex, HBeAg seropositivity, HBV genotype C, and increasing levels of ALT, HBV DNA, and HBsAg were all significantly associated with an increased risk of cirrhosis and HCC. The risk scores estimated from the derivation set accurately categorized participants into low, medium, and high cirrhosis and HCC risk groups in the validation set (P < 0.001). The AUROCs for predicting 3‐year, 5‐year, and 10‐year cirrhosis risk ranged from 0.83 to 0.86 in the derivation set and from 0.79 to 0.82 in the validation set. The AUROCs for predicting 5‐year, 10‐year, and 15‐year HCC risk ranged from 0.86 to 0.89 in the derivation set and from 0.84 to 0.87 in the validation set.
Conclusion: The risk prediction models of cirrhosis and HCC by integrating host and HBV profiles have excellent prediction accuracy and discriminatory ability. They may be used for clinical management of chronic hepatitis B patients. (Hepatology 2013;58:546‐554)
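The risk scores above are judged by their AUROCs at fixed horizons. The sketch below shows what an AUROC measures in its simplest, binary form: the probability that a randomly chosen participant who developed the outcome has a higher score than one who did not, with ties counted as half (the Mann-Whitney formulation). This is a simplification, not the study's method. It assumes every participant was followed to the horizon, whereas a cohort analysis of 3‐ to 15‐year risk would also have to handle censoring. The function name `auroc` and the example scores are hypothetical.

```python
from typing import Sequence


def auroc(scores: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Probability that a randomly chosen case outscores a randomly chosen non-case.

    Ties count as half a win (Mann-Whitney formulation). Ignores follow-up
    time and censoring, so it simplifies time-dependent AUROCs.
    """
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical risk scores; True marks participants who developed the outcome.
print(auroc([0.1, 0.4, 0.35, 0.8], [False, False, True, True]))  # 0.75
```

An AUROC of 0.5 means the score ranks cases no better than chance, and 1.0 means every case outranks every non-case. The reported values of roughly 0.8 to 0.9 sit between these extremes.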
Ninety-four percent of mammalian protein-coding exons exceed 51 nucleotides (nt) in length. The paucity of micro-exons (≤ 51 nt) suggests that their recognition and correct processing by the splicing machinery present greater challenges than for longer exons. Yet, because thousands of human genes harbor processed micro-exons, specialized mechanisms may be in place to promote their splicing. Here, we survey deep genomic data sets to define 13,085 micro-exons and to study their splicing mechanisms and molecular functions. More than 60% of annotated human micro-exons exhibit a high level of sequence conservation, an indicator of functionality. While most human micro-exons require splicing-enhancing genomic features to be processed, the splicing of hundreds of micro-exons is enhanced by the adjacent binding of splice factors in the introns of pre-messenger RNAs. Notably, splicing of a significant number of micro-exons was found to be facilitated by the binding of RBFOX proteins, which promote their inclusion in the brain, muscle, and heart. Our analyses suggest that accurate regulation of micro-exon inclusion by RBFOX proteins and PTBP1 plays an important role in the maintenance of tissue-specific protein-protein interactions.
Background: COVID-19 vaccination is essential. However, no study has reported adverse events (AEs) after ChAdOx1 nCoV-19 vaccination in patients with end-stage renal disease (ESRD) on hemodialysis (HD). This study investigated AEs within 30 days after the first dose of ChAdOx1 nCoV-19 (Oxford-AstraZeneca) in ESRD patients on HD. Methods and findings: A total of 270 ESRD patients on HD were enrolled in this study. To determine the significance of vascular access thrombosis (VAT) after vaccination, we performed a self-controlled case series (SCCS) analysis. Of these patients, 38.5% had local AEs, most commonly local pain (29.6%), tenderness (28.9%), and induration (15.6%). Further, 62.2% had systemic AEs, most commonly fatigue (41.1%), feverishness (20%), and lethargy (19.9%). In addition, post-vaccination thirst affected 18.9% of participants, with a female predominance. Younger age, female sex, and diabetes mellitus were risk factors for AEs. Five patients had severe AEs: fever (n = 1), herpes zoster (HZ) reactivation (n = 1), and acute VAT (n = 3). However, the SCCS analysis revealed no association between vaccination and VAT. For the 0-3-month and 3-6-month periods prior to vaccination, respectively, the incidence rate ratio (IRR)-person ratios were 0.56 (95% CI 0.13-2.33) and 0.78 (95% CI 0.20-2.93), and the IRR-event ratios were 0.78 (95% CI 0.15-4.10) and 1.00 (95% CI 0.20-4.93). Conclusions: Although some ESRD patients on HD had local and systemic AEs after the first vaccine dose, these symptoms were of minor clinical significance. Our study confirmed the safety profile of ChAdOx1 nCoV-19 in HD patients and presented a new viewpoint on vaccine-related AEs. The SCCS analysis did not find an elevated risk of VAT at 1 month following vaccination. Apart from VAT, other vaccine-related AEs, whether local or systemic, had little clinical significance for safety.
Nonetheless, further coordinated, multicenter, or registry-based studies are needed to establish causality.
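The VAT results above are reported as incidence rate ratios with 95% confidence intervals. A full SCCS analysis fits a conditional Poisson model within each individual, comparing that person's risk and control windows. The abstract does not define its "IRR-person" and "IRR-event" variants, and they are not reproduced here. The sketch below shows only a generic crude rate ratio with a log-scale Wald interval, to illustrate how an IRR and its CI relate. The function name and example counts are hypothetical.

```python
import math
from typing import Tuple


def incidence_rate_ratio(events_exposed: int, time_exposed: float,
                         events_ref: int, time_ref: float,
                         z: float = 1.96) -> Tuple[float, float, float]:
    """Crude incidence rate ratio with a Wald confidence interval on the log scale.

    IRR = (events_exposed / time_exposed) / (events_ref / time_ref)
    SE(log IRR) = sqrt(1/events_exposed + 1/events_ref)
    Returns (irr, lower, upper).
    """
    if min(events_exposed, events_ref) <= 0:
        raise ValueError("both periods need at least one event for a Wald interval")
    if min(time_exposed, time_ref) <= 0:
        raise ValueError("person-time must be positive")
    irr = (events_exposed / time_exposed) / (events_ref / time_ref)
    se = math.sqrt(1 / events_exposed + 1 / events_ref)
    return irr, irr * math.exp(-z * se), irr * math.exp(z * se)


# Hypothetical counts: 10 events in 100 person-months vs 20 in 400 person-months.
irr, lower, upper = incidence_rate_ratio(10, 100.0, 20, 400.0)
print(round(irr, 2), round(lower, 2), round(upper, 2))  # 2.0 0.94 4.27
```

The interval is symmetric on the log scale, so lower × upper equals IRR². When it spans 1, as all four intervals in the abstract do, the data are compatible with no change in risk. With few events (for example, three VAT cases), the interval is necessarily wide.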
To provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs in fifteen human genomes. We sequence-resolved 99,604 insertions, deletions, and inversions, including 2,238 (1.6 Mbp) that are shared among all discovery genomes and an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms that the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes, with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci, improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.
•We sequence-resolve and annotate 99,604 common human structural variants
•55% of VNTRs map to the end of chromosomes and correlate with double-strand breaks
•Alternate alleles facilitate accurate genotyping with short reads and new associations
•We patch the reference and add diversity needed for developing a pan human genome
Long-read sequencing allows generation of a large catalog of human structural variants and the development of an algorithm for genotyping SVs from short-read data, clarifying the spectrum and importance of structural variation in the human genome.
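A "ninefold SV bias toward the last 5 Mbp of human chromosomes" is an observed-versus-expected ratio. The sketch below shows one way to compute such a ratio. It rests on several assumptions that the abstract does not spell out: "last 5 Mbp" is taken to mean within 5 Mbp of either chromosome end, positions are 0-based, and the expectation is uniform over all bases, with assembly gaps and centromeres ignored. The function name `terminal_enrichment` and the single toy chromosome are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def terminal_enrichment(sv_positions: Iterable[Tuple[str, int]],
                        chrom_lengths: Dict[str, int],
                        window: int = 5_000_000) -> float:
    """Observed/expected fraction of SVs within `window` bp of a chromosome end.

    Assumes 0-based positions, counts both ends of each chromosome, and takes
    the expectation to be uniform over all bases (gaps/centromeres ignored).
    """
    total_bp = sum(chrom_lengths.values())
    # Bases within `window` of either end; short chromosomes are counted once.
    terminal_bp = sum(min(length, 2 * window) for length in chrom_lengths.values())
    n_total = n_terminal = 0
    for chrom, pos in sv_positions:
        length = chrom_lengths[chrom]
        n_total += 1
        if pos < window or pos >= length - window:
            n_terminal += 1
    if n_total == 0:
        raise ValueError("no SVs supplied")
    observed = n_terminal / n_total
    expected = terminal_bp / total_bp
    return observed / expected


# Hypothetical 100 Mbp chromosome: 5 of 10 SVs fall in the terminal 10 Mbp.
svs = [("chrA", p) for p in (1_000_000, 2_000_000, 3_000_000, 96_000_000,
                              99_000_000, 10_000_000, 20_000_000, 30_000_000,
                              40_000_000, 50_000_000)]
print(terminal_enrichment(svs, {"chrA": 100_000_000}))  # ~5.0
```

Here half of the SVs fall in a tenth of the sequence, giving a fivefold enrichment. A real analysis would also need to decide how to treat unassembled or inaccessible sequence, because that choice changes the expected fraction.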