ABSTRACT
The purpose of dbNSFP is to provide a one‐stop resource for functional predictions and annotations for human nonsynonymous single‐nucleotide variants (nsSNVs) and splice‐site variants (ssSNVs), and to facilitate the steps of filtering and prioritizing SNVs from a large list of SNVs discovered in an exome‐sequencing study. A list of all potential nsSNVs and ssSNVs based on the human reference sequence was created, and functional predictions and annotations were curated and compiled for each SNV. Here, we report a recent major update of the database to version 3.0. The SNV list has been rebuilt based on GENCODE 22, and the database currently includes 82,832,027 nsSNVs and ssSNVs. An attached database, dbscSNV, which compiles all potential human SNVs within splicing consensus regions and their deleteriousness predictions, adds another 15,030,459 potentially functional SNVs. Eleven prediction scores (MetaSVM, MetaLR, CADD, VEST3, PROVEAN, 4× fitCons, fathmm‐MKL, and DANN) and allele frequencies from the UK10K cohorts and the Exome Aggregation Consortium (ExAC), among others, have been added. The original seven prediction scores in v2.0 (SIFT, 2× Polyphen2, LRT, MutationTaster, MutationAssessor, and FATHMM) as well as many SNV and gene functional annotations have been updated. dbNSFP v3.0 is freely available at http://sites.google.com/site/jpopgen/dbNSFP.
The purpose of the dbNSFP is to provide a one‐stop resource for functional predictions and annotations for human non‐synonymous single‐nucleotide variants (nsSNVs) and splice site variants (ssSNVs), and to facilitate the steps of filtering and prioritizing SNVs from a large list of SNVs discovered in an exome‐sequencing study. Here we report a recent major update of the database to version 3.0 and some preliminary analyses comparing the 24 functional prediction scores and conservation scores in dbNSFP v3.0.
In silico tools have been developed to predict variants that may have an impact on pre-mRNA splicing. The major limitation of the application of these tools to basic research and clinical practice is the difficulty in interpreting the output. Most tools only predict potential splice sites given a DNA sequence without measuring splicing signal changes caused by a variant. Another limitation is the lack of large-scale evaluation studies of these tools. We compared eight in silico tools on 2959 single nucleotide variants within splicing consensus regions (scSNVs) using receiver operating characteristic analysis. The Position Weight Matrix model and MaxEntScan outperformed other methods. Two ensemble learning methods, adaptive boosting and random forests, were used to construct models that take advantage of individual methods. Both models further improved prediction, with outputs of directly interpretable prediction scores. We applied our ensemble scores to scSNVs from the Catalogue of Somatic Mutations in Cancer database. Analysis showed that predicted splice-altering scSNVs are enriched in recurrent scSNVs and known cancer genes. We pre-computed our ensemble scores for all potential scSNVs across the human genome, providing a whole genome level resource for identifying splice-altering scSNVs discovered from large-scale sequencing studies.
Set-based analysis that jointly tests the association of variants in a group has emerged as a popular tool for analyzing rare and low-frequency variants in sequencing studies. The existing set-based tests can suffer significant power loss when only a small proportion of variants are causal, and their powers can be sensitive to the number, effect sizes, and effect directions of the causal variants and the choices of weights. Here we propose an aggregated Cauchy association test (ACAT), a general, powerful, and computationally efficient p value combination method for boosting power in sequencing studies. First, by combining variant-level p values, we use ACAT to construct a set-based test (ACAT-V) that is particularly powerful in the presence of only a small number of causal variants in a variant set. Second, by combining different variant-set-level p values, we use ACAT to construct an omnibus test (ACAT-O) that combines the strengths of multiple complementary set-based tests, including the burden test, sequence kernel association test (SKAT), and ACAT-V. Through analysis of extensively simulated data and the whole-genome sequencing data from the Atherosclerosis Risk in Communities (ARIC) study, we demonstrate that ACAT-V complements the SKAT and the burden test, and that ACAT-O has substantially more robust and higher power than the alternative tests.
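The p value combination step can be illustrated with a minimal sketch of the Cauchy combination rule: each p value is mapped to a standard Cauchy quantile, the quantiles are averaged with weights, and the average is mapped back to a p value. The function name `cauchy_combine` and the default equal weights are assumptions made for illustration; the abstract does not specify the weights, and this sketch does not reproduce the full ACAT-V or ACAT-O procedures.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p values via a weighted average of Cauchy quantiles.

    Hypothetical helper for illustration; equal weights are an assumption.
    """
    if not p_values:
        raise ValueError("at least one p value is required")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("p values must lie strictly between 0 and 1")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")

    total_weight = sum(weights)
    # tan((0.5 - p) * pi) is the standard Cauchy quantile at 1 - p.
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total_weight
    # Upper-tail probability of a standard Cauchy at the statistic.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # One very small p value among unremarkable ones dominates the result.
    print(cauchy_combine([1e-6, 0.5, 0.8]))
```

Because the Cauchy quantile grows very quickly as a p value approaches zero, a single strong signal dominates the combined statistic, which is consistent with the abstract's claim that the approach is powerful when only a few variants are causal.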
Nonalcoholic fatty liver disease (NAFLD) is a burgeoning health problem of unknown etiology that varies in prevalence among ancestry groups. To identify genetic variants contributing to differences in hepatic fat content, we carried out a genome-wide association scan of nonsynonymous sequence variations (n = 9,229) in a population comprising Hispanic, African American and European American individuals. An allele in PNPLA3 (rs738409G, encoding I148M) was strongly associated with increased hepatic fat levels (P = 5.9 × 10⁻¹⁰) and with hepatic inflammation (P = 3.7 × 10⁻⁴). The allele was most common in Hispanics, the group most susceptible to NAFLD; hepatic fat content was more than twofold higher in PNPLA3 rs738409G homozygotes than in noncarriers. Resequencing revealed another allele of PNPLA3 (rs6006460T, encoding S453I) that was associated with lower hepatic fat content in African Americans, the group at lowest risk of NAFLD. Thus, variation in PNPLA3 contributes to ancestry-related and inter-individual differences in hepatic fat content and susceptibility to NAFLD.
RNA splicing is the process during which introns are excised and exons are spliced. The precise recognition of splicing signals is critical to this process, and mutations affecting splicing comprise a considerable proportion of genetic disease etiology. Analysis of RNA samples from the patient is the most straightforward and reliable method to detect splicing defects. However, technical limitations currently prohibit its use in routine clinical practice. In silico tools that predict potential consequences of splicing mutations may be useful in daily diagnostic activities. In this review, we provide medical geneticists with some basic insights into some of the most popular in silico tools for splicing defect prediction, from the viewpoint of end users. Bioinformaticians in relevant areas who are working on huge data sets may also benefit from this review. Specifically, we focus on those tools whose primary goal is to predict the impact of mutations within the 5' and 3' splicing consensus regions: the algorithms used by different tools as well as their major advantages and disadvantages are briefly introduced; the formats of their input and output are summarized; and their interpretation, evaluation, and prospects are also discussed.
The relative activity of lipoprotein lipase (LPL) in different tissues controls the partitioning of lipoprotein-derived fatty acids between sites of fat storage (adipose tissue) and oxidation (heart and skeletal muscle). Here we used a reverse genetic strategy to test the hypothesis that 4 angiopoietin-like proteins (ANGPTL3, -4, -5, and -6) play key roles in triglyceride (TG) metabolism in humans. We re-sequenced the coding regions of the genes encoding these proteins and identified multiple rare nonsynonymous (NS) sequence variations that were associated with low plasma TG levels but not with other metabolic phenotypes. Functional studies revealed that all mutant alleles of ANGPTL3 and ANGPTL4 that were associated with low plasma TG levels interfered either with the synthesis or secretion of the protein or with the ability of the ANGPTL protein to inhibit LPL. A total of 1% of the Dallas Heart Study population and 4% of those participants with a plasma TG in the lowest quartile had a rare loss-of-function mutation in ANGPTL3, ANGPTL4, or ANGPTL5. Thus, ANGPTL3, ANGPTL4, and ANGPTL5, but not ANGPTL6, play nonredundant roles in TG metabolism, and multiple alleles at these loci cumulatively contribute to variability in plasma TG levels in humans.
Using a polygenic score of DNA sequence polymorphisms, the authors of this study quantified genetic risk and assessed four healthy lifestyle factors. Among participants at high genetic risk, a healthy lifestyle was associated with a reduced risk of coronary disease.
Both genetic and lifestyle factors are key drivers of coronary artery disease, a complex disorder that is the leading cause of death worldwide [1]. A familial pattern in the risk of coronary artery disease was first described in 1938 and was subsequently confirmed in large studies involving twins and prospective cohorts [2–6]. Since 2007, genomewide association analyses have identified more than 50 independent loci associated with the risk of coronary artery disease [7–15]. These risk alleles, when aggregated into a polygenic risk score, are predictive of incident coronary events and provide a continuous and quantitative measure of genetic susceptibility [16–24]. Much . . .
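As a rough illustration of how risk alleles can be aggregated into a polygenic risk score, the sketch below sums each person's risk-allele counts multiplied by a per-variant weight. The function name `polygenic_score`, the example dosages, and the weights are all hypothetical; the text does not state how the study weighted its variants, and per-variant effect-size weights are assumed here only as one common convention.

```python
def polygenic_score(risk_allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant).

    Hypothetical helper; the weights are illustrative, not study values.
    """
    if len(risk_allele_counts) != len(weights):
        raise ValueError("one weight is required per variant")
    if any(count not in (0, 1, 2) for count in risk_allele_counts):
        raise ValueError("allele counts must be 0, 1, or 2")
    return sum(count * w for count, w in zip(risk_allele_counts, weights))


if __name__ == "__main__":
    # Made-up weights for three variants and one person's genotypes.
    example_weights = [0.10, 0.05, 0.20]
    example_counts = [0, 1, 2]
    print(polygenic_score(example_counts, example_weights))
```

The result is a single continuous number per person, which is what allows genetic susceptibility to be treated as a quantitative measure and split into risk categories.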
Nomograms to predict normal aortic root diameter for body surface area (BSA) in broad ranges of age have been widely used but are limited by lack of consideration of gender effects, jumps in upper limits of aortic diameter among age strata, and data from older teenagers. Sinus of Valsalva diameter was measured by American Society of Echocardiography convention in normal-weight, nonhypertensive, nondiabetic subjects ≥15 years old without aortic valve disease from clinical or population-based samples. Analyses of covariance and linear regression with assessment of residuals identified determinants and developed predictive models for normal aortic root diameter. In 1,207 apparently normal subjects ≥15 years old (54% women), aortic root diameter was 2.1 to 4.3 cm. Aortic root diameter was strongly related to BSA and height (r = 0.48 for the 2 comparisons), age (r = 0.36), and male gender (+2.7 mm adjusted for BSA and age, p <0.001 for all comparisons). Multivariable equations using age, gender, and BSA or height predicted aortic diameter strongly (R = 0.674 for the 2 comparisons, p <0.001) with minimal relation of residuals to age or body size: for BSA, 2.423 + (age [years] × 0.009) + (BSA [square meters] × 0.461) − (gender [1 = man, 2 = woman] × 0.267), SEE 0.261 cm; for height, 1.519 + (age [years] × 0.010) + (height [centimeters] × 0.010) − (gender [1 = man, 2 = woman] × 0.247), SEE 0.215 cm. In conclusion, aortic root diameter is larger in men and increases with body size and age. Regression models incorporating body size, age, and gender are applicable to adolescents and adults without limitations of previous nomograms.
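The two regression equations above can be written out as a short worked example. The coefficients and the gender coding (1 = man, 2 = woman) are taken directly from the text; the function names and the example subject are assumptions added for illustration, and the sketch is not a clinical tool.

```python
def aortic_root_cm_from_bsa(age_years, bsa_m2, gender_code):
    """Predicted aortic root diameter (cm) from age, BSA, and gender.

    gender_code follows the text: 1 = man, 2 = woman. SEE is 0.261 cm.
    """
    if gender_code not in (1, 2):
        raise ValueError("gender_code must be 1 (man) or 2 (woman)")
    return 2.423 + age_years * 0.009 + bsa_m2 * 0.461 - gender_code * 0.267


def aortic_root_cm_from_height(age_years, height_cm, gender_code):
    """Predicted aortic root diameter (cm) from age, height, and gender.

    gender_code follows the text: 1 = man, 2 = woman. SEE is 0.215 cm.
    """
    if gender_code not in (1, 2):
        raise ValueError("gender_code must be 1 (man) or 2 (woman)")
    return 1.519 + age_years * 0.010 + height_cm * 0.010 - gender_code * 0.247


if __name__ == "__main__":
    # Hypothetical subject: 40-year-old man, BSA 1.9 m^2, height 175 cm.
    print(f"{aortic_root_cm_from_bsa(40, 1.9, 1):.2f} cm")       # 3.39 cm
    print(f"{aortic_root_cm_from_height(40, 175, 1):.2f} cm")    # 3.42 cm
```

With the same age and BSA, the prediction for a woman is 0.267 cm smaller than for a man, which matches the roughly 2.7 mm gender difference reported in the text.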
IMPORTANCE: Clinical whole-exome sequencing is increasingly used for diagnostic evaluation of patients with suspected genetic disorders. OBJECTIVE: To perform clinical whole-exome sequencing and report (1) the rate of molecular diagnosis among phenotypic groups, (2) the spectrum of genetic alterations contributing to disease, and (3) the prevalence of medically actionable incidental findings such as FBN1 mutations causing Marfan syndrome. DESIGN, SETTING, AND PATIENTS: Observational study of 2000 consecutive patients with clinical whole-exome sequencing analyzed between June 2012 and August 2014. Whole-exome sequencing tests were performed at a clinical genetics laboratory in the United States. Results were reported by clinical molecular geneticists certified by the American Board of Medical Genetics and Genomics. Tests were ordered by the patient’s physician. The patients were primarily pediatric (1756 [88%]; mean age, 6 years; 888 females [44%], 1101 males [55%], and 11 fetuses [1% gender unknown]), demonstrating diverse clinical manifestations most often including nervous system dysfunction such as developmental delay. MAIN OUTCOMES AND MEASURES: Whole-exome sequencing diagnosis rate overall and by phenotypic category, mode of inheritance, spectrum of genetic events, and reporting of incidental findings. RESULTS: A molecular diagnosis was reported for 504 patients (25.2%) with 58% of the diagnostic mutations not previously reported. Molecular diagnosis rates for each phenotypic category were 143/526 (27.2%; 95% CI, 23.5%-31.2%) for the neurological group, 282/1147 (24.6%; 95% CI, 22.1%-27.2%) for the neurological plus other organ systems group, 30/83 (36.1%; 95% CI, 26.1%-47.5%) for the specific neurological group, and 49/244 (20.1%; 95% CI, 15.6%-25.8%) for the nonneurological group.
The Mendelian disease patterns of the 527 molecular diagnoses included 280 (53.1%) autosomal dominant, 181 (34.3%) autosomal recessive (including 5 with uniparental disomy), 65 (12.3%) X-linked, and 1 (0.2%) mitochondrial. Of 504 patients with a molecular diagnosis, 23 (4.6%) had blended phenotypes resulting from 2 single gene defects. About 30% of the positive cases harbored mutations in disease genes reported since 2011. There were 95 medically actionable incidental findings in genes unrelated to the phenotype but with immediate implications for management in 92 patients (4.6%), including 59 patients (3%) with mutations in genes recommended for reporting by the American College of Medical Genetics and Genomics. CONCLUSIONS AND RELEVANCE: Whole-exome sequencing provided a potential molecular diagnosis for 25% of a large cohort of patients referred for evaluation of suspected genetic conditions, including detection of rare genetic events and new mutations contributing to disease. The yield of whole-exome sequencing may offer advantages over traditional molecular diagnostic approaches in certain patients.
Human diseases are caused by alleles that encompass the full range of variant types, from single-nucleotide changes to copy-number variants, and these variations span a broad frequency spectrum, from the very rare to the common. The picture emerging from analysis of whole-genome sequences, the 1000 Genomes Project pilot studies, and targeted genomic sequencing derived from very large sample sizes reveals an abundance of rare and private variants. One implication of this realization is that recent mutation may have a greater influence on disease susceptibility or protection than is conferred by variations that arose in distant ancestors.