Alcohol consumption level and alcohol use disorder (AUD) diagnosis are moderately heritable traits. We conduct genome-wide association studies of these traits using longitudinal Alcohol Use Disorder Identification Test-Consumption (AUDIT-C) scores and AUD diagnoses in a multi-ancestry Million Veteran Program sample (N = 274,424). We identify 18 genome-wide significant loci: 5 associated with both traits, 8 associated with AUDIT-C only, and 5 associated with AUD diagnosis only. Polygenic Risk Scores (PRS) for both traits are associated with alcohol-related disorders in two independent samples. Although a significant genetic correlation reflects the overlap between the traits, genetic correlations for 188 non-alcohol-related traits differ significantly for the two traits, as do the phenotypes associated with the traits' PRS. Cell type group partitioning heritability enrichment analyses also differentiate the two traits. We conclude that, although heavy drinking is a key risk factor for AUD, it is not a sufficient cause of the disorder.
Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
Ribosome profiling has uncovered pervasive translation in non-canonical open reading frames; however, the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5'UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation-disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.
Inflammatory bowel disease (IBD), clinically defined as Crohn's disease (CD), ulcerative colitis (UC), or IBD-unclassified, results in chronic inflammation of the gastrointestinal tract in genetically susceptible hosts. Pediatric onset IBD represents ≥25% of all IBD diagnoses and often presents with intestinal stricturing, perianal disease, and failed response to conventional treatments. NOD2 was the first locus associated with adult IBD and remains the most replicated to date. However, its role in pediatric onset IBD is not well understood. We performed whole-exome sequencing on a cohort of 1,183 patients with pediatric onset IBD (ages 0-18.5 years). We identified 92 probands with biallelic rare and low frequency NOD2 variants, accounting for approximately 8% of our cohort and suggesting a Mendelian inheritance pattern of disease. Additionally, we investigated the contribution of recessive inheritance of NOD2 alleles in adult IBD patients from a large clinical population cohort. We found that recessive inheritance of NOD2 variants explains ~7% of cases in this adult IBD cohort, including ~10% of CD cases, confirming the observations from our pediatric IBD cohort. Exploration of EHR data showed that several of these adult IBD patients obtained their initial IBD diagnosis before 18 years of age, consistent with early onset disease. While it has been previously reported that carriers of more than one NOD2 risk allele have increased susceptibility to CD, our data formally demonstrate that recessive inheritance of NOD2 alleles is a mechanistic driver of early onset IBD, specifically CD, likely due to loss of NOD2 protein function. Collectively, our findings show that recessive inheritance of rare and low frequency deleterious NOD2 variants accounts for 7-10% of CD cases and implicate NOD2 as a Mendelian disease gene for early onset CD.
Familial hypercholesterolemia (FH) remains underdiagnosed despite widespread cholesterol screening. Exome sequencing and electronic health record (EHR) data of 50,726 individuals were used to assess the prevalence and clinical impact of FH-associated genomic variants in the Geisinger Health System. The estimated FH prevalence was 1:256 in unselected participants and 1:118 in participants ascertained via the cardiac catheterization laboratory. FH variant carriers had significantly increased risk of coronary artery disease. Only 24% of carriers met EHR-based presequencing criteria for probable or definite FH diagnosis. Active statin use was identified in 58% of carriers; 46% of statin-treated carriers had a low-density lipoprotein cholesterol level below 100 mg/dl. Thus, we find that genomic screening can prompt the diagnosis of FH patients, most of whom are receiving inadequate lipid-lowering therapy.
Higher-than-normal levels of circulating triglycerides are a risk factor for ischemic cardiovascular disease. Activation of lipoprotein lipase, an enzyme that is inhibited by angiopoietin-like 4 (ANGPTL4), has been shown to reduce levels of circulating triglycerides.
We sequenced the exons of ANGPTL4 in samples obtained from 42,930 participants of predominantly European ancestry in the DiscovEHR human genetics study. We performed tests of association between lipid levels and the missense E40K variant (which has been associated with reduced plasma triglyceride levels) and other inactivating mutations. We then tested for associations between coronary artery disease and the E40K variant and other inactivating mutations in 10,552 participants with coronary artery disease and 29,223 controls. We also tested the effect of a human monoclonal antibody against ANGPTL4 on lipid levels in mice and monkeys.
We identified 1661 heterozygotes and 17 homozygotes for the E40K variant and 75 participants who had 13 other monoallelic inactivating mutations in ANGPTL4. The levels of triglycerides were 13% lower and the levels of high-density lipoprotein (HDL) cholesterol were 7% higher among carriers of the E40K variant than among noncarriers. Carriers of the E40K variant were also significantly less likely than noncarriers to have coronary artery disease (odds ratio, 0.81; 95% confidence interval, 0.70 to 0.92; P=0.002). K40 homozygotes had markedly lower levels of triglycerides and higher levels of HDL cholesterol than did heterozygotes. Carriers of other inactivating mutations also had lower triglyceride levels and higher HDL cholesterol levels and were less likely to have coronary artery disease than were noncarriers. Monoclonal antibody inhibition of Angptl4 in mice and monkeys reduced triglyceride levels.
Carriers of E40K and other inactivating mutations in ANGPTL4 had lower levels of triglycerides and a lower risk of coronary artery disease than did noncarriers. The inhibition of Angptl4 in mice and monkeys also resulted in corresponding reductions in these values. (Funded by Regeneron Pharmaceuticals.)
Pulmonary arterial hypertension (PAH) is a rare disease characterized by pulmonary arteriole remodeling, elevated arterial pressure and resistance, and subsequent heart failure. Compared with adult-onset disease, pediatric-onset PAH is more heterogeneous and often associated with worse prognosis. Although
mutations underlie ≈70% of adult familial PAH (FPAH) cases, the genetic basis of PAH in children is less understood.
We performed genetic analysis of 155 pediatric- and 257 adult-onset PAH patients, including both FPAH and sporadic, idiopathic PAH (IPAH). After screening for 2 common PAH risk genes, mutation-negative FPAH and all IPAH cases were evaluated by exome sequencing.
We observed similar frequencies of rare, deleterious
mutations in pediatric- and adult-onset patients: ≈55% in FPAH and 10% in IPAH patients in both age groups. However, there was significant enrichment of
mutations in pediatric- compared with adult-onset patients (IPAH: 10/130 pediatric versus 0/178 adult-onset), and
carriers had younger mean age-of-onset compared with
carriers. Mutations in other known PAH risk genes were infrequent in both age groups. Notably, among pediatric IPAH patients without mutations in known risk genes, exome sequencing revealed a 2-fold enrichment of de novo likely gene-damaging and predicted deleterious missense variants.
Mutations in known PAH risk genes accounted for ≈70% to 80% of FPAH in both age groups, 21% of pediatric-onset IPAH, and 11% of adult-onset IPAH. Rare, predicted deleterious variants in
are enriched in pediatric patients and de novo variants in novel genes may explain ≈19% of pediatric-onset IPAH cases.
Pulmonary arterial hypertension (PAH) is a rare disease characterized by distinctive changes in pulmonary arterioles that lead to progressively elevated pulmonary arterial pressures, right-sided heart failure, and a high mortality rate. Up to 30% of adult and 75% of pediatric PAH cases are associated with congenital heart disease (PAH-CHD), and the underlying etiology is largely unknown. There are no known major risk genes for PAH-CHD.
To identify novel genetic causes of PAH-CHD, we performed whole exome sequencing in 256 PAH-CHD patients. We performed a case-control gene-based association test of rare deleterious variants using 7509 gnomAD whole genome sequencing population controls. We then screened a separate cohort of 413 idiopathic and familial PAH patients without CHD for rare deleterious variants in the top association gene.
We identified SOX17 as a novel candidate risk gene (p = 5.5e-7). SOX17 is highly constrained and encodes a transcription factor involved in Wnt/β-catenin and Notch signaling during development. We estimate that rare deleterious variants contribute to approximately 3.2% of PAH-CHD cases. The coding variants identified include likely gene-disrupting (LGD) and deleterious missense, with most of the missense variants occurring in a highly conserved HMG-box protein domain. We further observed an enrichment of rare deleterious variants in putative targets of SOX17, many of which are highly expressed in developing heart and pulmonary vasculature. In the cohort of PAH without CHD, rare deleterious variants of SOX17 were observed in 0.7% of cases.
These data strongly implicate SOX17 as a new risk gene contributing to PAH-CHD as well as idiopathic/familial PAH. Replication in other PAH cohorts and further characterization of the clinical phenotype will be important to confirm the precise role of SOX17 and better estimate the contribution of genes regulated by SOX17.
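A case-control comparison of rare-variant carriers, as described in the abstract above, can be sketched with a one-sided Fisher's exact test on carrier counts. This is only an illustration: the abstract does not state which test statistic was used, Fisher's exact test is swapped in here as a generic choice, and the carrier counts below are invented. Only the cohort sizes (256 cases, 7509 controls) come from the text.

```python
from math import comb


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact p-value for enrichment of carriers in cases.

    Returns P(X >= case_carriers), where X follows the hypergeometric
    distribution obtained by fixing the table margins.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = comb(total, carriers)
    upper = min(carriers, n_cases)
    numer = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numer / denom


# Hypothetical carrier counts (NOT from the study); cohort sizes are from the text.
p_example = fisher_one_sided(case_carriers=8, n_cases=256,
                             control_carriers=5, n_controls=7509)
print(f"one-sided p-value for the illustrative counts: {p_example:.2e}")
```

In a real gene-based analysis the same test would be repeated per gene and corrected for the number of genes tested.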
A promise of genomics in precision medicine is to provide individualized genetic risk predictions. Polygenic risk scores (PRS), computed by aggregating effects from many genomic variants, have been developed as a useful tool in complex disease research. However, the application of PRS as a tool for predicting an individual's disease susceptibility in a clinical setting is challenging because PRS typically provide a relative measure of risk evaluated at the level of a group of people but not at the individual level. Here, we introduce a machine-learning technique, Mondrian Cross-Conformal Prediction (MCCP), to estimate the confidence bounds of PRS-to-disease-risk prediction. MCCP can report a disease-status conditional probability value for each individual and give a prediction at a desired error level. Moreover, with a user-defined prediction error rate, MCCP can estimate the proportion of the sample (coverage) with a correct prediction.
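The idea of class-conditional (Mondrian) conformal prediction mentioned above can be sketched as follows. This is a simplified single-split illustration, not the authors' MCCP implementation: the cross-conformal step over folds is omitted, the nonconformity function is an assumption (cases are assumed to have higher PRS than controls), and all data values are made up.

```python
def nonconformity(prs, label):
    """Higher value means the PRS looks less typical for the hypothesized label.

    Assumption for this sketch: cases (label 1) tend to have higher PRS
    than controls (label 0).
    """
    return -prs if label == 1 else prs


def mondrian_p_values(cal_prs, cal_labels, new_prs):
    """Class-conditional conformal p-values for one new individual."""
    p_values = {}
    for label in sorted(set(cal_labels)):
        # Compare only against calibration individuals with the same label
        # (the Mondrian category).
        cal_scores = [
            nonconformity(x, y) for x, y in zip(cal_prs, cal_labels) if y == label
        ]
        new_score = nonconformity(new_prs, label)
        at_least_as_strange = sum(1 for s in cal_scores if s >= new_score)
        p_values[label] = (at_least_as_strange + 1) / (len(cal_scores) + 1)
    return p_values


def prediction_set(p_values, error_level):
    """Labels that cannot be rejected at the chosen error level."""
    return {label for label, p in p_values.items() if p > error_level}


# Toy calibration data (invented): five controls, then five cases.
cal_prs = [-1.2, -0.5, 0.0, 0.3, -0.8, 0.9, 1.5, 0.4, 2.0, 1.1]
cal_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

p = mondrian_p_values(cal_prs, cal_labels, new_prs=1.0)
print(p, prediction_set(p, error_level=0.2))
```

A prediction set containing a single label corresponds to a confident call at the chosen error level; a set with both labels means the individual cannot be classified at that level, which is how coverage falls below 100%.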
Projections of the stage of the Severe Acute Respiratory Syndrome-Coronavirus-2 (SARS-CoV-2) pandemic and local, regional and national public health policies to limit coronavirus spread as well as "reopen" cities and states, are best informed by serum neutralizing antibody titers measured by reproducible, high throughput, and statistically credible antibody (Ab) assays. To date, a myriad of Ab tests, both available and FDA authorized for emergency use, has led to confusion rather than insight per se. The present study reports the results of a rapid, point-in-time 1,000-person cohort study using serial blood donors in the New York City metropolitan area (NYC) using multiple serological tests, including enzyme-linked immunosorbent assays (ELISAs) and high throughput serological assays (HTSAs). These were then tested and associated with assays for neutralizing Ab (NAb). Of the 1,000 NYC blood donor samples from late June and early July 2020, 12.1% and 10.9% were seropositive using the Ortho Total Ig and the Abbott IgG HTSA assays, respectively. These serological assays correlated with neutralization activity specific to SARS-CoV-2. The data reported herein suggest that seroconversion in this population occurred in approximately 1 in 8 blood donors from the beginning of the pandemic in NYC (considered March 1, 2020). These findings deviate from an earlier seroprevalence study in NYC showing 13.7% positivity. Collectively, however, these data demonstrate that a low number of individuals have serologic evidence of infection during this "first wave" and suggest that the notion of "herd immunity" at rates of ~60% or higher is not near. Furthermore, the data presented herein show that the nature of the Ab-based immunity is not invariably associated with the development of NAb. While the blood donor population may not mimic precisely the NYC population as a whole, rapid assessment of seroprevalence in this cohort and serial reassessment could aid public health decision making.
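To give a sense of the sampling uncertainty around a seroprevalence estimate of this size, the sketch below computes a 95% Wilson score interval. It assumes that 12.1% of 1,000 samples corresponds to 121 positives and treats the donors as a simple random sample. Both are simplifying assumptions that the abstract does not state, and the interval method is a generic choice rather than the one used in the study.

```python
from math import sqrt


def wilson_interval(positives, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p_hat = positives / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed count: 12.1% of 1,000 samples taken as 121 positives.
low, high = wilson_interval(121, 1000)
print(f"95% Wilson interval: {low:.3f} to {high:.3f}")
```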