The number of human genomes being genotyped or sequenced is increasing exponentially, and efficient haplotype estimation methods able to handle this amount of data are now required. Here we present a method, SHAPEIT4, which substantially improves upon other methods in processing large genotype and high-coverage sequencing datasets. It notably exhibits sub-linear running times with sample size, provides highly accurate haplotypes and allows integration of external phasing information such as large reference panels of haplotypes, collections of pre-phased variants and long sequencing reads. We provide SHAPEIT4 as open-source software and demonstrate its performance in terms of accuracy and running times on two gold-standard datasets: the UK Biobank data and the Genome In A Bottle.
Health risk factors such as body mass index (BMI) and serum cholesterol are associated with many common diseases. It often remains unclear whether the risk factors are cause or consequence of disease, or whether the associations are the result of confounding. We develop and apply a method (called GSMR) that performs a multi-SNP Mendelian randomization analysis using summary-level data from genome-wide association studies to test the causal associations of BMI, waist-to-hip ratio, serum cholesterols, blood pressures, height, and years of schooling (EduYears) with common diseases (sample sizes of up to 405,072). We identify a number of causal associations, including a protective effect of LDL-cholesterol against type-2 diabetes (T2D) that might explain the side effects of statins on T2D, a protective effect of EduYears against Alzheimer's disease, and bidirectional associations with opposite effects (e.g., higher BMI increases the risk of T2D but the effect of T2D on BMI is negative).
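Multi-SNP Mendelian randomization combines per-SNP evidence from two sets of GWAS summary statistics. GSMR itself is more elaborate (it accounts for residual LD among instruments and filters out pleiotropic SNPs), so the sketch below swaps in the simpler inverse-variance weighted (IVW) estimator, assuming independent SNPs and first-order weights. The function name and the input values are hypothetical.

```python
import math


def ivw_estimate(bzx, bzy, sezy):
    """Inverse-variance weighted (IVW) causal effect of exposure x on outcome y.

    bzx:  per-SNP effects on the exposure (GWAS summary statistics)
    bzy:  per-SNP effects on the outcome
    sezy: standard errors of bzy
    Assumes independent SNPs (no LD) and first-order weights, which
    ignore the sampling error in bzx.
    """
    num = 0.0
    den = 0.0
    for bx, by, sy in zip(bzx, bzy, sezy):
        # Each SNP's Wald ratio by/bx has first-order variance sy^2 / bx^2,
        # so its weight is bx^2 / sy^2; weight * (by/bx) simplifies to bx*by/sy^2.
        num += bx * by / (sy * sy)
        den += bx * bx / (sy * sy)
    bxy = num / den
    se = math.sqrt(1.0 / den)
    return bxy, se


# Hypothetical summary statistics for three independent instruments.
bxy, se = ivw_estimate([0.10, 0.20, 0.05], [0.05, 0.10, 0.025], [0.01, 0.02, 0.01])
print(f"b_xy = {bxy:.3f}, SE = {se:.3f}")
```

Because every hypothetical SNP here has an outcome effect of exactly half its exposure effect, the pooled estimate is 0.5; SNPs with more precise outcome effects and stronger exposure effects receive more weight.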
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with human complex traits. However, the genes or functional DNA elements through which these variants exert their effects on the traits are often unknown. We propose a method (called SMR) that integrates summary-level data from GWAS with data from expression quantitative trait locus (eQTL) studies to identify genes whose expression levels are associated with a complex trait because of pleiotropy. We apply the method to five human complex traits using GWAS data on up to 339,224 individuals and eQTL data on 5,311 individuals, and we prioritize 126 genes (for example, TRAF1 and ANKRD55 for rheumatoid arthritis and SNX19 and NMRAL1 for schizophrenia), of which 25 genes are new candidates; 77 genes are not the nearest annotated gene to the top associated GWAS SNP. These genes provide important leads for designing future functional studies to understand the mechanism whereby DNA variation leads to complex trait variation.
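A minimal sketch of the SMR test for a single gene, assuming the usual approximate statistic built from the GWAS z-score of the top cis-eQTL SNP and its eQTL z-score: T_SMR = z_zy^2 · z_zx^2 / (z_zy^2 + z_zx^2), compared against a chi-square distribution with 1 degree of freedom. The effect of expression on the trait is the ratio b_xy = b_zy / b_zx. The full SMR workflow also applies a HEIDI test to separate pleiotropy from linkage, which is not shown. Function name and inputs are hypothetical.

```python
import math


def smr_test(b_zy, se_zy, b_zx, se_zx):
    """Approximate SMR test for one gene using its top cis-eQTL SNP z.

    b_zy, se_zy: SNP effect on the trait (GWAS) and its SE
    b_zx, se_zx: SNP effect on expression (eQTL) and its SE
    Returns the estimated effect of expression on the trait, the
    approximate test statistic and its chi-square (1 df) p-value.
    """
    z_zy = b_zy / se_zy
    z_zx = b_zx / se_zx
    b_xy = b_zy / b_zx  # ratio (Wald-type) estimate
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    # For a chi-square with 1 df, P(X > t) = erfc(sqrt(t / 2)).
    p = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p


b_xy, t_smr, p = smr_test(0.03, 0.01, 0.4, 0.1)  # hypothetical inputs
print(f"b_xy = {b_xy:.3f}, T_SMR = {t_smr:.2f}, P = {p:.3g}")
```

Note that T_SMR is always smaller than either squared z-score alone, so a gene needs both a strong eQTL and a strong GWAS signal at the same SNP to reach significance.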
We develop a Bayesian mixed linear model that simultaneously estimates single-nucleotide polymorphism (SNP)-based heritability, polygenicity (proportion of SNPs with nonzero effects), and the relationship between SNP effect size and minor allele frequency for complex traits in conventionally unrelated individuals using genome-wide SNP data. We apply the method to 28 complex traits in the UK Biobank data (N = 126,752) and show that on average, 6% of SNPs have nonzero effects, which in total explain 22% of phenotypic variance. We detect significant (P < 0.05/28) signatures of natural selection in the genetic architecture of 23 traits, including reproductive, cardiovascular, and anthropometric traits, as well as educational attainment. The significant estimates of the relationship between effect size and minor allele frequency in complex traits are consistent with a model of negative (or purifying) selection, as confirmed by forward simulation. We conclude that negative selection acts pervasively on the genetic variants associated with human complex traits.
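The relationship between effect size and minor allele frequency can be made concrete with a small sketch. It assumes a BayesS-style parametrization in which a nonzero effect β_j of SNP j with MAF p_j has variance proportional to [2p_j(1 − p_j)]^S, with S the selection parameter. Under this assumption, S = 0 means effect variance does not depend on MAF, S < 0 means rarer variants tend to carry larger per-allele effects (the negative-selection signature), and S = −1 means every causal SNP explains the same expected variance. The function names are hypothetical.

```python
def relative_effect_variance(maf, s):
    """Variance of a nonzero per-allele effect, relative to sigma_beta^2,
    under the assumed model var(beta_j) proportional to [2p_j(1-p_j)]^S."""
    het = 2.0 * maf * (1.0 - maf)  # genotype variance under Hardy-Weinberg
    return het ** s


def relative_variance_explained(maf, s):
    """Expected phenotypic variance explained by such a SNP, 2p(1-p) * beta^2,
    which scales as [2p(1-p)]^(1 + S)."""
    het = 2.0 * maf * (1.0 - maf)
    return het ** (1.0 + s)


# With negative S, rarer variants carry larger per-allele effects.
for maf in (0.01, 0.05, 0.5):
    print(maf, round(relative_effect_variance(maf, -0.5), 2))
```

In the full model only a fraction of SNPs (the polygenicity) have nonzero effects; this sketch covers only the MAF dependence of those nonzero effects.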
The human proteome is a major source of therapeutic targets. Recent genetic association analyses of the plasma proteome enable systematic evaluation of the causal consequences of variation in plasma protein levels. Here we estimated the effects of 1,002 proteins on 225 phenotypes using two-sample Mendelian randomization (MR) and colocalization. Of 413 associations supported by evidence from MR, 130 (31.5%) were not supported by results of colocalization analyses, suggesting that genetic confounding due to linkage disequilibrium is widespread in naïve phenome-wide association studies of proteins. Combining MR and colocalization evidence in cis-only analyses, we identified 111 putatively causal effects between 65 proteins and 52 disease-related phenotypes (https://www.epigraphdb.org/pqtl/). Evaluation of data from historic drug development programs showed that target-indication pairs with MR and colocalization support were more likely to be approved, evidencing the value of this approach in identifying and prioritizing potential therapeutic targets.
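In a cis-only analysis with one pQTL per protein, the MR estimate reduces to a Wald ratio. The sketch below computes it with a first-order standard error and then applies a hypothetical decision rule that keeps a protein-phenotype pair only when colocalization also supports a shared causal variant; an MR signal without colocalization is the pattern the abstract attributes to linkage disequilibrium. The thresholds (alpha = 0.05, posterior > 0.8) and function names are assumptions for illustration; in practice alpha would be corrected for multiple testing.

```python
import math


def wald_ratio(b_exposure, b_outcome, se_outcome):
    """Single-instrument MR estimate (Wald ratio) with first-order SE.

    b_exposure: cis-pQTL effect on the protein level
    b_outcome, se_outcome: the same SNP's effect on the outcome and its SE
    """
    beta = b_outcome / b_exposure
    se = se_outcome / abs(b_exposure)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, p


def supported(mr_p, coloc_pp_shared, alpha=0.05, pp_threshold=0.8):
    """Hypothetical decision rule: keep a protein-phenotype pair only if the
    MR test is significant AND colocalization gives a high posterior
    probability that both traits share one causal variant."""
    return mr_p < alpha and coloc_pp_shared > pp_threshold


beta, se, p = wald_ratio(0.5, 0.1, 0.02)  # hypothetical summary statistics
print(supported(p, 0.9), supported(p, 0.3))
```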
Recent advances have enabled noninvasive mapping of cardiac arrhythmias with electrocardiographic imaging and noninvasive delivery of precise ablative radiation with stereotactic body radiation therapy (SBRT). We combined these techniques to perform catheter-free, electrophysiology-guided, noninvasive cardiac radioablation for ventricular tachycardia.
We targeted arrhythmogenic scar regions by combining anatomical imaging with noninvasive electrocardiographic imaging during ventricular tachycardia that was induced by means of an implantable cardioverter-defibrillator (ICD). SBRT simulation, planning, and treatments were performed with the use of standard techniques. Patients were treated with a single fraction of 25 Gy while awake. Efficacy was assessed by counting episodes of ventricular tachycardia, as recorded by ICDs. Safety was assessed by means of serial cardiac and thoracic imaging.
From April through November 2015, five patients with high-risk, refractory ventricular tachycardia underwent treatment. The mean noninvasive ablation time was 14 minutes (range, 11 to 18). During the 3 months before treatment, the patients had a combined history of 6577 episodes of ventricular tachycardia. During a 6-week postablation "blanking period" (when arrhythmias may occur owing to postablation inflammation), there were 680 episodes of ventricular tachycardia. After the 6-week blanking period, there were 4 episodes of ventricular tachycardia over the next 46 patient-months, for a reduction from baseline of 99.9%. A reduction in episodes of ventricular tachycardia occurred in all five patients. The mean left ventricular ejection fraction did not decrease with treatment. At 3 months, adjacent lung showed opacities consistent with mild inflammatory changes, which had resolved by 1 year.
In five patients with refractory ventricular tachycardia, noninvasive treatment with electrophysiology-guided cardiac radioablation markedly reduced the burden of ventricular tachycardia. (Funded by Barnes-Jewish Hospital Foundation and others.).
Next-generation sequencing technology is transforming our understanding of heterozygous familial hypercholesterolemia, including revision of prevalence estimates and attribution of polygenic effects. Here, we examined the contributions of monogenic and polygenic factors in patients with severe hypercholesterolemia referred to a specialty clinic.
We applied targeted next-generation sequencing with custom annotation, coupled with evaluation of large-scale copy number variation and polygenic scores for raised low-density lipoprotein cholesterol in a cohort of 313 individuals with severe hypercholesterolemia, defined as low-density lipoprotein cholesterol >5.0 mmol/L (>194 mg/dL). We found that (1) monogenic familial hypercholesterolemia-causing mutations detected by targeted next-generation sequencing were present in 47.3% of individuals; (2) the percentage of individuals with monogenic mutations increased to 53.7% when copy number variations were included; (3) the percentage further increased to 67.1% when individuals with extreme polygenic scores were included; and (4) the percentage of individuals with an identified genetic component increased from 57.0% to 92.0% as low-density lipoprotein cholesterol level increased from 5.0 to >8.0 mmol/L (194 to >310 mg/dL).
In a clinically ascertained sample with severe hypercholesterolemia, we found that most patients had a discrete genetic basis detected using a comprehensive screening approach that includes targeted next-generation sequencing, an assay for copy number variations, and polygenic trait scores.
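The stepwise percentages above count each patient once, in order: sequencing mutations first, then copy number variations, then extreme polygenic scores. A minimal sketch of that tally follows. The abstract does not state which SNPs form the LDL-cholesterol score, their weights, or what counts as "extreme", so the 3-SNP weights, genotypes, the 90th-percentile cutoff and all function names here are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted count of LDL-C-raising alleles: sum of dosage (0, 1 or 2) x weight."""
    return sum(d * w for d, w in zip(dosages, weights))


def percentile_cutoff(scores, q):
    """Empirical q-quantile of a reference distribution of scores (simple rank rule)."""
    ranked = sorted(scores)
    return ranked[min(len(ranked) - 1, int(q * len(ranked)))]


def classify(has_snv, has_cnv, score, cutoff):
    """Assign one genetic category per patient, checked in the order of the
    stepwise tally: sequencing mutations, then CNVs, then extreme scores."""
    if has_snv:
        return "monogenic (sequence variant)"
    if has_cnv:
        return "monogenic (CNV)"
    if score >= cutoff:
        return "polygenic (extreme score)"
    return "no identified genetic component"


# Hypothetical weights and reference genotypes for a 3-SNP score.
weights = [0.10, 0.20, 0.30]
reference = [polygenic_score(g, weights) for g in ([0, 0, 1], [1, 1, 1], [2, 1, 0], [0, 2, 2])]
cutoff = percentile_cutoff(reference, 0.9)
print(classify(False, False, polygenic_score([2, 2, 2], weights), cutoff))
```

Checking categories in a fixed order keeps the groups non-overlapping, which is what makes the cumulative percentages (47.3%, then 53.7%, then 67.1%) additive.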
Male pattern baldness (MPB) is a sex-limited, age-related, complex trait. We study MPB genetics in 205,327 European males from the UK Biobank. Here we show that MPB is strongly heritable and polygenic, with pedigree-heritability of 0.62 (SE = 0.03) estimated from close relatives, and SNP-heritability of 0.39 (SE = 0.01) from conventionally-unrelated males. We detect 624 near-independent genome-wide loci, contributing SNP-heritability of 0.25 (SE = 0.01), of which 26 X-chromosome loci explain 11.6%. Autosomal genetic variance is enriched for common variants and regions of lower linkage disequilibrium. We identify plausible genetic correlations between MPB and multiple sex-limited markers of earlier puberty, increased bone mineral density (r_g = 0.15) and pancreatic β-cell function (r_g = 0.12). Correlations with reproductive traits imply an effect on fitness, consistent with an estimated linear selection gradient of -0.018 per MPB standard deviation. Overall, we provide genetic insights into MPB: a phenotype of interest in its own right, with value as a model sex-limited, complex trait.
The growing sample size of genome-wide association studies has facilitated the discovery of gene-environment interactions (GxE). Here we propose a maximum likelihood method to estimate the contribution of GxE to continuous traits, taking into account all interacting environmental variables without the need to measure any. Extensive simulations demonstrate that our method provides unbiased interaction estimates and excellent coverage. We also offer strategies to distinguish specific GxE from general scale effects. Applying our method to 32 traits in the UK Biobank reveals that while the genetic risk score (GRS) of 376 variants explains 5.2% of body mass index (BMI) variance, GRSxE explains an additional 1.9%. Nevertheless, this interaction also holds for any variable with the same correlation to BMI as the GRS and hence may not be GRS-specific. Still, we observe that the global contribution of specific GRSxE to complex traits is substantial for nine obesity-related measures (including leg impedance and trunk fat-free mass).
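The abstract does not give the likelihood, so the sketch below only illustrates the general principle that lets GxE be detected without measuring E. It is a swapped-in illustration, not the authors' maximum likelihood estimator. If y = b·G + E + c·G·E with E unobserved and standard normal, then Var(y | G) = (1 + c·G)^2, so the residual variance changes with G. Regressing squared residuals on G (a Breusch-Pagan-style check) picks this up, with an expected slope of about 2c when G is also standard normal. A pure scale effect, such as a nonlinear transformation of the phenotype, can produce a similar pattern, which is why the abstract separates the two. All names and parameter values are hypothetical.

```python
import random


def simulate(n, b_g, c_gxe, seed=1):
    """Simulate y = b_g*G + E + c_gxe*G*E, where E is never observed."""
    rng = random.Random(seed)
    gs, ys = [], []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)  # standardized genetic risk score
        e = rng.gauss(0.0, 1.0)  # unmeasured environment
        gs.append(g)
        ys.append(b_g * g + e + c_gxe * g * e)
    return gs, ys


def ols(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def variance_slope(gs, ys):
    """Regress squared residuals of y ~ G on G; expected to be about 2*c_gxe."""
    a, b = ols(gs, ys)
    sq_resid = [(y - a - b * g) ** 2 for g, y in zip(gs, ys)]
    return ols(gs, sq_resid)[1]


for c in (0.0, 0.3):
    gs, ys = simulate(20000, b_g=0.2, c_gxe=c)
    print(c, round(variance_slope(gs, ys), 2))
```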
Theory for liability-scale models of the underlying genetic basis of complex disease provides an important way to interpret, compare, and understand results generated from biological studies. In particular, through estimation of the liability-scale heritability (LSH), liability models facilitate an understanding and comparison of the relative importance of genetic and environmental risk factors that shape different clinically important disease outcomes. Increasingly, large-scale biobank studies that link genetic information to electronic health records, containing hundreds of disease diagnosis indicators that mostly occur infrequently within the sample, are becoming available. Here, we propose an extension of the existing liability-scale model theory suitable for estimating LSH in biobank studies of low-prevalence disease. In a simulation study, we find that our derived expression yields lower mean square error (MSE) and is less sensitive to prevalence misspecification compared with previous transformations for diseases with ≤2% population prevalence and LSH of ≤0.45, especially if the biobank sample prevalence is less than that of the wider population. Applying our expression to 13 diagnostic outcomes of ≤3% prevalence in the UK Biobank study revealed important differences in LSH obtained from the different theoretical expressions that impact the conclusions made when comparing LSH across disease outcomes. This demonstrates the importance of careful consideration of how low-prevalence disease outcomes are estimated and predicted, and facilitates improved inference of the underlying genetic basis of diseases with ≤2% population prevalence, especially where biobank sample ascertainment results in a healthier sample population.
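For context on the "previous transformations" mentioned above, the sketch below implements one widely used earlier observed-to-liability-scale conversion. It is not the authors' new expression, which the abstract does not give. With population prevalence K, liability threshold t = Φ⁻¹(1 − K), standard normal density z = φ(t) and sample prevalence P, it computes h²_L = h²_O · K²(1 − K)² / (z² · P(1 − P)), which reduces to h²_O · K(1 − K) / z² when P = K. The function name is hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, K, P=None):
    """Standard observed-to-liability-scale heritability transformation.

    h2_obs: heritability on the observed 0/1 scale
    K:      population prevalence
    P:      sample prevalence (defaults to K, i.e. an unascertained sample)
    """
    if P is None:
        P = K
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1.0 - K)  # liability threshold
    z = std_normal.pdf(t)            # standard normal density at the threshold
    return h2_obs * (K * (1.0 - K)) ** 2 / (z ** 2 * P * (1.0 - P))


# The multiplier grows quickly as prevalence falls, so small errors in
# h2_obs or K are amplified for rare diseases.
for K in (0.2, 0.02, 0.002):
    print(K, round(observed_to_liability(0.01, K), 3))
```

The rapid growth of the multiplier at low K is one way to see why estimates for rare diagnoses in biobanks can be unstable and sensitive to the assumed prevalence.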
Estimating the heritability of low-prevalence diseases in biobanks can lead to inconsistent and unrealistic results because of high estimator variance. Here, we propose a simple alternative that increases the accuracy of heritability estimation for low-prevalence traits and is also suitable for ascertained samples.