Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case-control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals.
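The two-level whole-genome regression idea, which only needs one local genotype block in memory at a time, can be illustrated with a small NumPy sketch. It is not REGENIE's implementation. It assumes column-standardized genotypes and a centered phenotype, splits SNPs into contiguous blocks, fits one ridge regression per block for each of a few hypothetical shrinkage values using out-of-fold predictions (level 0), and then combines those predictions with a second ridge regression (level 1). Leave-one-chromosome-out predictions, the binary-trait model and the Firth test are omitted. All function names, parameter values and the simulated data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (X'X + lam*I)^-1 X'y (no intercept)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def level0_predictions(G, y, block_size, lambdas, n_folds=5, seed=0):
    """Level 0: within each contiguous block of SNP columns, fit one ridge
    regression per shrinkage value and record out-of-fold predictions.
    Only one block of G is used at a time."""
    n, m = G.shape
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    cols = []
    for start in range(0, m, block_size):
        Gb = G[:, start:start + block_size]          # local genotype segment
        for lam in lambdas:
            pred = np.empty(n)
            for k in range(n_folds):
                test = folds == k
                beta = ridge_fit(Gb[~test], y[~test], lam)
                pred[test] = Gb[test] @ beta         # predict held-out samples
            cols.append(pred)
    return np.column_stack(cols)                     # n x (blocks * lambdas)

def level1_predictor(W, y, lam):
    """Level 1: ridge regression on the stacked block predictions, giving a
    whole-genome polygenic prediction (fit in-sample here, for simplicity)."""
    return W @ ridge_fit(W, y, lam)

# Simulated toy data (not UK Biobank): 500 samples, 200 SNPs, 10 causal.
rng = np.random.default_rng(1)
n, m = 500, 200
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
G = (G - G.mean(axis=0)) / G.std(axis=0)
beta_true = np.zeros(m)
beta_true[:10] = 0.3
genetic_value = G @ beta_true
y = genetic_value + rng.normal(size=n)
y = y - y.mean()

W = level0_predictions(G, y, block_size=50, lambdas=[1.0, 10.0, 100.0])
y_hat = level1_predictor(W, y, lam=1.0)
print(W.shape)  # (500, 12): 4 blocks x 3 shrinkage values
```

As we understand REGENIE, a predictor of this kind is then used as an offset when each variant is tested in a separate association step; the sketch stops at the predictor.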
Recent research has uncovered an important role for de novo variation in neurodevelopmental disorders. Using aggregated data from 9,246 families with autism spectrum disorder, intellectual disability, or developmental delay, we found that ∼1/3 of de novo variants are independently present as standing variation in the Exome Aggregation Consortium's cohort of 60,706 adults, and these de novo variants do not contribute to neurodevelopmental risk. We further used a loss-of-function (LoF)-intolerance metric, pLI, to identify a subset of LoF-intolerant genes containing the observed signal of associated de novo protein-truncating variants (PTVs) in neurodevelopmental disorders. LoF-intolerant genes also carry a modest excess of inherited PTVs, although the strongest de novo-affected genes contribute little to this excess, thus suggesting that the excess of inherited risk resides in lower-penetrant genes. These findings illustrate the importance of population-based reference cohorts for the interpretation of candidate pathogenic variants, even for analyses of complex diseases and de novo variation.
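Checking de novo variants against a population reference cohort amounts to a set-membership test on variant keys. The sketch below assumes variants are matched on (chromosome, position, reference allele, alternate allele) after consistent normalization. The `Variant` type, the `split_by_reference` helper and the toy coordinates are all hypothetical and are not taken from ExAC.

```python
from typing import Iterable, NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str

def split_by_reference(de_novo: Iterable[Variant], reference: set):
    """Partition de novo calls into those also seen as standing variation in a
    population reference cohort and those absent from it."""
    in_ref, absent = [], []
    for v in de_novo:
        (in_ref if v in reference else absent).append(v)
    return in_ref, absent

# Hypothetical toy calls; a real analysis would load these from VCF files.
calls = [
    Variant("1", 1000, "A", "G"),
    Variant("2", 5000, "C", "T"),
    Variant("X", 42, "G", "A"),
]
reference = {Variant("2", 5000, "C", "T")}

in_ref, absent = split_by_reference(calls, reference)
print(f"{len(in_ref)}/{len(calls)} de novo calls present in reference")
# prints "1/3 de novo calls present in reference"
```

Following the finding above that de novo variants also present in the reference cohort do not contribute to risk, case-control burden comparisons would be restricted to the `absent` list.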
The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%–11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.
• SpliceAI, a 32-layer deep neural network, predicts splicing from a pre-mRNA sequence
• 75% of predicted cryptic splice variants validate on RNA-seq
• Cryptic splicing may yield ∼10% of pathogenic variants in neurodevelopmental disorders
• Cryptic splice variants frequently give rise to alternative splicing
A deep neural network precisely models mRNA splicing from a genomic sequence and accurately predicts noncoding cryptic splice mutations in patients with rare genetic diseases.
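The 32-layer network itself is beyond a short example, but two surrounding steps can be sketched. The first encodes a pre-mRNA sequence as the one-hot matrix a convolutional network consumes. The second summarizes a variant's predicted splice-altering consequence by comparing per-position splice-site probabilities for the reference and alternate sequences. The probability arrays below are invented stand-ins for model output, assumed to hold one acceptor column and one donor column per position. The function names `one_hot` and `delta_scores` are hypothetical, not SpliceAI's API.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a nucleotide sequence as an L x 4 matrix (A, C, G, T columns).
    Ambiguous bases such as N are left as all-zero rows."""
    index = {b: i for i, b in enumerate(BASES)}
    out = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            out[pos, index[base]] = 1.0
    return out

def delta_scores(ref_probs: np.ndarray, alt_probs: np.ndarray) -> dict:
    """Largest gain or loss in acceptor (column 0) and donor (column 1)
    probability anywhere in the window when the variant is introduced."""
    gain = alt_probs - ref_probs
    loss = ref_probs - alt_probs
    return {
        "acceptor_gain": float(gain[:, 0].max()),
        "acceptor_loss": float(loss[:, 0].max()),
        "donor_gain": float(gain[:, 1].max()),
        "donor_loss": float(loss[:, 1].max()),
    }

# Invented probabilities for a 3-position window: the variant weakens an
# existing acceptor and creates a cryptic donor.
ref = np.array([[0.0, 0.0], [0.9, 0.0], [0.0, 0.1]])
alt = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.8]])
print(one_hot("ACGN"))
print(delta_scores(ref, alt))
```

A variant with a large loss score at an annotated site, a large gain score at a cryptic site, or both would be flagged as a candidate cryptic splice variant. The threshold for calling it is a separate modelling choice.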
Analysis of the exome sequences of approximately 8,000 children with autism spectrum disorder (ASD) and/or attention deficit hyperactivity disorder (ADHD) and 5,000 controls found that individuals with ASD and individuals with ADHD had a similar burden of rare protein-truncating variants in evolutionarily constrained genes, both significantly higher than that of controls. This motivated a combined analysis across ASD and ADHD, identifying microtubule-associated protein 1A (MAP1A) as a new exome-wide significant gene conferring risk for childhood psychiatric disorders.
Millions of human genomes and exomes have been sequenced, but their clinical applications remain limited due to the difficulty of distinguishing disease-causing mutations from benign genetic variation. Here we demonstrate that common missense variants in other primate species are largely clinically benign in humans, enabling pathogenic mutations to be systematically identified by the process of elimination. Using hundreds of thousands of common variants from population sequencing of six non-human primate species, we train a deep neural network that identifies pathogenic mutations in rare disease patients with 88% accuracy and enables the discovery of 14 new candidate genes in intellectual disability at genome-wide significance. Cataloging common variation from additional primate species would improve interpretation for millions of variants of uncertain significance, further advancing the clinical utility of human genome sequencing.
A major goal in human genetics is to use natural variation to understand the phenotypic consequences of altering each protein-coding gene in the genome. Here we used exome sequencing to explore protein-altering variants and their consequences in 454,787 participants in the UK Biobank study. We identified 12 million coding variants, including around 1 million loss-of-function and around 1.8 million deleterious missense variants. When these were tested for association with 3,994 health-related traits, we found 564 genes with trait associations at P ≤ 2.18 × 10⁻¹¹. Rare variant associations were enriched in loci from genome-wide association studies (GWAS), but most (91%) were independent of common variant signals. We discovered several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3). Six genes were associated with brain imaging phenotypes, including two involved in neural development (GBE1, PLD1). Of the signals available and powered for replication in an independent cohort, 81% were confirmed; furthermore, association signals were generally consistent across individuals of European, Asian and African ancestry. We illustrate the ability of exome sequencing to identify gene-trait associations, elucidate gene function and pinpoint effector genes that underlie GWAS signals at scale.
Alterations in non-driver genes represent an emerging class of potential therapeutic targets in cancer. Hundreds to thousands of non-driver genes undergo loss of heterozygosity (LOH) events per tumor, generating discrete differences between tumor and normal cells. Here we interrogate LOH of polymorphisms in essential genes as a novel class of therapeutic targets. We hypothesized that monoallelic inactivation of the allele retained in tumors can selectively kill cancer cells but not somatic cells, which retain both alleles. We identified 5664 variants in 1278 essential genes that undergo LOH in cancer and evaluated the potential for each to be targeted using allele-specific gene-editing, RNAi, or small-molecule approaches. We further show that allele-specific inactivation of either of two essential genes (PRIM1 and EXOSC8) reduces growth of cells harboring that allele, while cells harboring the non-targeted allele remain intact. We conclude that LOH of essential genes represents a rich class of non-driver cancer vulnerabilities.
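Detecting LOH of a heterozygous germline polymorphism can be sketched as a comparison of variant allele fractions (VAFs) in matched normal and tumor samples, which also identifies the allele the tumor retains and that an allele-specific agent would target. This is an assumed, simplified rule, not the authors' pipeline. The heterozygosity band and LOH cutoff are hypothetical, tumor purity and copy number are ignored, and `retained_allele` is a hypothetical helper.

```python
from typing import Optional, Tuple

def retained_allele(normal_vaf: float, tumor_vaf: float,
                    het_band: Tuple[float, float] = (0.3, 0.7),
                    loh_cutoff: float = 0.1) -> Optional[str]:
    """Return 'alt' or 'ref' for the allele a tumor retains after LOH at a
    germline heterozygous SNP, or None if no LOH is called.
    Thresholds are illustrative, not calibrated."""
    lo, hi = het_band
    if not (lo <= normal_vaf <= hi):
        return None          # not heterozygous in normal tissue
    if tumor_vaf >= 1 - loh_cutoff:
        return "alt"         # reference allele lost, alternate retained
    if tumor_vaf <= loh_cutoff:
        return "ref"         # alternate allele lost, reference retained
    return None              # both alleles still present in the tumor

print(retained_allele(0.50, 0.97))   # alt
print(retained_allele(0.48, 0.03))   # ref
print(retained_allele(0.50, 0.52))   # None
```

For an essential gene, a result of `"alt"` marks the alternate allele as the tumor's only remaining copy. An agent specific to that allele would then, in principle, spare normal cells that still carry the reference allele.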
Significance
Autism spectrum disorder (ASD) research is complicated by heterogeneity. There are several types of genetic risk factors for ASDs, and that diversity may be reflected in case presentation. This study presents evidence for systematic variation in the genetic architecture of ASDs in which higher functioning cases, defined through cognitive and behavioral assessments, are more likely to manifest familial influences. This finding suggests that genetic and neurobiological research into ASDs and other neuropsychiatric disorders may be pursued more efficiently through greater phenotypic characterization.
Autism spectrum disorders (ASDs) are a highly heterogeneous group of conditions, both phenotypically and genetically, although the link between phenotypic variation and differences in genetic architecture is unclear. This study aimed to determine whether differences in cognitive impairment and symptom severity reflect variation in the degree to which ASD cases arise from de novo or familial influences. Using data from more than 2,000 simplex cases of ASD, we examined the relationship between intelligence quotient (IQ), behavior and language assessments, and both the rate of de novo loss-of-function (LOF) mutations and family history of broadly defined psychiatric disease (depressive disorders, bipolar disorder, and schizophrenia; history of psychiatric hospitalization). Proband IQ was negatively associated with de novo LOF rate (P = 0.03) and positively associated with family history of psychiatric disease (P = 0.003). Female cases had a higher frequency of sporadic genetic events across the severity distribution (P = 0.01). High rates of LOF mutation and low frequencies of family history of psychiatric illness were seen in individuals who were unable to complete a traditional IQ test, the group with the greatest degree of language and behavioral impairment. These analyses provide strong evidence that familial risk for neuropsychiatric disease becomes more relevant to ASD etiology as cases become higher functioning. The findings of this study reinforce that there are many routes to the diagnostic category of autism and could lead to genetic studies with more specific insights into individual cases.
There are established associations between advanced paternal age and offspring risk for psychiatric and developmental disorders. These are commonly attributed to genetic mutations, especially de novo single nucleotide variants (dnSNVs), that accumulate with increasing paternal age. However, the actual magnitude of risk from such mutations in the male germline is unknown. Quantifying this risk would clarify the clinical significance of delayed paternity. Using parent-child trio whole-exome-sequencing data, we estimate the relationship between paternal-age-related dnSNVs and risk for five disorders: autism spectrum disorder (ASD), congenital heart disease, neurodevelopmental disorders with epilepsy, intellectual disability and schizophrenia (SCZ). Using Danish registry data, we investigate whether epidemiologic associations between each disorder and older fatherhood are consistent with the estimated role of dnSNVs. We find that paternal-age-related dnSNVs confer a small amount of risk for these disorders. For ASD and SCZ, epidemiologic associations with delayed paternity reflect factors that may not increase with age.
Clonal haematopoiesis involves the expansion of certain blood cell lineages and has been associated with ageing and adverse health outcomes. Here we use exome sequence data on 628,388 individuals to identify 40,208 carriers of clonal haematopoiesis of indeterminate potential (CHIP). Using genome-wide and exome-wide association analyses, we identify 24 loci (21 of which are novel) where germline genetic variation influences predisposition to CHIP, including missense variants in the lymphocytic antigen coding gene LY75, which are associated with reduced incidence of CHIP. We also identify novel rare variant associations with clonal haematopoiesis and telomere length. Analysis of 5,041 health traits from the UK Biobank (UKB) found relationships between CHIP and severe COVID-19 outcomes, cardiovascular disease, haematologic traits, malignancy, smoking, obesity, infection and all-cause mortality. Longitudinal and Mendelian randomization analyses revealed that CHIP is associated with solid cancers, including non-melanoma skin cancer and lung cancer, and that CHIP linked to DNMT3A is associated with the subsequent development of myeloid but not lymphoid leukaemias. Additionally, contrary to previous findings from the initial 50,000 UKB exomes, our results in the full sample do not support a role for IL-6 inhibition in reducing the risk of cardiovascular disease among CHIP carriers. Our findings demonstrate that CHIP represents a complex set of heterogeneous phenotypes with shared and unique germline genetic causes and varied clinical implications.