Abstract
Motivation
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by aberrations in the genome. While several disease-causing variants have been identified, a major part of heritability remains unexplained. ALS is believed to have a complex genetic basis where non-additive combinations of variants constitute disease, which cannot be picked up using the linear models employed in classical genotype–phenotype association studies. Deep learning, on the other hand, is highly promising for identifying such complex relations. We therefore developed a deep-learning-based approach for the classification of ALS patients versus healthy individuals from the Dutch cohort of the Project MinE dataset. Based on recent insight that regulatory regions harbor the majority of disease-associated variants, we employ a two-step approach: first, promoter regions that are likely associated with ALS are identified, and second, individuals are classified based on their genotype in the selected genomic regions. Both steps employ a deep convolutional neural network. The network architecture accounts for the structure of genome data by applying convolution only to parts of the data where this makes sense from a genomics perspective.
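The idea of restricting convolution to genomically meaningful units can be illustrated with a minimal sketch. It assumes genotypes encoded as minor-allele counts (0, 1 or 2) and promoter regions given as half-open index ranges; the function names, the kernel and the encoding are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Sequence, Tuple


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Plain 1D 'valid' cross-correlation, the operation a convolutional layer computes."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def regionwise_conv(
    genotypes: Sequence[float],
    regions: Sequence[Tuple[int, int]],
    kernel: Sequence[float],
) -> List[List[float]]:
    """Convolve each region separately so that no window spans two regions."""
    return [conv1d_valid(genotypes[start:end], kernel) for start, end in regions]


# Hypothetical toy data: 8 variant positions split into two promoter regions.
genotypes = [0, 1, 2, 0, 0, 1, 1, 2]
regions = [(0, 4), (4, 8)]   # half-open [start, end) index ranges (assumed layout)
kernel = [1.0, 1.0]          # fixed kernel for illustration; a real network learns it

features = regionwise_conv(genotypes, regions, kernel)
print(features)
```

Convolving the concatenated vector instead would produce a window that mixes the last variant of one region with the first variant of the next, even though the two positions may be far apart on the genome.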
Results
Our approach identifies potentially ALS-associated promoter regions, and generally outperforms other classification methods. Test results support the hypothesis that non-additive combinations of variants contribute to ALS. Architectures and protocols developed are tailored toward processing population-scale, whole-genome data. We consider this a relevant first step toward deep learning assisted genotype–phenotype association in whole genome-sized data.
Availability and implementation
Our code will be available on GitHub, together with a synthetic dataset (https://github.com/byin-cwi/ALS-Deeplearning). The data used in this study is available to bona-fide researchers upon request.
Supplementary information
Supplementary data are available at Bioinformatics online.
Amyotrophic lateral sclerosis (ALS) is considered to be caused by both genetic and environmental factors. The causal cascade is, however, not known. We aimed to assess lifestyle during the presymptomatic phase of ALS, stratified by C9orf72 mutation, and examine evidence supporting causality of lifestyle factors.
This study was a longitudinal, population-based, case-control study that used data from the Prospective ALS study the Netherlands. We included patients with a C9orf72 mutation (C9+ group), patients without a C9orf72 mutation (C9– group), and controls. Patients fulfilled the revised El Escorial criteria and were recruited through neurologists and rehabilitation physicians in the Netherlands as well as the Dutch Neuromuscular Patient Association and ALS Centrum website. 1322 population-based controls, matched for age and sex, were enrolled via the patients' general practitioners. Blood relatives or spouses of patients were not eligible as controls. We studied the relationship between ALS risk and smoking, alcohol, physical activity, body-mass index (BMI), and energy intake by the use of structured questionnaires. Smoking, physical activity, and BMI were longitudinally assessed up to 50 years before onset (defined as the period before onset of muscle weakness or bulbar symptoms for cases, or age at completing the questionnaire for controls). We calculated posterior probabilities (P(θ|x)) for causal effects of smoking, alcohol, and BMI, using Bayesian instrumental variable analyses.
Between Jan 1, 2006, and Jan 27, 2016, we included 143 patients in the C9+ group, 1322 patients in the C9– group, and 1322 controls. Compared with controls, cigarette pack-years (C9+ group mean difference from control 3·15, 95% CI 0·36 to 5·93, p=0·027; C9– group 3·20, 2·02 to 4·39, p<0·0001) and daily energy intake at symptom onset (C9+ group 712 kJ, 95% CI 212 to 1213, p=0·0053; C9– group 497, 295 to 700, p<0·0001) were higher in the C9+ and C9– groups, whereas current BMI (C9+ group −2·01 kg/m², 95% CI −2·73 to −1·29, p<0·0001; C9– group −1·35, −1·64 to −1·06, p<0·0001) and lifetime alcohol consumption (C9+ group −5388 units, 95% CI −9113 to −1663, p=0·0046; C9– group −2185, −3748 to −622, p=0·0062) were lower in the C9+ and C9– groups. Median BMI during the presymptomatic phase for the C9+ group was lower (–0·69 kg/m², 95% CI −1·24 to −0·13, p=0·015) and physical activity was similar (–348 metabolic equivalent of task [MET], 95% CI −966 to 270, p=0·27) to controls, whereas both the median BMI during the presymptomatic phase (0·27 kg/m², 95% CI 0·04 to 0·50, p=0·022) and physical activity (585 MET, 291 to 878, p=0·0001) were higher in the C9– group than controls. Longitudinal analyses showed more cigarette pack-years in the C9– (starting 47 years pre-onset) and C9+ (starting 24 years pre-onset) groups, and higher physical activity over time in the C9– group (starting >30 years pre-onset). BMI of the C9+ group increased more slowly and was significantly lower (starting at 36 years pre-onset) than in controls, whereas the BMI of the C9– group was higher than controls (23–49 years pre-onset, becoming lower 10 years pre-onset). Instrumental variable analyses supported causal effects of alcohol consumption (P(θ|x)=0·9347) and smoking (P(θ|x)=0·9859) on ALS in the C9– group. We found evidence supporting a causal effect of increased BMI at younger age (mean 33·8 years, SD 11·7) in the C9– group (P(θ|x)=0·9272), but not at older ages.
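As a reading aid, the relation between a reported mean difference, its 95% CI and its p value can be sketched under a normal approximation. This is an assumption made for illustration only: the abstract does not state which test produced the p values, and the function name is hypothetical.

```python
from math import erfc, sqrt


def p_from_ci(estimate: float, ci_low: float, ci_high: float, z_crit: float = 1.96) -> float:
    """Two-sided p-value implied by a symmetric 95% CI, assuming a normal sampling distribution."""
    se = (ci_high - ci_low) / (2 * z_crit)   # CI half-width divided by the critical z
    z = estimate / se
    return erfc(abs(z) / sqrt(2))            # equals 2 * (1 - Phi(|z|))


# Cigarette pack-years, C9+ group versus controls: 3.15 (95% CI 0.36 to 5.93)
p_pack_years = p_from_ci(3.15, 0.36, 5.93)
print(round(p_pack_years, 4))
```

For this comparison the approximation gives a value close to the reported p=0·027, which suggests the interval and the p value are mutually consistent.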
Lifestyle during the presymptomatic phase differs between patients with ALS and controls decades before onset, depends on C9– status, and is probably part of the presymptomatic causal cascade. Identification of modifiable disease-causing lifestyle factors offers opportunities to lower risk of developing neurodegenerative disease.
Netherlands ALS Foundation.
Genetic mutations related to amyotrophic lateral sclerosis (ALS) act through distinct pathophysiological pathways, which may lead to varying treatment responses. Here we assess the genetic interaction between C9orf72, UNC13A, and MOBP with creatine and valproic acid treatment in two clinical trials. Genotypic data was available for 309 of the 338 participants (91.4%). The UNC13A genotype affected mortality (p = 0.012), whereas C9orf72 repeat-expansion carriers exhibited a faster rate of decline in overall (p = 0.051) and bulbar functioning (p = 0.005). A dose-response pharmacogenetic interaction was identified between creatine and the A allele of the MOBP genotype (p = 0.027), suggesting a qualitative interaction in a recessive model (HR 3.96, p = 0.015). Not taking genetic information into account may mask evidence of response to treatment or be an unrecognized source of bias. Incorporating genetic data could help investigators to identify critical treatment clues in patients with ALS.
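The recessive model mentioned above refers to how genotypes are coded before they enter a survival or regression model. A minimal sketch of the three common codings follows; the function name, the two-letter genotype strings and the example genotypes are illustrative assumptions, not the trials' analysis code.

```python
def encode_genotype(genotype: str, effect_allele: str, model: str) -> int:
    """Code a biallelic genotype such as 'AG' under a given genetic model."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype such as 'AG'")
    count = genotype.count(effect_allele)    # number of effect alleles: 0, 1 or 2
    if model == "additive":
        return count                         # effect scales with allele count
    if model == "dominant":
        return int(count >= 1)               # one copy is enough
    if model == "recessive":
        return int(count == 2)               # both copies are required
    raise ValueError(f"unknown model: {model}")


# Hypothetical genotypes, with A as the effect allele.
for g in ("GG", "AG", "AA"):
    print(g, [encode_genotype(g, "A", m) for m in ("additive", "dominant", "recessive")])
```

Under the recessive coding only homozygous carriers of the effect allele are set to 1, so a treatment-by-genotype interaction term compares them against all other participants.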
The most recent genome-wide association study in amyotrophic lateral sclerosis (ALS) demonstrates a disproportionate contribution from low-frequency variants to genetic susceptibility to disease. We have therefore begun Project MinE, an international collaboration that seeks to analyze whole-genome sequence data of at least 15 000 ALS patients and 7500 controls. Here, we report on the design of Project MinE and pilot analyses of 1169 successfully sequenced ALS patients and 608 controls drawn from the Netherlands. As has become characteristic of sequencing studies, we find an abundance of rare genetic variation (minor allele frequency < 0.1%), the vast majority of which is absent in public datasets. Principal component analysis reveals local geographical clustering of these variants within The Netherlands. We use the whole-genome sequence data to explore the implications of poor geographical matching of cases and controls in a sequence-based disease study and to investigate how ancestry-matched, externally sequenced controls can induce false positive associations. Also, we have publicly released genome-wide minor allele counts in cases and controls, as well as results from genic burden tests.
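The rarity threshold quoted above (minor allele frequency < 0.1%) can be made concrete with a short sketch that derives the minor allele frequency from genotype counts at a biallelic site. The function names and the genotype counts are hypothetical; only the 0.1% threshold and the pilot sample size (1169 + 608 = 1777 individuals) come from the text.

```python
RARE_MAF_THRESHOLD = 0.001  # 0.1%, as stated in the abstract


def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Minor allele frequency at a biallelic site from diploid genotype counts."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    alt_freq = (n_het + 2 * n_hom_alt) / total_alleles
    return min(alt_freq, 1.0 - alt_freq)     # the minor allele is the less frequent one


def is_rare(maf: float, threshold: float = RARE_MAF_THRESHOLD) -> bool:
    """A variant is rare if it is observed at all but stays below the threshold."""
    return 0.0 < maf < threshold


# Hypothetical counts in 1777 individuals (3554 alleles).
maf_a = minor_allele_frequency(1775, 2, 0)   # two heterozygous carriers
maf_b = minor_allele_frequency(1700, 70, 7)  # a common variant
print(maf_a, is_rare(maf_a))
print(maf_b, is_rare(maf_b))
```

With 3554 alleles, a variant stays below the 0.1% threshold only if it is seen on at most three chromosomes, which illustrates how sample size bounds the rarest frequency a study can resolve.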
Amyotrophic lateral sclerosis (ALS) is a rapidly progressive fatal neurodegenerative disease affecting one in 350 people. The aim of Project MinE is to elucidate the pathophysiology of ALS through whole-genome sequencing of at least 15,000 ALS patients and 7500 controls at 30× coverage. Here, we present the Project MinE data browser (databrowser.projectmine.com), a unique and intuitive one-stop, open-access server that provides detailed information on genetic variation analyzed in a new and still growing set of 4366 ALS cases and 1832 matched controls. Through its visual components and interactive design, the browser specifically aims to be a resource to those without a biostatistics background and allow clinicians and preclinical researchers to integrate Project MinE data into their own research. The browser allows users to query a transcript and immediately access a unique combination of detailed (meta)data, annotations and association statistics that would otherwise require analytic expertise and visits to scattered resources.
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease that affects 1 in ~350 individuals. Genetic association studies have established ALS as a multifactorial disease with heritability estimated at ~61%, and recent studies show a prominent role for rare variation in its genetic architecture. To identify rare variants associated with disease onset we performed exome array genotyping in 4,244 cases and 3,106 controls from European cohorts. In this largest exome-wide study of rare variants in ALS to date, we performed single-variant association testing, gene-based burden, and exome-wide individual set-unique burden (ISUB) testing to identify single or aggregated rare variation that modifies disease risk. In single-variant testing no variants reached exome-wide significance, likely due to limited statistical power. Gene-based burden testing of rare non-synonymous and loss-of-function variants showed NEK1 as the top associated gene. ISUB analysis did not show an increased exome-wide burden of deleterious variants in patients, possibly suggesting a more region-specific role for rare variation. Complete summary statistics are released publicly. This study did not implicate new risk loci, emphasizing the immediate need for future large-scale collaborations in ALS that will expand available sample sizes, increase genome coverage, and improve our ability to detect rare variants associated with ALS.
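The logic of a gene-based burden test can be sketched as follows: individuals are collapsed to carrier or non-carrier of any qualifying rare variant in a gene, and carrier proportions are compared between cases and controls. The abstract does not name the statistical test used, so the two-sided Fisher's exact test here is a simple stand-in, and all function names and toy data are hypothetical.

```python
from math import comb
from typing import Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def burden_test(case_carrier: Sequence[bool], control_carrier: Sequence[bool]) -> float:
    """Compare the carrier burden of one gene between cases and controls."""
    a = sum(case_carrier)                    # carrier cases
    b = len(case_carrier) - a                # non-carrier cases
    c = sum(control_carrier)                 # carrier controls
    d = len(control_carrier) - c             # non-carrier controls
    return fisher_exact_two_sided(a, b, c, d)


# Hypothetical toy gene: 3 of 4 cases and 1 of 4 controls carry a qualifying variant.
p_value = burden_test([True, True, True, False], [True, False, False, False])
print(p_value)
```

Collapsing variants per gene trades resolution for power: individually rare variants that would never reach significance on their own contribute jointly to a single test per gene.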
Objective
The role of the survival of motor neuron (SMN) gene in amyotrophic lateral sclerosis (ALS) is unclear, with several conflicting reports. A decisive result on this topic is needed, given that treatment options are available now for SMN deficiency.
Methods
In this largest multicenter case control study to evaluate the effect of SMN1 and SMN2 copy numbers in ALS, we used whole genome sequencing data from Project MinE data freeze 2. SMN copy numbers of 6,375 patients with ALS and 2,412 controls were called from whole genome sequencing data, and the reliability of the calls was tested with multiplex ligation‐dependent probe amplification data.
Results
The copy number distribution of SMN1 and SMN2 between cases and controls did not show any statistical differences (binomial multivariate logistic regression SMN1 p = 0.54 and SMN2 p = 0.49). In addition, the copy number of SMN did not associate with patient survival (Royston‐Parmar; SMN1 p = 0.78 and SMN2 p = 0.23) or age at onset (Royston‐Parmar; SMN1 p = 0.75 and SMN2 p = 0.63).
Interpretation
In our well‐powered study, there was no association of SMN1 or SMN2 copy numbers with the risk of ALS or ALS disease severity. This suggests that changing SMN protein levels in the physiological range may not modify ALS disease course. This is an important finding in the light of emerging therapies targeted at SMN deficiencies. ANN NEUROL 2021;89:686–697