Some rare genetic disorders, such as retinitis pigmentosa or Alport syndrome, are caused by the co-inheritance of DNA variants at two different genetic loci (digenic inheritance). To capture the effects of these disease-causing variants and their possible interactive effects, various statistical methods have been developed in human genetics. Analogous developments have taken place in the field of machine learning, particularly in the area now called Big Data. In the past, these two areas grew independently and have started to converge only in recent years. We provide an overview of each of the two fields, paying special attention to machine learning methods for uncovering the combined effects of pairs of variants on human disease.
While many genetic traits follow a dominant or recessive Mendelian mode of inheritance, non-Mendelian disease transmission may occur in the form of digenic inheritance (two mutant variants at different locations are required to confer disease) or as modifier genes affecting the expression of another gene. Machine learning methods are increasingly employed in the search for pairs of variants underlying digenic traits. Highly promising approaches are based on association rules, which originated in the analysis of consumer transaction patterns some 30 years ago and have blossomed into highly sophisticated computer-based methods. As applications of these methods are becoming more widespread, digenic disease transmission may well appear to be more common than Mendelian inheritance.
For many years, linkage analysis was the primary tool used for the genetic mapping of Mendelian and complex traits with familial aggregation. Linkage analysis was largely supplanted by the wide adoption of genome-wide association studies (GWASs). However, with the recent increased use of whole-genome sequencing (WGS), linkage analysis is again emerging as an important and powerful analysis method for the identification of genes involved in disease aetiology, often in conjunction with WGS filtering approaches. Here, we review the principles of linkage analysis and provide practical guidelines for carrying out linkage studies using WGS data.
We present selected topics of population genetics and molecular phylogeny. As several excellent review articles have been published that generally focus on European and American scientists, here we emphasize contributions by Japanese researchers. Our review may also be seen as a belated 50-year celebration of Motoo Kimura’s early seminal paper on the molecular clock, published in 1968.
To locate disease-causing DNA variants on the human gene map, the customary approach has been to carry out a genome-wide association study for one variant after another by testing for genotype frequency differences between individuals affected and unaffected with disease. So-called digenic traits are due to the combined effects of two variants, often on different chromosomes, while individual variants may have little or no effect on disease. Machine learning approaches have been developed to find variant pairs underlying digenic traits. However, many of these methods have large memory requirements so that only small datasets can be analyzed. The increasing availability of desktop computers with large numbers of processors and suitable programming to distribute the workload evenly over all processors in a machine make a new and relatively straightforward approach possible, that is, to evaluate all existing variant and genotype pairs for disease association. We present a prototype of such a method with two components, Vpairs and Gpairs, and demonstrate its advantages over existing implementations of such well-known algorithms as Apriori and FP-growth. We apply these methods to published case-control datasets on age-related macular degeneration and Parkinson disease and construct an ROC curve for a large set of genotype patterns.
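The idea of evaluating all variant and genotype pairs for disease association can be illustrated with a minimal Python sketch. It is not the Vpairs/Gpairs implementation: the function names, the 0/1/2 genotype coding, and the use of an uncorrected Pearson chi-square on a 2 × 2 table (pattern carriers vs. non-carriers, cases vs. controls) are assumptions made purely for illustration, and no parallelization is attempted.

```python
from collections import Counter
from itertools import combinations


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a margin is empty, so the table carries no information
    return n * (a * d - b * c) ** 2 / denom


def scan_genotype_pairs(cases, controls):
    """Score every observed genotype pattern at every pair of variants.

    cases, controls: lists of individuals, each a sequence of genotype
    codes (hypothetical coding: 0, 1, 2 copies of the minor allele).
    Returns tuples (chi2, (i, j), (g_i, g_j), n_cases, n_controls),
    sorted with the largest chi-square first.
    """
    n_variants = len(cases[0])
    results = []
    for i, j in combinations(range(n_variants), 2):
        case_counts = Counter((ind[i], ind[j]) for ind in cases)
        ctrl_counts = Counter((ind[i], ind[j]) for ind in controls)
        for pattern in set(case_counts) | set(ctrl_counts):
            a = case_counts[pattern]   # cases carrying the pattern
            c = ctrl_counts[pattern]   # controls carrying the pattern
            chi2 = chi_square_2x2(a, len(cases) - a, c, len(controls) - c)
            results.append((chi2, (i, j), pattern, a, c))
    results.sort(key=lambda r: r[0], reverse=True)
    return results
```

Because each variant pair is scored independently of all others, the outer loop is the natural unit to distribute over processors in a real implementation.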
• The kappa-opioid receptor (KOR)/dynorphin system is regulated by stress and heavy exposure to drugs including opioids and cocaine.
• Activation of the KOR system can result in anhedonia and dysphoria, and in relapse-like behaviors.
• Regions of intron 2 of the gene encoding KOR (OPRK1) have potential regulatory functions and respond to activation by glucocorticoids.
• In this study, polymorphisms in intron 2 of OPRK1 were associated with increased vulnerability to cocaine dependence diagnoses, in a sample of persons of African-American ethnicity.
The dynorphin/kappa-opioid receptor (KOR) system (encoded by the PDYN and OPRK1 genes, respectively) is highly regulated by repeated exposure to drugs of abuse, including mu-opioid agonists and cocaine. These changes in the dynorphin/KOR system can then influence the rewarding effects of these drugs of abuse. Activation of the dynorphin/KOR system is also thought to have a role in the pro-addictive effects of stress. Recent in vitro assays showed that OPRK1 intron 2 may function as a genomic enhancer in the regulation of KOR expression, and contains a glucocorticoid-responsive sequence site. We hypothesize that SNPs in intron 2 of OPRK1 are associated with categorical opioid or cocaine dependence diagnoses, as well as with dimensional aspects of drug use (i.e., magnitude of drug exposure).
This study includes 577 subjects ≥ 18 years old, with African ancestry (AA), from the USA. They were divided into three groups: 152 control subjects, 142 persons with a lifetime opioid dependence diagnosis (OD), and 283 subjects with a lifetime cocaine dependence diagnosis (CD). Five SNPs (rs16918909, rs7016778, rs997917, rs6473797, rs10111937) that span 10 kb in intron 2 of OPRK1 were used for the association analyses. Genotyping was performed with the Smokescreen® array or by sequencing of PCR-amplified DNA fragments. Association analyses for OD and CD diagnoses and the OPRK1 intron 2 alleles were carried out with Fisher’s exact test. The Kreek-McHugh-Schluger-Kellogg (KMSK) scales were used as dimensional measures of maximum exposure to specific drugs and were analyzed with Mann-Whitney tests.
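For readers unfamiliar with Fisher's exact test on allele counts, the following is a minimal standard-library sketch of the two-sided test for a 2 × 2 table (allele A vs. allele B, cases vs. controls). The function name and the "sum all tables no more probable than the observed one" definition of the two-sided p-value are assumptions for illustration; the study's actual software and settings are not stated in the abstract.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns the two alleles of a SNP.
    The p-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` evaluates the five possible tables with all margins equal to 4 and sums the four that are no more probable than the observed one.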
Two SNPs, rs997917 and rs10111937, showed point-wise significant allelic association (p < 0.05) with CD diagnosis, and rs10111937 showed point-wise significant association with OD. None of these single-SNP associations with categorical diagnoses was significant after correction for multiple testing (pcorr > 0.05). However, significant associations of several genotype patterns (diplotypes) were found with cocaine dependence, but none with opioid dependence. The most significant genotype pattern associated with cocaine dependence diagnosis occurred for rs6473797 and rs10111937 (pcorr = 0.036, odds ratio = 1.92, FDR < 0.05), and survived correction for multiple testing. Dimensional analyses with KMSK scores show that persons with either the rs997917 or the rs10111937 variant had greater exposure to cocaine, compared to those with the prototype allele (Mann-Whitney tests, point-wise).
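The distinction between point-wise and corrected significance can be made concrete with a short sketch. The abstract does not say which procedures produced pcorr and the FDR, so the two functions below (Bonferroni and Benjamini-Hochberg adjustment) are generic textbook methods shown as an assumption, with hypothetical function names.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then declared significant when its adjusted value falls below the chosen threshold (for example 0.05), which is generally a stricter requirement than point-wise p < 0.05.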
This study provides additional support for the potential importance of regulatory regions of intron 2 of the OPRK1 gene in the development of cocaine and opioid dependence diagnoses, in a population with African-American ancestry.
The melanocortin receptors are G-protein-coupled receptors, which are essential components of the hypothalamic–pituitary–adrenal axis, and they mediate the actions of melanocortins (melanocyte-stimulating hormones: α-MSH, β-MSH, and γ-MSH) as well as the adrenocorticotropin hormone (ACTH) in skin pigmentation, adrenal steroidogenesis, and stress response. Three melanocortin receptor genes (MC1R, MC2R, and MC5R) contribute to the risk of major depressive disorder (MDD), and one melanocortin receptor gene (MC4R) contributes to the risk of type 2 diabetes (T2D). MDD increases T2D risk in drug-naïve patients; thus, MDD and T2D commonly coexist. The five melanocortin receptor genes might confer risk for both disorders. However, they have never been investigated jointly to evaluate their potential contributing roles in the MDD-T2D comorbidity, specifically within families. In 212 Italian families with T2D and MDD, we tested 11 single nucleotide polymorphisms (SNPs) in the MC1R gene, 9 SNPs in MC2R, 3 SNPs in MC3R, 4 SNPs in MC4R, and 2 SNPs in MC5R. The testing used 2-point parametric linkage and linkage disequilibrium (LD) (i.e., association) analysis with four models: dominant with complete penetrance (D1), dominant with incomplete penetrance (D2), recessive with complete penetrance (R1), and recessive with incomplete penetrance (R2). We detected significant (p ≤ 0.05) linkage and/or LD (i.e., association) to/with MDD for one SNP in MC2R (rs111734014) and one SNP in MC5R (rs2236700), and to/with T2D for three SNPs in MC1R (rs1805007, rs201192930, and rs2228479), one SNP in MC2R (rs104894660), two SNPs in MC3R (rs3746619 and rs3827103), and one SNP in MC4R (Chr18-60372302). The linkage/LD/association was significant across different linkage patterns and different modes of inheritance. All reported variants are novel in MDD and T2D. This is the first study to report risk variants in the MC1R, MC2R, and MC3R genes in T2D. The MC2R and MC5R genes are replicated in MDD, with one novel variant each. Within our dataset, only the MC2R gene appears to confer risk for both MDD and T2D, albeit with different risk variants. To further clarify the role of the melanocortin receptor genes in MDD-T2D, these findings should be sought among other ethnicities as well.
There is a reciprocal relationship between the circadian and the reward systems. Polymorphisms in several circadian rhythm-related (clock) genes have been associated with drug addiction. This study aims to search for associations between 895 variants in 39 circadian rhythm-related genes and opioid addiction (OUD). Genotyping was performed with the Smokescreen® array. Ancestry was verified by principal/MDS component analysis and the sample was limited to European Americans (EA) (OUD, n = 435; controls, n = 138). Nominally significant associations (p < 0.01) were detected for several variants in genes encoding vasoactive intestinal peptide receptor 2 (VIPR2), period circadian regulator 2 (PER2), casein kinase 1 epsilon (CSNK1E), and activator of transcription and developmental regulator (AUTS2), but no signal survived correction for multiple testing. There was an intriguing association signal for the 3' untranslated region (3' UTR) variant rs885863 in VIPR2 (p = 0.0065; OR = 0.51; 95% CI 0.31-0.51). The result was corroborated in an independent EA OUD sample (n = 398; p = 0.0036 for the combined samples). Notably, this SNP is an expression quantitative trait locus (cis-eQTL) for VIPR2 and a long intergenic non-coding RNA, lincRNA 689, in a tissue-specific manner, based on the Genotype-Tissue Expression (GTEx) project. Vasoactive intestinal peptide (VIP) is an important peptide of light-activated suprachiasmatic nucleus cells. It regulates diverse physiological processes including circadian rhythms, learning and memory, and stress response. This is the first report of an association of a VIPR2 variant and OUD. Additionally, analysis of combinations of single nucleotide polymorphism (SNP) genotypes revealed an association of PER2 SNP rs80136044 and SNP rs4128839, located 41.6 kb downstream of the neuropeptide Y receptor type 1 gene, NPY1R (p = 3.4 × 10⁻⁶, OR = 11.4, 95% CI 2.7-48.2). The study provides preliminary insight into the relationship between genetic variants in circadian rhythm genes and long non-coding RNAs (lncRNAs) in their vicinity, and opioid addiction.
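Odds ratios with 95% confidence intervals, as reported above, are commonly obtained from a 2 × 2 table of counts. The sketch below uses the Woolf (logit) interval; this choice, the function name, and the example counts in the usage note are assumptions, since the abstract does not state how its intervals were computed.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (logit) confidence interval for [[a, b], [c, d]].

    a, b: cases with / without the genotype pattern
    c, d: controls with / without the genotype pattern
    z:    normal quantile (1.96 for an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the logit interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With hypothetical counts of 20 carriers among 100 cases and 10 carriers among 100 controls, `odds_ratio_ci(20, 80, 10, 90)` gives an odds ratio of 2.25, with the interval symmetric around it on the log scale.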
Statistical analysis methods for gene mapping originated in counting recombinant and non-recombinant offspring, but have now progressed to sophisticated approaches for the mapping of complex trait genes. Here, we outline new statistical methods that capture the simultaneous effects of multiple gene loci and thereby achieve a more global view of gene action and interaction than is possible by traditional gene-by-gene analysis. We aim to show that the work of statisticians goes far beyond the running of computer programs.
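The classical starting point mentioned above, counting recombinant and non-recombinant offspring, leads directly to the two-point LOD score. The following sketch covers only the simplest textbook case of phase-known, fully informative meioses; the function names are hypothetical and the code is not taken from the paper.

```python
from math import log10


def lod_score(recombinants, non_recombinants, theta):
    """Two-point LOD score at recombination fraction theta (0 <= theta <= 0.5).

    Compares the likelihood theta^r * (1 - theta)^(n - r) with the
    likelihood under free recombination, 0.5^n, on a log10 scale.
    """
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    n = recombinants + non_recombinants
    if theta == 0:
        # Any observed recombinant is impossible at theta = 0.
        return float("-inf") if recombinants > 0 else n * log10(2)
    return (recombinants * log10(theta)
            + non_recombinants * log10(1 - theta)
            - n * log10(0.5))


def max_lod(recombinants, non_recombinants):
    """Maximum-likelihood estimate of theta (capped at 0.5) and its LOD score."""
    n = recombinants + non_recombinants
    theta_hat = min(recombinants / n, 0.5)
    return theta_hat, lod_score(recombinants, non_recombinants, theta_hat)
```

For instance, 2 recombinants among 10 informative meioses give an estimate of theta = 0.2, and `max_lod(2, 8)` returns that estimate together with the corresponding LOD score.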
Identifying gene-gene interaction is a hot topic in genome-wide association studies. Two fundamental challenges are: (1) how to intelligently identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce the search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference from Pearson's chi-square test for the contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH); (2) the GO term "glycosaminoglycan biosynthetic process" was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS in the AMD data are likely true interactions, since genes interacting with CFH are retinal genes, and GO term enrichment also verified that the interaction between glycosaminoglycans (GAGs) and CFH plays an important role in the disease pathology of AMD. By applying AprioriGWAS to bipolar disorder in the WTCCC data, we found that variants without marginal effect show significant interactions. Examples include multiple-SNP genotype patterns within the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. The relevance of GRIA1 and GABRB2 to mental disorders is supported by multiple lines of evidence.
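The way Apriori reduces the search space can be shown with a minimal level-wise sketch of generic frequent itemset mining. This is the textbook algorithm, not the AprioriGWAS code: the function name, the absolute support threshold, and the idea of treating each individual's (variant, genotype) pairs as items are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for all itemsets occurring in at least
    min_support transactions (generic level-wise Apriori).

    In a genetic setting, each transaction could be one individual and
    each item a hypothetical (variant, genotype) pair.
    """
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

The prune step is what keeps the number of candidate patterns manageable: a pattern is counted only if all of its sub-patterns already passed the support threshold.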