Random forests (RF) are increasingly used in applications such as genome-wide association and microarray studies, where predictor correlation is frequently observed. Recent work on permutation-based variable importance measures (VIMs) used in RF has reached apparently contradictory conclusions. We present an extended simulation study to synthesize these results.
When predictor correlation was present and predictors were associated with the outcome (HA), the unconditional RF VIM attributed a higher share of importance to correlated predictors, while under the null hypothesis that no predictors are associated with the outcome (H0) the unconditional RF VIM was unbiased. Compared with unconditional VIMs, conditional VIMs showed lower values for correlated predictors under HA and were unbiased under H0. Scaled VIMs were clearly biased under both HA and H0.
Unconditional unscaled VIMs are a computationally tractable choice for large datasets and are unbiased under the null hypothesis. Whether the increased VIMs observed for correlated predictors should be considered a "bias" (because they do not directly reflect the coefficients in the generating model) or a beneficial attribute of these VIMs depends on the application. For example, in genetic association studies, where correlation between markers may help to localize the functionally relevant variant, the increased importance of correlated predictors may be an advantage. On the other hand, we show examples where this increased importance may result in spurious signals.
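To make the unconditional/conditional distinction concrete, here is a minimal, model-agnostic Python sketch. It assumes a single fitted model evaluated on one dataset, whereas RF computes permutation VIMs tree by tree on out-of-bag samples. The `strata` argument is a simplified stand-in for conditional permutation: conditional schemes typically derive the strata from splits in the fitted trees, while here they are supplied by hand. All names (`permutation_vim`, `proxy_model`) and the toy data are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

Row = List[float]


def error_rate(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Misclassification rate of `predict` on (X, y)."""
    return sum(predict(row) != label for row, label in zip(X, y)) / len(y)


def permutation_vim(predict, X, y, j, strata: Optional[Sequence] = None,
                    n_repeats: int = 20, seed: int = 0) -> float:
    """Mean increase in error after permuting column j.

    strata=None: unconditional scheme (column j is shuffled over all rows).
    Otherwise column j is shuffled only among rows sharing a stratum label,
    which keeps its association with the stratifying covariate intact.
    """
    rng = random.Random(seed)
    base = error_rate(predict, X, y)
    groups: Dict[object, List[int]] = {}
    for i in range(len(X)):
        key = None if strata is None else strata[i]
        groups.setdefault(key, []).append(i)
    increases = []
    for _ in range(n_repeats):
        Xp = [list(row) for row in X]          # copy so X is never modified
        for idx in groups.values():
            values = [X[i][j] for i in idx]
            rng.shuffle(values)                # permute within this group only
            for i, v in zip(idx, values):
                Xp[i][j] = v
        increases.append(error_rate(predict, Xp, y) - base)
    return sum(increases) / n_repeats


# Toy data: y depends on x0; x1 is an exact copy of x0; x2 is pure noise.
data_rng = random.Random(1)
x0 = [i % 2 for i in range(200)]
X = [[a, a, data_rng.random()] for a in x0]
y = list(x0)


def proxy_model(row: Row) -> int:
    """Hypothetical fitted model that happens to rely only on the proxy x1."""
    return int(row[1])


uncond = permutation_vim(proxy_model, X, y, j=1)
cond = permutation_vim(proxy_model, X, y, j=1, strata=x0)
noise = permutation_vim(proxy_model, X, y, j=2)
print(uncond, cond, noise)
```

Because x1 is constant within each level of x0, permuting it within those levels changes no prediction, so its conditional VIM is zero while its unconditional VIM is large. This mirrors, in exaggerated form, the drop in VIM for correlated predictors described above.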
A recent study examined the stability of rankings from random forests using two variable importance measures, mean decrease accuracy (MDA) and mean decrease Gini (MDG), and concluded that rankings based on the MDG were more robust than those based on the MDA. However, few studies have examined how data-specific characteristics affect ranking stability. Rankings based on the MDG measure were sensitive to within-predictor correlation and to differences in category frequencies, even when the number of categories was held constant, and thus may produce spurious results. The MDA measure was robust to these data characteristics. Further, under strong within-predictor correlation, MDG rankings were less stable than MDA rankings.
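The MDG sums, over the nodes that split on a predictor, the size-weighted decrease in Gini impurity, averaged over trees. The hypothetical Python sketch below computes that node-level quantity and the best binary split of a categorical predictor. It illustrates one mechanism behind MDG's sensitivity to data characteristics: because each split is chosen to maximize the decrease, a predictor offering more candidate splits can score higher even when it is unrelated to the outcome. Here the 8-category predictor refines the 2-category one, so its best decrease can never be smaller. This is an illustration, not the cited study's analysis.

```python
import itertools
import random
from collections import Counter
from typing import Hashable, List, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(labels: Sequence[Hashable], go_left: Sequence[bool]) -> float:
    """Size-weighted impurity decrease produced by one binary split."""
    n = len(labels)
    left = [lab for lab, g in zip(labels, go_left) if g]
    right = [lab for lab, g in zip(labels, go_left) if not g]
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)


def best_categorical_decrease(labels: Sequence[Hashable], x: Sequence[Hashable]) -> float:
    """Largest decrease over every split of x's categories into two groups."""
    cats = sorted(set(x))
    best = 0.0
    for size in range(1, len(cats)):
        for subset in itertools.combinations(cats, size):
            chosen = set(subset)
            best = max(best, gini_decrease(labels, [v in chosen for v in x]))
    return best


rng = random.Random(0)
y: List[int] = [rng.randint(0, 1) for _ in range(200)]   # outcome unrelated to x
fine = [rng.randrange(8) for _ in range(200)]            # 8-category noise predictor
coarse = [v // 4 for v in fine]                          # same predictor, 2 categories

print(gini([0, 0, 1, 1]))                                          # 0.5
print(gini_decrease([0, 0, 1, 1], [True, True, False, False]))     # 0.5 (pure children)
print(best_categorical_decrease(y, coarse), best_categorical_decrease(y, fine))
```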
Anxiety disorders are common, complex psychiatric disorders with twin heritabilities of 30-60%. We conducted a genome-wide association study of Lifetime Anxiety Disorder (cases: n = 25 453; controls: n = 58 113) and an additional analysis of Current Anxiety Symptoms (cases: n = 19 012; controls: n = 58 113). The liability-scale common-variant heritability estimate was 26% for Lifetime Anxiety Disorder and 31% for Current Anxiety Symptoms. Five novel genome-wide significant loci were identified, including an intergenic region on chromosome 9 that has previously been associated with neuroticism, and a locus overlapping the BDNF receptor gene, NTRK2. Anxiety showed significant positive genetic correlations with depression and insomnia as well as coronary artery disease, mirroring findings from epidemiological studies. We conclude that common genetic variation accounts for a substantial proportion of the genetic architecture underlying anxiety.
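Liability-scale estimates like those above are obtained by converting an observed-scale heritability from an ascertained case-control sample. A minimal sketch of the standard transformation follows: h2_L = h2_O * K^2 (1-K)^2 / (z^2 * P (1-P)), where K is the population prevalence, P the case fraction in the sample, and z the standard-normal density at the liability threshold. The observed-scale value and prevalence below are hypothetical; the study's own inputs are not given here.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, prevalence: float, case_fraction: float) -> float:
    """Convert observed-scale SNP heritability from a case-control sample
    to the liability scale.

    prevalence (K): assumed population prevalence of the trait, 0 < K < 1.
    case_fraction (P): proportion of cases in the analysed sample.
    """
    K, P = prevalence, case_fraction
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - K)   # liability threshold t with P(liability > t) = K
    z = std_normal.pdf(threshold)           # standard-normal density at the threshold
    return h2_obs * (K * (1 - K)) ** 2 / (z ** 2 * P * (1 - P))


# Case fraction of the Lifetime Anxiety Disorder sample reported above.
P_lifetime = 25_453 / (25_453 + 58_113)
print(round(P_lifetime, 3))   # 0.305

# Hypothetical inputs: neither value comes from the study.
h2_observed = 0.20
assumed_prevalence = 0.25
print(observed_to_liability(h2_observed, assumed_prevalence, P_lifetime))
```

The result is sensitive to the assumed prevalence K, which is why liability-scale estimates are usually reported together with the prevalence used.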
Bipolar, schizophrenia, and schizoaffective disorders are common, highly heritable psychiatric disorders for which familial coaggregation, as well as epidemiological and genetic evidence, suggests overlapping etiologies. No definitive susceptibility genes have yet been identified for any of these disorders. Genetic heterogeneity, combined with phenotypic imprecision and poor marker coverage, has contributed to the difficulty in defining risk variants. We focused on families of Ashkenazi Jewish descent to reduce genetic heterogeneity and, as a precursor to genome-wide association studies, undertook a single-nucleotide polymorphism (SNP) genotyping screen of 64 candidate genes (440 SNPs) chosen on the basis of previous linkage or association evidence and/or biological relevance. We genotyped an average of 6.9 SNPs per gene, at an average density of 1 SNP per 11.9 kb, in 323 bipolar I disorder and 274 schizophrenia or schizoaffective Ashkenazi case-parent trios. Using single-SNP and haplotype-based transmission/disequilibrium tests, we ranked genes on the basis of strength of association (P < .01). Six genes (DAO, GRM3, GRM4, GRIN2B, IL2RB, and TUBA8) met this criterion for bipolar I disorder; only DAO has been previously associated with bipolar disorder. Six genes (RGS4, SCA1, GRM4, DPYSL2, NOS1, and GRID1) met this criterion for schizophrenia or schizoaffective disorder; five replicate previous associations, and one, GRID1, shows a novel association with schizophrenia. In addition, six genes (DPYSL2, DTNBP1, G30/G72, GRID1, GRM4, and NOS1) showed overlapping suggestive evidence of association in both disorders. These results may help to prioritize candidate genes for future study from among the many suspected or proposed for schizophrenia and bipolar disorder. They provide further support for shared genetic susceptibility, involving glutamate-signaling pathways, between these two disorders.
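The transmission/disequilibrium test (TDT) compares, among heterozygous parents, how often each allele is transmitted to the affected child. A minimal single-SNP sketch follows, where b and c are the transmission counts of the two alleles; the gene names and counts are hypothetical, and the haplotype-based tests used in the study are not covered.

```python
import math
from typing import Dict, List, Tuple


def tdt(b: int, c: int) -> Tuple[float, float]:
    """McNemar-type TDT statistic and its 1-df chi-square p-value.

    b: transmissions of the tested allele from heterozygous parents
    c: transmissions of the other allele from heterozygous parents
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))   # upper tail of chi-square with 1 df
    return stat, p


# Hypothetical per-SNP counts (b, c); not data from the study.
counts: Dict[str, Tuple[int, int]] = {"GENE_A": (60, 40), "GENE_B": (90, 50)}

results = {gene: tdt(b, c) for gene, (b, c) in counts.items()}
# Keep genes meeting P < .01, ordered from strongest to weakest evidence.
passing: List[str] = sorted(
    (g for g, (_, p) in results.items() if p < 0.01), key=lambda g: results[g][1]
)
print(results["GENE_A"])   # statistic 4.0, p about 0.0455
print(passing)             # ['GENE_B']
```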
Abnormalities in social interaction are a common feature of several psychiatric disorders, which aligns with the recent move towards using Research Domain Criteria (RDoC) to describe disorders in terms of observable behaviours rather than specific diagnoses. Neuroeconomic games are an effective measure of social decision-making that can be adapted for use in neuroimaging, allowing investigation of the biological basis for behaviour. This review summarises findings of neuroeconomic gameplay studies in Axis I psychiatric disorders and advocates the use of these games as measures of the RDoC Affiliation and Attachment, Reward Responsiveness, Reward Learning and Reward Valuation constructs. Although research on neuroeconomic gameplay is in its infancy, consistencies have been observed across disorders, particularly in terms of impaired integration of social and cognitive information, avoidance of negative social interactions and reduced reward sensitivity, as well as reduced activity in brain regions associated with processing and responding to social information.
To examine whether postmenopausal women with diabetes experienced a higher incidence of hip fracture than women without diabetes.
In a prospective cohort study, 32,089 postmenopausal women residing in Iowa were surveyed by mail in 1986 and followed for 11 years. Diabetes status and other potential risk factors were assessed by questionnaires at baseline; incidence of hip fracture was ascertained by follow-up questionnaires.
A total of 490 hip fractures were reported over 306,900 person-years of follow-up. After adjustment for age, smoking status, estrogen use, BMI, and waist-to-hip ratio, women with type 1 diabetes (n = 47) were 12.25 times (95% CI 5.05-29.73) more likely to report an incident hip fracture than women without diabetes. Women with type 2 diabetes had a 1.70-fold higher risk (1.21-2.38) of incident hip fracture than women without diabetes. Longer duration of type 2 diabetes was associated with higher incidence, as was use of insulin or oral diabetes medications in women with type 2 diabetes. Furthermore, women who were initially free of diabetes but in whom diabetes developed had a relative risk of hip fracture of 1.60 (1.14-2.25) compared with women who never had diabetes.
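The reported figures can be cross-checked with a little arithmetic. The sketch below computes the crude incidence rate and, assuming each 95% CI is a Wald interval on the log scale (typical for adjusted ratio estimates, but an assumption here), recovers the standard error of each log relative risk and an approximate p-value. If that assumption holds, the geometric midpoint of each interval should sit close to the point estimate.

```python
import math

# Crude incidence rate from the totals reported above.
events, person_years = 490, 306_900
rate_per_1000 = 1000 * events / person_years
print(f"{rate_per_1000:.2f} hip fractures per 1,000 person-years")   # 1.60

Z95 = 1.959963984540054   # two-sided 95% standard-normal quantile


def log_scale_summary(rr: float, lower: float, upper: float) -> dict:
    """Summarise a ratio estimate, ASSUMING its 95% CI is a Wald interval on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)   # SE of log(rr)
    centre = math.sqrt(lower * upper)                       # geometric midpoint of the CI
    z = math.log(rr) / se
    p = math.erfc(abs(z) / math.sqrt(2))                    # approximate two-sided p-value
    return {"se_log": se, "ci_centre": centre, "z": z, "p": p}


reported = {
    "type 1 diabetes": (12.25, 5.05, 29.73),
    "type 2 diabetes": (1.70, 1.21, 2.38),
    "incident diabetes": (1.60, 1.14, 2.25),
}
summaries = {name: log_scale_summary(*vals) for name, vals in reported.items()}
for name, s in summaries.items():
    print(f"{name}: CI centre {s['ci_centre']:.2f}, z = {s['z']:.2f}, p ~ {s['p']:.2g}")
```

All three intervals are close to log-symmetric about their estimates, consistent with the assumption; the p-values are approximations, not values reported by the study.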
Postmenopausal women who have diabetes or in whom diabetes develops are at higher risk for hip fracture than nondiabetic postmenopausal women. Strategies to prevent osteoporosis and/or falling may be especially warranted in women with diabetes.
Motivation: The advent of high-throughput genomics has produced studies with large numbers of predictors (e.g. genome-wide association and microarray studies). Machine learning algorithms (MLAs) are a computationally efficient way to identify phenotype-associated variables in high-dimensional data. Important results from mathematical theory and numerous practical results document their value. One attractive feature of MLAs is that many operate in a fully multivariate environment, allowing small-importance variables to be included when they act cooperatively. However, certain properties of MLAs under conditions common in genomic data have not been well studied; in particular, correlations among predictors pose a problem.
Results: Using extensive simulation, we showed that accounting for correlation among predictors is crucial for making valid inferences from the variable importance measures (VIMs) of three MLAs: random forest (RF), conditional inference forest (CIF) and Monte Carlo logic regression (MCLR). Using a case–control illustration, we showed that RF VIMs, even permutation-based ones, were less able to detect association than the other algorithms at effect sizes encountered in complex disease studies. This reduction occurred when 'causal' predictors were correlated with other predictors, and was sharpest when RF tree building used the Gini index. Indeed, RF Gini VIMs are biased under correlation, depend on the strength and number of correlated predictors, and are over-trained to random fluctuations in the data when the terminal node size of the trees is small. Permutation-based VIM distributions were less variable for correlated predictors and unbiased, and thus may be preferred when predictors are correlated. MLAs are a powerful tool for high-dimensional data analysis, but well-considered use of the algorithms is necessary to draw valid conclusions.
Contact: kristin.nicodemus@well.ox.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
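The simulation designs are not described in this abstract. As a generic, hypothetical sketch of the setting studied, the code below draws a block of equicorrelated standard-normal predictors through a shared latent factor (x_i = sqrt(rho) * f + sqrt(1 - rho) * e_i, giving pairwise correlation rho) and a binary outcome that depends on column 0 only, through a logistic model. The factor model, the logistic link, and every parameter value are assumptions. Data like these can be passed to a VIM procedure to see how importance spreads from the 'causal' predictor to its correlated neighbours.

```python
import math
import random
from typing import List, Sequence, Tuple


def simulate(n: int, k: int, rho: float, beta: float,
             seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Return (X, y): k equicorrelated N(0, 1) predictors and a binary outcome.

    Only column 0 is 'causal' (log-odds slope beta, intercept 0); all columns
    share pairwise correlation rho through a common latent factor.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for a shared-factor design")
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    X: List[List[float]] = []
    y: List[int] = []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)                                  # shared latent factor
        row = [a * f + b * rng.gauss(0.0, 1.0) for _ in range(k)]
        p = 1.0 / (1.0 + math.exp(-beta * row[0]))               # logistic model
        X.append(row)
        y.append(1 if rng.random() < p else 0)
    return X, y


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


X, y = simulate(n=20_000, k=5, rho=0.8, beta=0.3, seed=42)


def column(j: int) -> List[float]:
    """Extract predictor j from the simulated design matrix."""
    return [row[j] for row in X]


print(round(pearson(column(0), column(1)), 3))   # close to 0.8
print(sum(y) / len(y))                           # close to 0.5 (zero intercept)
```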
Organized neuronal firing is crucial for cortical processing and is disrupted in schizophrenia. Using rapid amplification of 5' complementary DNA ends in human brain, we identified a primate-specific isoform (3.1) of the ether-a-go-go-related K(+) channel KCNH2 that modulates neuronal firing. KCNH2-3.1 messenger RNA levels are comparable to full-length KCNH2 (1A) levels in brain but three orders of magnitude lower in heart. In hippocampus from individuals with schizophrenia, KCNH2-3.1 expression is 2.5-fold greater than KCNH2-1A expression. A meta-analysis of five clinical data sets (367 families, 1,158 unrelated cases and 1,704 controls) shows association of single nucleotide polymorphisms in KCNH2 with schizophrenia. Risk-associated alleles predict lower intelligence quotient scores and speed of cognitive processing, altered memory-linked functional magnetic resonance imaging signals and increased KCNH2-3.1 mRNA levels in postmortem hippocampus. KCNH2-3.1 lacks a domain that is crucial for slow channel deactivation. Overexpression of KCNH2-3.1 in primary cortical neurons induces a rapidly deactivating K(+) current and a high-frequency, nonadapting firing pattern. These results identify a previously undescribed KCNH2 channel isoform involved in cortical physiology, cognition and psychosis, providing a potential new therapeutic drug target.
Risk for complex disease is thought to be controlled by multiple genetic risk factors, each with small individual effects. Meta-analyses of several independent studies may help increase power to detect association when effect sizes are modest. Although many software options are available for meta-analysis of genetic case-control data, no currently available software implements the method described by Kazeem and Farrall (2005), which combines data from independent family-based and case-control studies.
I introduce catmap, a package for the R statistical computing environment that implements fixed- and random-effects pooled estimates for case-control and transmission disequilibrium data, allowing genetic association data to be combined across study types. In addition, catmap may be used to create forest and funnel plots and to perform sensitivity analysis and cumulative meta-analysis. catmap is available from the Comprehensive R Archive Network http://www.r-project.org.
catmap allows researchers to synthesize data to assess evidence for association in studies of genetic polymorphisms, facilitating the use of pooled data analyses which may increase power to detect moderate genetic associations.
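catmap itself is an R package, and the Python below is not its API. It is a hedged sketch of the calculations the text describes: family-based TDT counts and case-control allele counts are both put on a log odds ratio scale (log(b/c) with variance 1/b + 1/c for the TDT, Woolf's variance for the 2x2 table), then pooled by inverse-variance weighting (fixed effect) and with the DerSimonian-Laird between-study variance (random effects). DerSimonian-Laird is a common choice assumed here; consult the catmap documentation for the estimator it actually uses. All counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Estimate = Tuple[float, float]   # (log odds ratio, variance)


def log_or_case_control(a: int, b: int, c: int, d: int) -> Estimate:
    """Allelic log OR with Woolf variance.
    a, b: risk / other allele counts in cases; c, d: the same in controls."""
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def log_or_tdt(transmitted: int, untransmitted: int) -> Estimate:
    """TDT odds ratio b/c and its variance 1/b + 1/c on the log scale."""
    return math.log(transmitted / untransmitted), 1 / transmitted + 1 / untransmitted


def pool(estimates: List[Estimate]) -> Dict[str, float]:
    """Inverse-variance fixed effect and DerSimonian-Laird random effects."""
    y = [e for e, _ in estimates]
    w = [1 / v for _, v in estimates]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (v + tau2) for _, v in estimates]            # random-effects weights
    random_effect = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return {
        "fixed": fixed, "se_fixed": math.sqrt(1 / sw),
        "random": random_effect, "se_random": math.sqrt(1 / sum(w_star)),
        "tau2": tau2, "Q": q,
    }


studies = [
    log_or_case_control(a=120, b=380, c=90, d=410),   # hypothetical case-control study
    log_or_tdt(transmitted=70, untransmitted=50),     # hypothetical family-based study
]
result = pool(studies)
print({k: round(v, 3) for k, v in result.items()})
print("pooled OR (fixed):", round(math.exp(result["fixed"]), 2))
```

Working on the log scale makes the two study types commensurable, so a forest plot, funnel plot, or cumulative meta-analysis can then be built from the same list of (estimate, variance) pairs.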
Category fluency is a widely used task that relies on multiple neurocognitive processes and is a sensitive assay of cortical dysfunction, including in schizophrenia. The test requires naming as many words belonging to a certain category (e.g., animals) as possible within a short period of time. The core metrics are the overall number of words produced and the number of errors, i.e., words generated that are not members of the target category. We combine a computational linguistic approach with a candidate gene approach to examine the genetic architecture of this traditional fluency measure.
In addition to the standard metric of overall word count, we applied a computational approach to semantics, Latent Semantic Analysis (LSA), to analyse the semantic clustering pattern of the words generated, as it likely reflects the search in memory for meanings. Because fluency performance probably also recruits verbal learning and recall processes, we included two standard measures of these processes: the Wechsler Memory Scale and the California Verbal Learning Test (CVLT). To explore the genetic architecture of traditional and LSA-derived fluency measures, we employed a candidate gene approach focused on SNPs with known function that were available from a recent genome-wide association study (GWAS) of schizophrenia. The selected candidate genes were associated with language and speech, verbal learning and recall processes, and processing speed. A total of 39 coding SNPs were included for analysis in 665 subjects.
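LSA represents each word as a vector derived from a singular value decomposition of a term-by-document matrix, so the semantic relatedness of consecutive responses can be measured with cosine similarity. The sketch below assumes such vectors are already available (the 2-D vectors here are hand-made toys, not LSA output) and scores one trial: words produced, errors (non-members), and "switches", operationalized hypothetically as consecutive valid words whose similarity falls below a threshold. The threshold, category list, and function names are assumptions, not the study's definitions.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def score_fluency(responses: List[str], members: Set[str],
                  vectors: Dict[str, Tuple[float, ...]],
                  switch_threshold: float = 0.5) -> Dict[str, float]:
    """Words produced, errors, and semantic switches for one fluency trial."""
    valid = [w for w in responses if w in members]
    # Similarity of each consecutive pair of valid responses.
    sims = [cosine(vectors[a], vectors[b]) for a, b in zip(valid, valid[1:])]
    return {
        "produced": len(responses),
        "errors": len(responses) - len(valid),
        "switches": sum(s < switch_threshold for s in sims),
        "mean_similarity": sum(sims) / len(sims) if sims else float("nan"),
    }


# Toy 2-D "semantic" vectors, hand-made for illustration only.
vectors = {"dog": (1.0, 0.1), "cat": (0.9, 0.2), "lion": (0.2, 1.0), "tiger": (0.1, 0.9)}
members = set(vectors)   # hypothetical member list for the category "animals"
trial = ["dog", "cat", "lion", "tiger", "table"]
print(score_fluency(trial, members, vectors))   # 5 produced, 1 error, 1 switch
```

Here "dog, cat" and "lion, tiger" form two tight clusters, and the move from "cat" to "lion" registers as the single switch; with real LSA vectors the same scoring would yield per-subject clustering metrics alongside the traditional counts.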
Given the modest sample size, the results should be regarded as exploratory and preliminary. Nevertheless, the data clearly illustrate how extracting the meaning from participants' responses, by analysing the actual content of words, generates useful and neurocognitively viable metrics. We discuss three replicated SNPs in the genes ZNF804A, DISC1 and KIAA0319, as well as the potential for computational analyses of linguistic and textual data in other genomics tasks.