With the explosion of microarray studies, an enormous amount of data is being produced. Systematic integration of gene expression data from different sources increases the statistical power to detect differentially expressed genes and allows assessment of heterogeneity. The challenge, however, lies in designing and implementing efficient analytic methods for combining data generated by different research groups.
We extended traditional effect size models to combine information from different microarray datasets by incorporating a quality measure for each gene in each study into the effect size estimation. We illustrated our method by integrating two datasets generated with different types of Affymetrix oligonucleotide arrays. Our results indicate that the proposed quality-adjusted weighting strategy for modelling inter-study variation of gene expression profiles not only increases consistency and reduces heterogeneity between the two datasets, but also identifies many more differentially expressed genes than previously proposed methods.
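The abstract does not give the estimator, so the following is only a minimal sketch under stated assumptions. For one gene, each study's effect is taken to be Hedges' g (bias-corrected standardized mean difference) with its usual large-sample variance. A hypothetical per-study quality score q in [0, 1] then multiplies the inverse-variance weight. The function names and the multiplicative weighting rule are illustrative and are not the authors' published model.

```python
import math


def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Bias-corrected standardized mean difference (Hedges' g) and its approximate variance."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction
    g = j * d
    var_g = (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))
    return g, var_g


def quality_weighted_effect(effects, variances, qualities):
    """Pool one gene's per-study effects with weights q_i / v_i.

    The multiplicative quality weighting is an assumption for illustration;
    with every q_i = 1 it reduces to ordinary inverse-variance pooling.
    """
    weights = [q / v for q, v in zip(qualities, variances)]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    # Variance of sum(w_i * e_i) / sum(w_i), treating the weights as fixed.
    pooled_var = sum(w**2 * v for w, v in zip(weights, variances)) / total**2
    return pooled, pooled_var


# Hypothetical summary statistics for one gene in two studies.
g1, v1 = hedges_g(8.2, 7.6, 0.9, 1.1, 20, 18)
g2, v2 = hedges_g(5.1, 4.8, 0.7, 0.8, 25, 22)
print(quality_weighted_effect([g1, g2], [v1, v2], [0.9, 0.4]))
```

Down-weighting a low-quality measurement this way shrinks its influence on the pooled effect without discarding the study entirely.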
Data integration and synthesis are becoming increasingly important. We live in a high-throughput era in which technologies change constantly, leaving behind a trail of data of different forms, shapes and sizes. Statistical and computational methodologies are therefore critical for extracting the most from these related but non-identical sources of data.
In observational studies, type 2 diabetes (T2D) is associated with an increased risk of coronary heart disease (CHD), yet interventional trials have shown no clear effect of glucose lowering on CHD. Confounding may therefore have influenced these observational estimates. Here we use Mendelian randomization to obtain unconfounded estimates of the influence of T2D and fasting glucose (FG) on CHD risk. Using multiple genetic variants associated with T2D and FG, we find that risk of T2D increases CHD risk (odds ratio (OR) = 1.11 (1.05–1.17) per unit increase in odds of T2D, P = 8.8 × 10^-5; using data from 34,840/114,981 T2D cases/controls and 63,746/130,681 CHD cases/controls). FG in non-diabetic individuals tends to increase CHD risk (OR = 1.15 (1.00–1.32) per mmol/l, P = 0.05; 133,010 non-diabetic individuals and 63,746/130,681 CHD cases/controls). These findings provide evidence supporting a causal relationship between T2D and CHD and suggest that long-term trials may be required to discern the effects of T2D therapies on CHD risk.
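The abstract does not name the estimator used to combine the multiple genetic variants. One common choice is the fixed-effect inverse-variance weighted (IVW) estimator, sketched below under that assumption. It runs a weighted regression, through the origin, of per-variant outcome effects on per-variant exposure effects. The function and argument names are hypothetical.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate from summary statistics.

    beta_exposure: per-variant effects on the exposure (e.g. T2D log odds)
    beta_outcome:  per-variant effects on the outcome (e.g. CHD log odds)
    se_outcome:    standard errors of beta_outcome
    """
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    beta = num / den  # weighted regression of beta_outcome on beta_exposure, no intercept
    se = math.sqrt(1 / den)
    return beta, se


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a normal-approximation 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical per-variant summary statistics (not the study's data).
beta, se = ivw_mr([0.10, 0.25, 0.18], [0.012, 0.020, 0.025], [0.006, 0.007, 0.008])
print(odds_ratio_ci(beta, se))
```

With a single variant this reduces to the Wald ratio beta_outcome / beta_exposure. Pooling many variants narrows the confidence interval, provided each is a valid instrument.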
Dysembryoplastic neuroepithelial tumor (DNET) is a benign brain tumor associated with intractable drug-resistant epilepsy. To identify underlying genetic alterations and molecular mechanisms, we examined three family members affected by multinodular DNETs as well as 100 sporadic tumors from 96 patients that had been referred to us as DNETs. We performed whole-exome sequencing on 46 tumors and targeted sequencing of hotspot FGFR1 mutations and BRAF p.V600E on the remaining samples. FISH, copy number variation assays and Sanger sequencing were used to validate the findings. By whole-exome sequencing of the familial cases, we identified a novel germline FGFR1 mutation, p.R661P. Somatic activating FGFR1 mutations (p.N546K or p.K656E) were observed in the tumor samples, and further evidence for functional relevance was obtained by in silico modeling. The FGFR1 p.K656E mutation was confirmed to be in cis with the germline p.R661P variant. In the 43 sporadic cases in which the diagnosis of DNET could be confirmed on central blinded neuropathology review, FGFR1 alterations were also frequent and mainly comprised intragenic FGFR1 tyrosine kinase duplication and multiple mutants in cis (25/43; 58.1 %), while BRAF p.V600E alterations were absent (0/43). In contrast, in the 53 cases in which the diagnosis of DNET was not confirmed, FGFR1 alterations were less common (10/53; 19 %; p < 0.0001) and hotspot BRAF p.V600E mutations prevailed (12/53; 22.6 %; p < 0.001). We observed increased phospho-ERK levels in HEK293 cells expressing the FGFR1 p.R661P and p.N546K mutants, as well as in FGFR1-mutated tumor samples, supporting enhanced MAP kinase pathway activation under these conditions. In conclusion, constitutional and somatic FGFR1 alterations and MAP kinase pathway activation are key events in the pathogenesis of DNET. These findings point the way towards existing targeted therapies.
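The abstract reports p-values for the FGFR1 and BRAF comparisons without naming the test. As an illustration only, the FGFR1 comparison can be sketched as a two-sided Fisher's exact test on the 2 × 2 table of confirmed versus unconfirmed DNET diagnosis by FGFR1 status, computed directly from the hypergeometric distribution. The test choice is an assumption, the function names are hypothetical, and the result is not claimed to reproduce the authors' exact p-value.

```python
from math import comb


def hypergeom_pmf(k: int, row1: int, col1: int, n: int) -> float:
    """P(k of the col1 'altered' cases fall in row 1 of size row1), n cases in total."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, col1, n)
    total = 0.0
    for k in range(lo, hi + 1):
        p_k = hypergeom_pmf(k, row1, col1, n)
        if p_k <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p_k
    return min(total, 1.0)


# Rows: confirmed DNET (43), unconfirmed (53); columns: FGFR1 altered, not altered.
p_fgfr1 = fisher_exact_two_sided(25, 43 - 25, 10, 53 - 10)
print(f"FGFR1 alteration, confirmed vs unconfirmed: p = {p_fgfr1:.2g}")
```

The same function applies to the BRAF p.V600E comparison (0/43 versus 12/53) by changing the counts.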
Preventive measures and treatments for psychiatric disorders are limited. Circulating metabolites are potential candidates for biomarker and therapeutic target identification, given their measurability and essential roles in biological processes.
Leveraging large-scale genome-wide association studies, we conducted Mendelian randomization analyses to assess the associations between circulating metabolite abundances and the risks of bipolar disorder, schizophrenia, and depression. Genetic instruments were selected for 94 metabolites measured in the Canadian Longitudinal Study on Aging cohort (N = 8299). We repeated Mendelian randomization analyses based on the UK Biobank, INTERVAL, and EPIC (European Prospective Investigation into Cancer)–Norfolk studies.
After validating Mendelian randomization assumptions and colocalization evidence, we found that a 1 SD increase in genetically predicted circulating abundances of eicosapentaenoate and docosapentaenoate was associated with odds ratios of 0.72 (95% CI, 0.65–0.79) and 0.63 (95% CI, 0.55–0.72), respectively, for bipolar disorder. Genetically increased Ω-3 unsaturated fatty acids abundance and Ω-3-to-total fatty acids ratio, as well as genetically decreased Ω-6-to-Ω-3 ratio, were negatively associated with the risk of bipolar disorder in the UK Biobank. Genetically increased circulating abundances of 3 N-acetyl-amino acids were associated with an increased risk of schizophrenia with a maximum odds ratio of 1.31 (95% CI, 1.18–1.44) per 1 SD increase. Furthermore, a 1 SD increase in genetically predicted circulating abundance of hypotaurine was associated with an odds ratio of 0.85 (95% CI, 0.78–0.93) for depression.
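Reported odds ratios and confidence intervals can be converted back to the log-odds scale, for example to compare effect sizes or to re-use them in a meta-analysis. The sketch below assumes the reported 95% CIs are symmetric Wald intervals on the log scale. Because the published bounds are rounded, the recovered standard error is only approximate. The function name is hypothetical.

```python
import math


def log_or_from_ci(odds_ratio, lower, upper, z=1.96):
    """Back-calculate log OR, its SE and the Wald z-statistic from a reported 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se, beta / se


# Eicosapentaenoate and bipolar disorder, as reported above: OR 0.72 (0.65-0.79).
beta, se, z = log_or_from_ci(0.72, 0.65, 0.79)
print(f"log OR = {beta:.3f}, SE = {se:.3f}, z = {z:.1f}")
```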
The biological mechanisms underlying the associations of Ω-3 unsaturated fatty acids, NAT8-catalyzed N-acetyl-amino acids, and hypotaurine with these disorders warrant exploration to identify new biomarkers and potential therapeutic targets.
The role of rare genetic variation in the etiology of complex disease remains unclear. However, the development of next-generation sequencing technologies offers the experimental opportunity to address this question. Several novel statistical methodologies have recently been proposed to assess the contribution of rare variation to complex disease etiology. Nevertheless, no empirical estimates comparing their relative power are available. We therefore assessed the parameters that influence their statistical power in 1,998 individuals Sanger-sequenced at seven genes by modeling different distributions of effect, proportions of causal variants, and directions of association (deleterious, protective, or both) in simulated continuous trait and case/control phenotypes. Our results demonstrate that the power of recently proposed statistical methods depends strongly on the underlying hypotheses concerning the relationship of phenotypes with each of these three factors. No method demonstrates consistently acceptable power despite this large sample size, and the performance of each method depends upon the underlying assumption about the relationship between rare variants and complex traits. Sensitivity analyses are therefore recommended to compare the stability of the results arising from different methods, and promising results should be replicated using the same method in an independent sample. These findings provide guidance in the analysis and interpretation of the role of rare base-pair variation in the etiology of complex traits and diseases.
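The abstract does not list the methods compared. To illustrate why the direction of association matters, the sketch below contrasts two generic families of rare-variant statistics on hypothetical toy data: a burden (collapsing) score and an unweighted variance-component score in the spirit of SKAT. These are not necessarily the specific methods evaluated in the study. The statistics are unweighted and not scaled by their null variances, so they show the direction effect only and do not yield p-values.

```python
def score_statistics(genotypes, trait):
    """Per-variant score U_j = sum_i G_ij * (y_i - mean(y)) for a continuous trait.

    genotypes: one row per individual, one column per rare variant (0/1/2 copies)
    """
    n = len(trait)
    mean_y = sum(trait) / n
    resid = [y - mean_y for y in trait]
    n_variants = len(genotypes[0])
    return [sum(genotypes[i][j] * resid[i] for i in range(n)) for j in range(n_variants)]


def burden_statistic(scores):
    """Burden-style statistic: squared sum of scores; opposite-direction effects cancel."""
    return sum(scores) ** 2


def variance_component_statistic(scores):
    """Unweighted SKAT-like statistic: sum of squared scores; direction-agnostic."""
    return sum(u**2 for u in scores)


# Toy data: carriers of variant 1 vs. carriers of variant 2, plus four non-carriers.
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0]]
mixed = [2, 2, -2, -2, 0, 0, 0, 0]  # one deleterious, one protective variant
same = [2, 2, 2, 2, 0, 0, 0, 0]     # both variants raise the trait
for label, trait in [("mixed directions", mixed), ("same direction", same)]:
    u = score_statistics(genotypes, trait)
    print(label, burden_statistic(u), variance_component_statistic(u))
```

With mixed directions, the burden statistic collapses to zero while the variance-component statistic stays large. When all effects share a direction, the burden statistic is the larger of the two.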
Few studies have attempted to identify how distinct dimensions of maternal prenatal affective symptoms relate to offspring psychopathology. We defined latent dimensions of women’s prenatal affective symptoms and pregnancy-specific worries to examine their association with early offspring psychopathology in three prenatal cohorts.
Data were used from three cohorts of the DREAM-BIG consortium: Avon Longitudinal Study of Parents and Children (ALSPAC N = 12,515), Generation R (N = 6,803), and the Canadian prenatal cohort Maternal Adversity, Vulnerability, and Neurodevelopment (MAVAN N = 578). Maternal prenatal affective symptoms and pregnancy-specific worries were assessed using different measures in each cohort. Through confirmatory factor analyses, we determined whether comparable latent dimensions of prenatal maternal affective symptoms existed across the cohorts. We used structural equation models to examine cohort-specific associations between these dimensions and offspring psychopathology at 4 to 8 years of age (general psychopathology and specific internalizing and externalizing factors, previously derived using confirmatory factor analyses). Cohort-specific estimates were meta-analyzed using inverse-variance weighting.
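The abstract states only that cohort estimates were combined by inverse-variance weighting. The sketch below assumes a fixed-effect model. Cochran's Q and I² are included as common heterogeneity summaries, not necessarily ones the authors reported, and the example inputs are hypothetical.

```python
import math


def fixed_effect_meta(estimates, standard_errors):
    """Fixed-effect inverse-variance meta-analysis with Cochran's Q and I^2."""
    weights = [1 / se**2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical cohort-specific standardized coefficients (not the study's results).
print(fixed_effect_meta([0.12, 0.09, 0.15], [0.02, 0.03, 0.06]))
```

Larger cohorts with smaller standard errors, such as ALSPAC relative to MAVAN, dominate the pooled estimate under this weighting.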
Four prenatal maternal factors were similar in all cohorts: a general affective symptoms factor and three specific factors—an anxiety/depression factor, a somatic factor, and a pregnancy-specific worries factor. In meta-analyses, both the general affective symptoms factor and pregnancy-specific worries factor were independently associated with offspring general psychopathology. The general affective symptoms factor was further associated with offspring specific internalizing problems. There were no associations with specific externalizing problems.
These replicated findings of independent and adverse effects for prenatal general affective symptoms and pregnancy-specific worries on child mental health support the need for specific interventions in pregnancy.
Background
Internalising and externalising problems commonly co‐occur in childhood. Yet, few developmental models describing the structure of child psychopathology appropriately account for this comorbidity. We evaluate a model of childhood psychopathology that separates the unique and shared contribution of individual psychological symptoms into specific internalising, externalising and general psychopathology factors and assess how these general and specific factors predict long‐term outcomes concerning criminal behaviour, academic achievement and affective symptoms in three independent cohorts.
Methods
Data were drawn from independent birth cohorts (Avon Longitudinal Study of Parents and Children (ALSPAC), N = 11,612; Generation R, N = 7,946; Maternal Adversity, Vulnerability and Neurodevelopment (MAVAN), N = 408). Child psychopathology was assessed between 4 and 8 years using a range of diagnostic and questionnaire‐based measures, and multiple informants. First, structural equation models were used to assess the fit of hypothesised models of shared and unique components of psychopathology in all cohorts. Once the model was chosen, linear/logistic regressions were used to investigate whether these factors were associated with important outcomes such as criminal behaviour, academic achievement and well‐being in late adolescence/early adulthood.
Results
The model that included specific factors for internalising/externalising and a general psychopathology factor capturing variance shared between symptoms regardless of their classification fitted well in all of the cohorts. As hypothesised, general psychopathology factor scores were predictive of all outcomes of later functioning, while specific internalising factor scores predicted later internalising outcomes. Specific externalising factor scores, capturing variance not shared by any other psychological symptoms, were not predictive of later outcomes.
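The structure of this model can be made concrete through the covariance it implies. The sketch below uses hypothetical loadings and assumes the usual bifactor constraints: orthogonal, unit-variance factors and standardized items. Under these assumptions, items from different domains covary only through the general factor, while items within a domain share both general and specific variance.

```python
def bifactor_covariance(general, specific, domain, uniqueness):
    """Model-implied covariance Sigma = Lambda Lambda^T + Theta for a bifactor model.

    Assumes all factors are orthogonal with unit variance: each item loads on the
    general factor and on exactly one specific factor (its domain).
    """
    p = len(general)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            cov = general[i] * general[j]
            if domain[i] == domain[j]:
                cov += specific[i] * specific[j]
            if i == j:
                cov += uniqueness[i]
            sigma[i][j] = cov
    return sigma


# Hypothetical standardized loadings for two internalising and two externalising items.
general = [0.6, 0.5, 0.7, 0.4]
specific = [0.4, 0.3, 0.5, 0.2]
domain = ["INT", "INT", "EXT", "EXT"]
uniqueness = [1 - g**2 - s**2 for g, s in zip(general, specific)]
sigma = bifactor_covariance(general, specific, domain, uniqueness)
print(sigma[0][2])  # cross-domain covariance comes from the general factor only (0.6 * 0.7)
```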
Conclusions
Early symptoms of psychopathology carry information that is syndrome‐specific as well as indicative of general vulnerability and of the informant reporting on the child. The ‘general psychopathology factor’ might be more relevant for long‐term outcomes than specific symptoms. These findings emphasise the importance of considering the co‐occurrence of common internalising and externalising problems in childhood when considering long‐term impact.
Several founder mutations leading to increased risk of cancer among Ashkenazi Jewish individuals have been identified, and some estimates of the age of the mutations have been published. A variety of methods have previously been used to estimate the age of the mutations. Here three datasets containing genotype information near known founder mutations are reanalyzed in order to compare three approaches for estimating the age of a mutation. The methods are: (a) the single marker method used by Risch et al. (1995); (b) the intra-allelic coalescent model known as DMLE; and (c) the Goldgar method proposed in Neuhausen et al. (1996), and modified slightly by our group. The three mutations analyzed were MSH2*1906 G->C, APC*I1307K, and BRCA2*6174delT.
All methods depend on accurate estimates of inter-marker recombination rates. The modified Goldgar method allows for marker mutation as well as recombination, but requires prior estimates of the possible haplotypes carrying the mutation for each individual. It does not incorporate population growth rates. The DMLE method estimates the haplotypes jointly with the mutation age, and builds in the population growth rate. The single marker estimates, however, are more sensitive to the recombination rates and are unstable. Mutation age estimates based on DMLE are 16.8 generations for MSH2 (95% credible interval 13–23), 106 generations for I1307K (86–129), and 90 generations for 6174delT (71–114).
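The sensitivity of single-marker estimates to the recombination rate can be seen in a minimal sketch of the classic calculation. It assumes that the excess of the ancestral marker allele on carrier chromosomes, measured by δ = (p_D − p_N)/(1 − p_N), decays geometrically as (1 − θ)^g from complete association at the founder event. The sketch ignores population growth and marker mutation, and may differ in detail from the estimator of Risch et al. (1995). The function name is hypothetical.

```python
import math


def single_marker_age(p_disease, p_normal, theta):
    """Generations since the founder event from decay of association at one marker.

    p_disease: frequency of the ancestral marker allele on mutation-carrying chromosomes
    p_normal:  frequency of that allele on non-carrier chromosomes
    theta:     recombination fraction between marker and mutation
    Assumes delta decays as (1 - theta)**g from delta = 1 at founding;
    ignores population growth and marker mutation.
    """
    delta = (p_disease - p_normal) / (1 - p_normal)
    if not 0 < delta <= 1:
        raise ValueError("delta must lie in (0, 1]")
    return math.log(delta) / math.log(1 - theta)


# Same hypothetical allele frequencies under two recombination fractions.
for theta in (0.01, 0.02):
    print(theta, round(single_marker_age(0.80, 0.10, theta), 1))
```

Because g scales roughly as 1/θ for small θ, doubling the assumed recombination fraction about halves the estimated age. This is one reason the single-marker estimates are unstable.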
For recent founder mutations where marker mutations are unlikely to have occurred, both DMLE and the Goldgar method can give good results. Caution is necessary for older mutations, especially if the effective population size remained small for a long period of time.
Background
Polygenic risk scores (PRSs) operationalize genetic propensity toward a particular mental disorder and hold promise as early predictors of psychopathology, but before a PRS can be used clinically, explanatory power must be increased and the specificity for a psychiatric domain established. To enable early detection, it is crucial to study these psychometric properties in childhood. We examined whether PRSs associate more with general or with specific psychopathology in school‐aged children. Additionally, we tested whether psychiatric PRSs can be combined into a multi‐PRS score for improved performance.
Methods
We computed 16 PRSs based on GWASs of psychiatric phenotypes, as well as of neuroticism and cognitive ability, in mostly adult populations. Study participants were 9,247 school‐aged children from three population‐based cohorts of the DREAM‐BIG consortium: ALSPAC (UK), The Generation R Study (Netherlands), and MAVAN (Canada). We associated each PRS with general and specific psychopathology factors, derived from a bifactor model based on self‐report and parental, teacher, and observer reports. After fitting each PRS in separate models, we also tested a multi‐PRS model, in which all PRSs were entered simultaneously as predictors of the general psychopathology factor.
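The abstract does not describe how the 16 PRSs were constructed. The sketch below shows only the generic scoring step. It assumes per-variant weights taken from GWAS summary statistics and allele dosages already aligned to the effect allele. Variant selection (e.g. clumping and p-value thresholding) is omitted, and all names and numbers are hypothetical.

```python
import math


def polygenic_score(dosages, weights):
    """Weighted allele-dosage sum per individual: PRS_i = sum_j w_j * x_ij."""
    return [sum(x * w for x, w in zip(row, weights)) for row in dosages]


def standardize(scores):
    """Z-score a PRS within the sample (population SD) so different PRSs share one scale."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)
    return [(s - mean) / sd for s in scores]


# Hypothetical: three children, three variants, weights = GWAS per-allele effects.
dosages = [[0, 1, 2], [2, 2, 0], [1, 0, 1]]
weights = [0.10, -0.20, 0.05]
print(standardize(polygenic_score(dosages, weights)))
```

In a multi-PRS model, each standardized score would enter the regression as a separate predictor of the general psychopathology factor, so the coefficients are directly comparable.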
Results
Seven PRSs were associated with the general psychopathology factor after multiple testing adjustment, two with specific externalizing and five with specific internalizing psychopathology. PRSs predicted general psychopathology independently of each other, with the exception of the depression and depressive symptom PRSs. Most PRSs associated with a specific psychopathology domain were also associated with general child psychopathology.
Conclusions
The results suggest that PRSs based on current GWASs of psychiatric phenotypes tend to be associated with general psychopathology, or with both general and specific psychiatric domains, but not with one specific psychopathology domain only. Furthermore, PRSs can be combined to improve predictive ability. PRS users should therefore be conscious of nonspecificity and consider using multiple PRSs simultaneously when predicting psychiatric disorders.
ABSTRACT
Although a standard genome‐wide significance level has been accepted for testing the association between common genetic variants and disease, the era of whole‐genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence‐identified variants is very different from that of common variants, and the identified rare genetic variation is usually jointly analyzed in a series of genomic windows or regions. In nearby or overlapping windows, these test statistics will be correlated, and the degree of correlation is likely to depend on the choice of window size, overlap, and test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome‐wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome‐wide significance thresholds for different analysis choices. Based on UK10K whole‐genome sequence data, we derive genome‐wide significance thresholds ranging between 2.5 × 10^-8 and 8 × 10^-8 for our analytic choices in window‐based testing, and thresholds of 0.6 × 10^-8 to 1.5 × 10^-8 for a combined analytic strategy of testing common variants using single‐SNP tests together with rare variants analyzed with our sliding‐window test strategy.
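The abstract does not spell out the extrapolation, so what follows is only a minimal sketch under strong assumptions, not the authors' procedure. First, the α-quantile of per-permutation minimum p-values in a small region defines an effective number of independent tests via the Šidák relation 1 − (1 − t)^M = α. Second, that number is assumed to scale linearly with sequence length. The function names, the linear-scaling rule and the simulated inputs are all hypothetical.

```python
import math
import random


def empirical_quantile(values, q):
    """Lower empirical q-quantile: the ceil(q * n)-th smallest value."""
    ordered = sorted(values)
    k = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[k]


def effective_tests(min_p_quantile, alpha=0.05):
    """Effective number of tests M with 1 - (1 - t)**M = alpha (Sidak relation),
    where t is the alpha-quantile of per-permutation minimum p-values."""
    return math.log1p(-alpha) / math.log1p(-min_p_quantile)


def extrapolated_threshold(region_m_eff, region_length, genome_length, alpha=0.05):
    """Per-test threshold 1 - (1 - alpha)**(1 / M_genome), assuming (hypothetically)
    that the effective number of tests grows linearly with sequence length."""
    m_genome = region_m_eff * genome_length / region_length
    return -math.expm1(math.log1p(-alpha) / m_genome)


# Simulated stand-in for permutations of a 1 Mb region with 200 independent windows.
random.seed(1)
min_ps = [min(random.random() for _ in range(200)) for _ in range(1000)]
m_region = effective_tests(empirical_quantile(min_ps, 0.05))
print(m_region, extrapolated_threshold(m_region, 1e6, 3e9))
```

In real sequence data, overlapping windows make the tests correlated, so the effective number of tests falls below the raw window count. The permutation minima capture that correlation without modelling it explicitly.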