With the advent of high-throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on a cohort of distantly related individuals using a linear mixed model (LMM). Fitting such an LMM in a large-scale cohort study, however, is tremendously challenging due to its high-dimensional linear algebraic operations. In this paper, we propose a new method named PredLMM that approximates the aforementioned LMM, motivated by the concepts of genetic coalescence and the Gaussian predictive process. PredLMM has substantially better computational complexity than most existing LMM-based methods and thus provides a fast alternative for estimating heritability in large-scale cohort studies. Theoretically, we show that under a model of genetic coalescence, the limiting form of our approximation is the celebrated predictive process approximation of large Gaussian process likelihoods, which has well-established accuracy standards. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits from the UK Biobank cohort.
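For orientation, SNP-based heritability is commonly defined through a variance-component LMM of the following standard form. This is a sketch of the usual textbook setup, not the paper's own notation: the symbols $y$, $X$, $\beta$, $K$, $\sigma_g^2$, and $\sigma_e^2$ are assumed here for illustration.

```latex
y = X\beta + g + \varepsilon, \qquad
g \sim N\!\left(0,\ \sigma_g^2 K\right), \qquad
\varepsilon \sim N\!\left(0,\ \sigma_e^2 I_n\right), \qquad
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}
```

Here $K$ denotes an $n \times n$ genetic relationship matrix computed from the SNP data. Evaluating the Gaussian likelihood requires factorizing the dense $n \times n$ covariance $\sigma_g^2 K + \sigma_e^2 I_n$, which is the kind of high-dimensional linear algebra that becomes costly as the cohort size $n$ grows.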
We propose a resampling-based fast variable selection technique for detecting relevant single nucleotide polymorphisms (SNPs) in a multi-marker mixed effect model. Due to computational complexity, current practice primarily involves testing the effect of one SNP at a time, commonly termed 'single SNP association analysis'. Joint modeling of genetic variants within a gene or pathway may have better power to detect associated genetic variants, especially those with weak effects. In this paper, we propose a computationally efficient model selection approach, based on the e-values framework, for single SNP detection in families while utilizing information on multiple SNPs simultaneously. To overcome the computational bottleneck of traditional model selection methods, our method trains a single model and utilizes a fast and scalable bootstrap procedure. We illustrate through numerical studies that our proposed method is more effective in detecting SNPs associated with a trait than either single-marker analysis using family data or model selection methods that ignore the familial dependency structure. Further, we perform gene-level analysis of the Minnesota Center for Twin and Family Research (MCTFR) dataset using our method and detect several SNPs that have been implicated in alcohol consumption.
In the past decade, many genome-wide association studies (GWASs) have been conducted to explore association of single nucleotide polymorphisms (SNPs) with complex diseases using a case-control design. These GWASs not only collect information on the disease status (primary phenotype, D) and the SNPs (genotypes, X), but also collect extensive data on several risk factors and traits. Recent literature and grant proposals point toward a trend in reusing existing large case-control data for exploring genetic associations of some additional traits (secondary phenotypes, Y) collected during the study. These secondary phenotypes may be correlated, and a proper analysis warrants a multivariate approach. Commonly used multivariate methods are not equipped to properly account for the non-random sampling scheme. Current ad hoc practices include analyses without any adjustment, and analyses with D adjusted as a covariate. Our theoretical and empirical studies suggest that the type I error for testing genetic association of secondary traits can be substantial when X as well as Y are associated with D, even when there is no association between X and Y in the underlying (target) population. Whether using D as a covariate helps maintain type I error depends heavily on the disease mechanism and the underlying causal structure (which is often unknown). To avoid grossly incorrect inference, we propose a proportional odds model adjusted for propensity score (POM-PS). It uses a proportional odds logistic regression of X on Y and includes the estimated conditional probability of being diseased as a covariate. We demonstrate the validity and advantage of POM-PS, and compare it with some existing methods in extensive simulation experiments mimicking plausible scenarios of dependency among Y, X, and D. Finally, we use POM-PS to jointly analyze four adiposity traits using a type 2 diabetes (T2D) case-control sample from the population-based Metabolic Syndrome in Men (METSIM) study.
Only POM-PS analysis of the T2D case-control sample seems to provide valid association signals.
Background
Greater public awareness of venous thromboembolism may be an important next step for optimizing venous thromboembolism prevention and treatment. “Lifetime risk” is an easily interpretable way of presenting risk information. Therefore, we sought to calculate the lifetime risk of venous thromboembolism (deep vein thrombosis or pulmonary embolism) using data from 2 large, prospective cohort studies: the Cardiovascular Health Study (CHS) and the Atherosclerosis Risk in Communities (ARIC) study.
Methods
We followed participants aged 45-64 years in ARIC (n = 14,185) and ≥65 years in CHS (n = 5414) at baseline visits (1987-1989 in ARIC, 1989-1990 and 1992-1993 in CHS) for incident venous thromboembolism (n = 728 in ARIC through 2011 and n = 172 in CHS through 2001). We estimated lifetime risks and 95% confidence intervals of incident venous thromboembolism using a modified Kaplan-Meier method, accounting for the competing risk of death from other causes.
Results
At age 45 years, the remaining lifetime risk of venous thromboembolism in ARIC was 8.1% (95% confidence interval, 7.1-8.7). High-risk groups were African Americans (11.5% lifetime risk), those with obesity (10.9%), those heterozygous for factor V Leiden (17.1%), and those with sickle cell trait or disease (18.2%). Lifetime risk estimates differed by cohort; these differences were explained by differences in time period of venous thromboembolism ascertainment.
Conclusions
At least 1 in 12 middle-aged adults will develop venous thromboembolism in their remaining lifetime. This estimate of lifetime risk may be useful to promote awareness of venous thromboembolism and guide decisions at both clinical and policy levels.
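To make the competing-risk adjustment concrete, the sketch below computes a nonparametric cumulative incidence for one event type while treating a second event type (such as death from other causes) as a competing event rather than as censoring. This is one standard estimator and is only assumed to resemble the "modified Kaplan-Meier method" mentioned above; the function name, event coding, and the five-person data set are hypothetical.

```python
from collections import Counter


def cumulative_incidence(times, events):
    """Cumulative incidence of event 1 in the presence of competing event 2.

    times  : follow-up time per participant
    events : 0 = censored, 1 = event of interest, 2 = competing event
    Returns (time, cumulative incidence) pairs at each time with an event.
    """
    d1 = Counter(t for t, e in zip(times, events) if e == 1)
    d2 = Counter(t for t, e in zip(times, events) if e == 2)
    cens = Counter(t for t, e in zip(times, events) if e == 0)

    n_at_risk = len(times)
    event_free = 1.0  # probability of no event of either type so far
    cif = 0.0
    curve = []
    for t in sorted(set(times)):
        if n_at_risk > 0 and (d1[t] or d2[t]):
            # Only people still free of both events can have the event of interest.
            cif += event_free * d1[t] / n_at_risk
            event_free *= 1.0 - (d1[t] + d2[t]) / n_at_risk
            curve.append((t, cif))
        n_at_risk -= d1[t] + d2[t] + cens[t]
    return curve


# Hypothetical toy data: two events of interest, one competing death, two censored.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 2, 0, 1, 0]
toy_curve = cumulative_incidence(toy_times, toy_events)
```

On this toy data the final cumulative incidence is lower than one minus a Kaplan-Meier estimate that treats the competing death as censoring, which is why ignoring competing mortality tends to overstate lifetime risk.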
Purpose
The impact of an increased body mass index (BMI) on outcomes of neoadjuvant chemotherapy (NACT) in breast cancer remains controversial. The purpose of this study was to analyze the impact of BMI on pathological complete response (pCR) rates for operable breast cancer after NACT.
Methods
We searched the Medline, Embase, and Web of Science databases for observational studies and randomized controlled trials that reported the association of BMI with pCR after NACT. We performed a meta-analysis to assess the impact of BMI on pCR rate.
Results
We identified 13 studies including a total of 18,702 women with operable breast cancer who underwent NACT. Two studies were pooled analyses of prospective clinical trials (10,669 patients); the rest were case–control studies (8033 patients). All studies provided data for two BMI groups (BMI < 25 vs. BMI ≥ 25). Pooled analyses demonstrated that overweight/obese women were less likely to achieve pCR after NACT than under-/normal weight women (odds ratio (OR) = 0.80; 95% confidence interval (CI): 0.68–0.93). Eleven studies provided data for three BMI groups (BMI < 25, 25 ≤ BMI < 30, BMI ≥ 30). Based on pooled analyses, both the overweight and obese groups were less likely to achieve pCR with NACT than the under-/normal weight group (OR = 0.77, 95% CI 0.65–0.93 and OR = 0.68, 95% CI 0.61–0.77, respectively).
Conclusions
Overweight and obese breast cancer patients had a lower pCR rate with NACT compared to patients with under-/normal weight. Further prospective studies may help confirm this finding and investigate possible mechanisms.
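As an illustration of how study-level odds ratios can be combined into a pooled estimate, the sketch below applies fixed-effect inverse-variance pooling on the log odds ratio scale. The abstract does not say whether a fixed-effect or random-effects model was used, so this is an assumed technique; the function name and the two input studies are hypothetical and are not taken from the meta-analysis.

```python
import math


def pool_odds_ratios(odds_ratios, conf_intervals, z=1.96):
    """Fixed-effect inverse-variance pooling of odds ratios.

    odds_ratios    : per-study OR estimates
    conf_intervals : per-study (lower, upper) 95% CI bounds
    Returns (pooled OR, lower bound, upper bound).
    """
    weights = []
    weighted_logs = []
    for or_value, (lower, upper) in zip(odds_ratios, conf_intervals):
        # Recover the standard error of log(OR) from the width of the CI.
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        weights.append(weight)
        weighted_logs.append(weight * math.log(or_value))

    pooled_log = sum(weighted_logs) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical inputs: two identical studies, each individually inconclusive.
pooled_or, pooled_lo, pooled_hi = pool_odds_ratios(
    [0.8, 0.8], [(0.64, 1.0), (0.64, 1.0)]
)
```

With two identical hypothetical studies the pooled point estimate stays at the common OR while the interval narrows, which shows how pooling can yield a CI that excludes 1 even when individual studies do not.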
Genome-wide association studies (GWASs) are a popular tool for detecting association between genetic variants or single nucleotide polymorphisms (SNPs) and complex traits. Family data introduce complexity due to the non-independence of the family members. Methods for non-independent data are well established, but when the GWAS contains distinct family types, explicit modeling of between-family-type differences in the dependence structure comes at the cost of significantly increased computational burden. The situation is exacerbated with binary traits. In this paper, we perform several simulation studies to compare multiple candidate methods for single SNP association analysis with binary traits. We consider generalized estimating equations (GEE), generalized linear mixed models (GLMMs), and generalized least squares (GLS) approaches. We study the influence of different working correlation structures for GEE on the GWAS findings and also the performance of different analysis methods for conducting a GWAS with binary trait data in families. We discuss the merits of each approach with attention to their applicability in a GWAS. We also compare the performances of the methods on the alcoholism data from the Minnesota Center for Twin and Family Research (MCTFR) study.
We report results from a genome-wide association study (GWAS) of five quantitative indicators of behavioral disinhibition: nicotine, alcohol consumption, alcohol dependence, illicit drugs, and non-substance related behavioral disinhibition. The sample, consisting of 7,188 Caucasian individuals clustered in 2,300 nuclear families, was genotyped on over 520,000 SNP markers from Illumina’s Human 660W-Quad Array. Analysis of individual SNP associations revealed only one marker-component phenotype association, between rs1868152 and illicit drugs, with a p value below the standard genome-wide threshold of 5 × 10⁻⁸. Because we had analyzed five separate phenotypes, we do not consider this single association to be significant. However, we report 13 SNPs that were associated at p < 10⁻⁵ for one phenotype and p < 10⁻³ for at least two other phenotypes, which are potential candidates for future investigations of variants associated with general behavioral disinhibition. Biometric analysis of the twin and family data yielded estimates of additive heritability for the component phenotypes ranging from 49 to 70%; GCTA estimates of heritability for the same phenotypes ranged from 8 to 37%. Consequently, even though the common variants genotyped on the GWAS array appear in aggregate to account for a sizable proportion of heritable effects in multiple indicators of behavioral disinhibition, our data suggest that most of the additive heritability remains “missing”.
We carried out a genome-wide association study (GWAS) for general cognitive ability (GCA) plus three other analyses of GWAS data that aggregate the effects of multiple single-nucleotide polymorphisms (SNPs) in various ways. Our multigenerational sample comprised 7,100 Caucasian participants, drawn from two longitudinal family studies, who had been assessed with an age-appropriate IQ test and had provided DNA samples passing quality screens. We conducted the GWAS across ∼2.5 million SNPs (both typed and imputed), using a generalized least-squares method appropriate for the different family structures present in our sample, and subsequently conducted gene-based association tests. We also conducted polygenic prediction analyses under five-fold cross-validation, using two different schemes of weighting SNPs. Using parametric bootstrapping, we assessed the performance of this prediction procedure under the null. Finally, we estimated the proportion of variance attributable to all genotyped SNPs as random effects with the GCTA software. The study is limited chiefly by its power to detect realistic single-SNP or single-gene effects, none of which reached genome-wide significance, though some genomic inflation was evident from the GWAS. Unit SNP weights performed about as well as least-squares regression weights under cross-validation, but the performance of both increased as more SNPs were included in calculating the polygenic score. Estimates from GCTA were 35% of phenotypic variance at the recommended biological-relatedness ceiling. Taken together, our results concur with other recent studies: they support a substantial heritability of GCA, arising from a very large number of causal SNPs, each of very small effect. We place our study in the context of the literature, both contemporary and historical, and provide an accessible explication of our statistical methods.
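The two SNP weighting schemes compared above can be pictured with a small sketch. It assumes that "unit weights" means each SNP contributes +1 or -1 per allele according to the sign of its estimated effect; the abstract does not spell this out, and the function name, the 0/1/2 allele-count coding, and all numbers below are hypothetical.

```python
def polygenic_scores(genotypes, betas, unit_weights=False):
    """Compute one polygenic score per person as a weighted allele count.

    genotypes    : list of per-person lists of allele counts (0, 1, or 2)
    betas        : estimated per-SNP regression effects from a training sample
    unit_weights : if True, replace each effect by its sign (+1, 0, or -1)
    """
    if unit_weights:
        weights = [(b > 0) - (b < 0) for b in betas]  # sign of each effect
    else:
        weights = list(betas)
    return [
        sum(count * weight for count, weight in zip(person, weights))
        for person in genotypes
    ]


# Hypothetical data: two people, three SNPs.
toy_genotypes = [[0, 1, 2], [2, 0, 1]]
toy_betas = [0.5, -0.2, 0.1]
regression_scores = polygenic_scores(toy_genotypes, toy_betas)
unit_scores = polygenic_scores(toy_genotypes, toy_betas, unit_weights=True)
```

In a cross-validated design the effects would be estimated in the training folds only and the scores evaluated in the held-out fold, so that prediction accuracy is not inflated by reusing the same individuals.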
Osteosarcoma is considered to be the most common primary malignant bone cancer among children and young adults. Previous studies suggest growth spurts and height to be risk factors for osteosarcoma. However, studies on the genetic cause are still limited given the rare occurrence of the disease. In this study, we investigated a family trio data set composed of 209 patients and their unaffected parents and conducted a genome-wide association study (GWAS) to identify genetic risk factors for osteosarcoma. We performed a Bayesian gene-based GWAS based on the single-nucleotide polymorphism (SNP)-level summary statistics obtained from a likelihood ratio test of the trio data, which uses a hierarchically structured prior that incorporates the SNP-gene hierarchical structure. The Bayesian approach has higher power than SNP-level GWAS analysis due to the reduced number of tests, and it is robust because it accounts for the correlations between SNPs, borrowing information across SNPs within a gene. We identified 217 genes that achieved genome-wide significance. Ingenuity pathway analysis of the gene set indicated that osteosarcoma is potentially related to TP53, estrogen receptor signaling, xenobiotic metabolism signaling, and RANK signaling in osteoclasts.
Multi-locus effect modeling is a powerful approach for detection of genes influencing a complex disease. Especially for rare variants, we need to analyze multiple variants together to achieve adequate power for detection. In this paper, we propose several parsimonious branching model techniques to assess the joint effect of a group of rare variants in a case-control study. These models implement a data reduction strategy within a likelihood framework and use a weighted score test to assess the statistical significance of the effect of the group of variants on the disease. The primary advantage of the proposed approach is that it performs model-averaging over a substantially smaller set of models supported by the data and thus gains power to detect multi-locus effects. We illustrate these proposed approaches on simulated and real data and study their performance compared to several existing rare variant detection approaches. The primary goal of this paper is to assess whether there is any gain in power to detect association by averaging over a number of models instead of selecting the best model. Extensive simulations and a real data application demonstrate the advantage of the proposed approach in the presence of causal variants with opposite directional effects along with a moderate number of null variants in linkage disequilibrium.