Advancing from statistical associations of complex traits with genetic markers to understanding the functional genetic variants that influence traits is often a complex process. Fine-mapping can select and prioritize genetic variants for further study, yet the multitude of analytical strategies and study designs makes it challenging to choose an optimal approach. We review the strengths and weaknesses of different fine-mapping approaches, emphasizing the main factors that affect performance. Topics include interpreting results from genome-wide association studies (GWAS), the role of linkage disequilibrium, statistical fine-mapping approaches, trans-ethnic studies, genomic annotation and data integration, and other analysis and design issues.
Clinical trials have been the bedrock of research to evaluate the safety and efficacy of new medical, surgical, or other interventions. Traditional "explanatory" clinical trials have aimed to explain a biological cause (new treatment) and effect (patient outcome) while controlling for many factors that might impact the evaluation, for example through restricted eligibility criteria, frequent follow-up visits, and multiple clinical and laboratory measures. Despite the benefits of a well-controlled clinical trial, compromises have been made that can limit who might benefit from a new intervention, increase the complexity of conducting a trial, or lead to excessively long trial durations. An alternative approach to evaluate the effectiveness of an intervention is based on "pragmatic" clinical trials, which consider how an intervention affects a patient's condition in the real world, accounting for how to optimize an intervention within the operations of busy and diverse clinical practices. Although we describe explanatory and pragmatic trial designs as separate approaches, there is a continuum of approaches that intersect. Some key points are the need to maintain scientific rigor, increase efficiency of clinical trial operations, ensure that trial results can be generalized to a broad spectrum of patients, and balance the needs of real-world clinical care. Pragmatic trials can leverage technology and telecommunication strategies of decentralized trials to further reach underrepresented and underserved patients and help close health disparity gaps.
Polygenic scores (PGS) for coronary heart disease (CHD) are constructed using GWAS summary statistics for CHD. However, pleiotropy is pervasive in biology and disease-associated variants often share etiologic pathways with multiple traits. Therefore, incorporating GWAS summary statistics of additional traits could improve the performance of PGS for CHD. Using lasso regression models, we developed two multi-PGS for CHD: 1) multiPGS, utilizing GWAS summary statistics for CHD, its risk factors, and other ASCVD as training data and the UK Biobank for tuning, and 2) extendedPGS, using existing PGS for a broader range of traits in the PGS Catalog as training data and the Atherosclerosis Risk in Communities Study (ARIC) cohort for tuning. We evaluated the performance of multiPGS and extendedPGS in the Mayo Clinic Biobank, an independent cohort of 43,578 adults of European ancestry which included 4,479 CHD cases and 39,099 controls. In the Mayo Clinic Biobank, a 1 SD increase in multiPGS and extendedPGS was associated with a 1.66-fold (95% CI: 1.60-1.71) and 1.70-fold (95% CI: 1.64-1.76) increased odds of CHD, respectively, in models that included age, sex, and 10 PCs, whereas an already published PGS for CHD (CHD_PRSCS) increased the odds by 1.50 (95% CI: 1.45-1.56). In the highest deciles of extendedPGS, multiPGS, and CHD_PRSCS, 18.4%, 17.5%, and 16.3% of patients had CHD, respectively.
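The reported figures can be put on a common scale with a little arithmetic. The sketch below is illustrative only and is not part of the study's analysis: it converts each per-SD odds ratio to the log-odds scale, recovers an approximate standard error from the 95% CI (assuming a symmetric Wald interval on the log scale, which the abstract does not state), and compares the CHD proportion in each top decile with the overall case fraction in the cohort. The helper name `se_from_ci` is hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Approximate SE of a log odds ratio, assuming a symmetric Wald 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# (odds ratio per 1 SD, CI lower, CI upper) as reported in the abstract.
reported = {
    "multiPGS": (1.66, 1.60, 1.71),
    "extendedPGS": (1.70, 1.64, 1.76),
    "CHD_PRSCS": (1.50, 1.45, 1.56),
}

# Cohort composition as reported in the abstract.
cases, controls = 4479, 39099
overall_fraction = cases / (cases + controls)

# Proportion of patients with CHD in the top decile of each score.
top_decile = {"extendedPGS": 0.184, "multiPGS": 0.175, "CHD_PRSCS": 0.163}

# Log odds ratio and approximate SE for each score.
log_or = {name: math.log(v[0]) for name, v in reported.items()}
approx_se = {name: se_from_ci(v[1], v[2]) for name, v in reported.items()}

# Ratio of top-decile CHD proportion to the overall case fraction.
enrichment = {name: p / overall_fraction for name, p in top_decile.items()}

for name in reported:
    print(name, round(log_or[name], 3), round(approx_se[name], 4),
          round(enrichment[name], 2))
```

Because all three scores were evaluated in the same individuals, these standard errors should not be used to test the scores against each other as if they were independent estimates.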
Prostate cancer is the most frequent cancer among men in most developed countries, yet little is known about its causes. Older age, African ancestry and a positive family history of prostate cancer have long been recognized as important risk factors. The evidence that genetics probably plays a critical role is based on a variety of study designs, including case–control, cohort, twin and family-based, all of which are reviewed in detail. The search for prostate cancer susceptibility genes by linkage studies offered early hope that finding genes would be as ‘easy’ as finding genes for breast cancer and colon cancer susceptibilities. However, this hope has been dampened by the difficulty of replicating promising regions of linkage. This review provides updates on recent developments, and a broad view of the disparate findings from different linkage studies. Early linkage results have provided targeted candidate regions for prostate cancer susceptibility loci, including HPC1 on chromosome 1q23–25, PCAP on chromosome 1q42–43, CAPB on chromosome 1p36, linkage to chromosome 8p22–23, HPC2 on chromosome 17p, HPC20 on chromosome 20q13, and HPCX on chromosome Xq27–28. These linkage findings led to refined mapping and mutation screening of several strong candidate genes, including ELAC2, RNASEL and MSR1. Up to now, a total of 10 genome-wide linkage scans for prostate cancer susceptibility have been completed, and are reviewed. Furthermore, recent findings that Gleason's grade, a measure of aggressiveness of prostate cancer, is linked to several genomic regions are reviewed. Finally, the roles of environmental and dietary risk factors, and common genetic polymorphisms of genes likely to play a role in common forms of prostate cancer, are briefly discussed within the context of searching for genes that influence prostate cancer risk.
Polygenic risk scores (PRSs) for a variety of diseases have recently been shown to have relative risks that depend on age, and genetic relative risks decrease with increasing age. A refined understanding of the age dependency of PRSs for a disease is important for personalized risk predictions and risk stratification. To further evaluate how the PRS relative risk for prostate cancer depends on age, we refined analyses for a validated PRS for prostate cancer by using 64,274 prostate cancer cases and 46,432 controls of diverse ancestry (82.8% European, 9.8% African American, 3.8% Latino, 2.8% Asian, and 0.8% Ghanaian). Our strategy applied a novel weighted proportional hazards model to case-control data to fully utilize age to refine how the relative risk decreased with age. We found significantly greater relative risks for younger men (age 30-55 years) compared with older men (70-88 years) for both relative risk per standard deviation of the PRS and dichotomized according to the upper 90th percentile of the PRS distribution. For the largest European ancestral group that could provide reliable resolution, the log-relative risk decreased approximately linearly from age 50 to age 75. Despite strong evidence of age-dependent genetic relative risk, our results suggest that absolute risk predictions differed little from predictions that assumed a constant relative risk over ages, from short-term to long-term predictions, simplifying implementation of risk discussions into clinical practice.
Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf.
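To make the idea of Bayesian fine-mapping from marginal test statistics concrete, here is a minimal sketch under a strong simplifying assumption: exactly one causal variant in the region, with equal prior probability for every SNP. Under that assumption the Bayes factor for each SNP depends only on its own z-score, so the SNP correlation matrix drops out; models with several causal variants do need that matrix, and this sketch omits it. This is not the CAVIARBF software or its interface. The function names, the prior variance `prior_var`, and the z-scores are hypothetical.

```python
import math


def log_bayes_factor(z, prior_var):
    """Log Bayes factor for one SNP from its marginal z-score.

    Compares z ~ N(0, 1 + prior_var) (SNP is causal) against
    z ~ N(0, 1) (null). prior_var is an assumed prior variance of
    the noncentrality parameter, chosen here for illustration.
    """
    shrink = prior_var / (1.0 + prior_var)
    return -0.5 * math.log1p(prior_var) + 0.5 * z * z * shrink


def single_causal_pips(z_scores, prior_var=10.0):
    """Posterior inclusion probabilities, assuming exactly one causal SNP."""
    log_bfs = [log_bayes_factor(z, prior_var) for z in z_scores]
    # Subtract the maximum before exponentiating to avoid overflow.
    largest = max(log_bfs)
    weights = [math.exp(lb - largest) for lb in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up marginal z-scores for four SNPs in one region.
z_scores = [1.2, 4.8, 4.1, -0.3]
pips = single_causal_pips(z_scores)
for z, pip in zip(z_scores, pips):
    print(f"z = {z:5.1f}  PIP = {pip:.4f}")
```

Summing the per-SNP Bayes factors, weighted by their priors, would give a region-level Bayes factor against the null. That is one way a Bayesian framework can address the association question as well as the fine-mapping question.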
Sequencing cases without matched healthy controls hinders prioritization of germline disease-predisposition genes. To circumvent this problem, genotype summary counts from public data sets can serve as controls. However, systematic inflation and false positives can arise if confounding factors are not controlled. We propose a framework, consistent summary counts based rare variant burden test (CoCoRV), to address these challenges. CoCoRV implements consistent variant quality control and filtering, ethnicity-stratified rare variant association test, accurate estimation of inflation factors, powerful FDR control, and detection of rare variant pairs in high linkage disequilibrium. When we applied CoCoRV to pediatric cancer cohorts, the top genes identified were cancer-predisposition genes. We also applied CoCoRV to identify disease-predisposition genes in adult brain tumors and amyotrophic lateral sclerosis. Given that potential confounding factors were well controlled after applying the framework, CoCoRV provides a cost-effective solution to prioritizing disease-risk genes enriched with rare pathogenic variants.
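One standard way to run a stratified burden test on summary counts is to build a 2x2 table per ancestry group (cases versus controls, carriers versus non-carriers of a qualifying rare variant in the gene) and combine the tables with a Cochran-Mantel-Haenszel (CMH) test. The sketch below implements the asymptotic CMH chi-square statistic as an illustration of that general technique. It is not the CoCoRV implementation, and the abstract does not say which stratified test CoCoRV uses. The function name and the counts are made up. With very small carrier counts an exact test is usually preferable to this chi-square approximation.

```python
import math


def cmh_test(tables):
    """Asymptotic Cochran-Mantel-Haenszel test across strata.

    Each table is a tuple (case_carriers, case_noncarriers,
    control_carriers, control_noncarriers) for one ancestry stratum.
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    observed = expected = variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if variance == 0:
        return 0.0, 1.0
    stat = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts for one gene in two ancestry strata.
strata = [
    (12, 488, 30, 9970),
    (5, 195, 20, 4980),
]
stat, p_value = cmh_test(strata)
print(f"CMH chi-square = {stat:.2f}, p = {p_value:.3g}")
```

Stratifying first and then combining keeps a difference in ancestry mix between cases and public controls from showing up as a spurious carrier-frequency difference.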
Because polygenic risk scores (PRSs) for coronary heart disease (CHD) are derived from mainly European ancestry (EA) cohorts, their validity in African ancestry (AA) and Hispanic ethnicity (HE) individuals is unclear. We investigated associations of “restricted” and genome-wide PRSs with CHD in three major racial and ethnic groups in the U.S. The eMERGE cohort (mean age 48 ± 14 years, 58% female) included 45,645 EA, 7,597 AA, and 2,493 HE individuals. We assessed two restricted PRSs (PRS_Tikkanen and PRS_Tada; 28 and 50 variants, respectively) and two genome-wide PRSs (PRS_metaGRS and PRS_LDPred; 1.7 M and 6.6 M variants, respectively) derived from EA cohorts. Over a median follow-up of 11.1 years, 2,652 incident CHD events occurred. Hazard and odds ratios for the association of PRSs with CHD were similar in EA and HE cohorts but lower in AA cohorts. Genome-wide PRSs were more strongly associated with CHD than restricted PRSs were. PRS_metaGRS, the best performing PRS, was associated with CHD in all three cohorts; hazard ratios (95% CI) per 1 SD increase were 1.53 (1.46–1.60), 1.53 (1.23–1.90), and 1.27 (1.13–1.43) for incident CHD in EA, HE, and AA individuals, respectively. The hazard ratios were comparable in the EA and HE cohorts (p_interaction = 0.77) but were significantly attenuated in AA individuals (p_interaction = 2.9 × 10^-3). These results highlight the potential clinical utility of PRSs for CHD as well as the need to assemble diverse cohorts to generate ancestry- and ethnicity-specific PRSs.
One problem that plagues epigenome-wide association studies is the potential confounding due to cell mixtures when purified target cells are not available. Reference-free adjustment of cell mixtures has become increasingly popular due to its flexibility and simplicity. However, existing methods are still not optimal: increased false positive rates and reduced statistical power have been observed in many scenarios.
We develop SmartSVA, an optimized surrogate variable analysis (SVA) method, for fast and robust reference-free adjustment of cell mixtures. SmartSVA corrects the limitation of traditional SVA under highly confounded scenarios by imposing an explicit convergence criterion and improves the computational efficiency for large datasets.
Compared to traditional SVA, SmartSVA achieves an order-of-magnitude speedup and better false positive control. It protects the signals when capturing the cell mixtures, resulting in significant power increase while controlling for false positives. Through extensive simulations and real data applications, we demonstrate a better performance of SmartSVA than the existing methods.
SmartSVA is a fast and robust method for reference-free adjustment of cell mixtures for epigenome-wide association studies. As a general method, SmartSVA can be applied to other genomic studies to capture unknown sources of variability.