Polygenic risk scores (PRSs) for a variety of diseases have recently been shown to have relative risks that depend on age, with genetic relative risks decreasing as age increases. A refined understanding of the age dependency of PRSs for a disease is important for personalized risk predictions and risk stratification. To further evaluate how the PRS relative risk for prostate cancer depends on age, we refined analyses for a validated PRS for prostate cancer by using 64,274 prostate cancer cases and 46,432 controls of diverse ancestry (82.8% European, 9.8% African American, 3.8% Latino, 2.8% Asian, and 0.8% Ghanaian). Our strategy applied a novel weighted proportional hazards model to case-control data to fully utilize age to refine how the relative risk decreased with age. We found significantly greater relative risks for younger men (age 30-55 years) compared with older men (70-88 years), both for relative risk per standard deviation of the PRS and for the PRS dichotomized according to the upper 90th percentile of the PRS distribution. For the largest European ancestral group that could provide reliable resolution, the log-relative risk decreased approximately linearly from age 50 to age 75. Despite strong evidence of age-dependent genetic relative risk, our results suggest that absolute risk predictions differed little from predictions that assumed a constant relative risk over ages, from short-term to long-term predictions, simplifying implementation of risk discussions into clinical practice.
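The approximately linear decline of the log-relative risk between ages 50 and 75 can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the endpoint relative risks, the baseline hazard, and the function names are hypothetical placeholders (not estimates from the study), and the absolute-risk formula ignores competing mortality.

```python
import math


def relative_risk(age, rr_young=2.5, rr_old=1.8, age_young=50.0, age_old=75.0):
    """Relative risk whose logarithm falls linearly between two ages.

    rr_young and rr_old are hypothetical endpoint values, not study estimates.
    Ages outside [age_young, age_old] are clamped to the nearest endpoint.
    """
    clamped = min(max(age, age_young), age_old)
    frac = (clamped - age_young) / (age_old - age_young)
    log_rr = (1.0 - frac) * math.log(rr_young) + frac * math.log(rr_old)
    return math.exp(log_rr)


def absolute_risk(start_age, end_age, baseline_hazard, rr):
    """Cumulative risk over yearly steps: 1 - exp(-sum of hazard * RR).

    Competing causes of death are ignored in this simplified formula.
    """
    cumulative = sum(baseline_hazard(a) * rr(a) for a in range(start_age, end_age))
    return 1.0 - math.exp(-cumulative)


def toy_hazard(age):
    # Hypothetical yearly baseline hazard that rises with age.
    return 0.0005 * math.exp(0.08 * (age - 50))


# Compare an age-dependent relative risk with a constant one over 20 years.
risk_age_dependent = absolute_risk(50, 70, toy_hazard, relative_risk)
risk_constant = absolute_risk(50, 70, toy_hazard, lambda a: 2.1)
```

Comparing the two results for different horizons is one way to see how much (or how little) the age dependence changes an absolute risk prediction under a given baseline hazard.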
The kinship2 R Package for Pedigree Data
Sinnwell, Jason P.; Therneau, Terry M.; Schaid, Daniel J.
Human Heredity, 01/2014, Volume 78, Issue 2
Journal Article, Peer reviewed, Open access
Background: The kinship2 package is restructured from the previous kinship package. Existing features are now enhanced and new features added for handling pedigree objects. Methods: Pedigree plotting features have been updated to display features on complex pedigrees while adhering to pedigree plotting standards. Kinship matrices can now be calculated for the X chromosome. Other methods have been added to subset and trim pedigrees while maintaining the pedigree structure. Conclusion: We make the kinship2 package available for R on the Comprehensive R Archive Network (CRAN), where data management is built-in and other packages can use the pedigree object.
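To make concrete what a kinship matrix contains, here is a minimal Python sketch of the standard recursive definition of the autosomal kinship coefficient. It is not kinship2's implementation (the package is written in R and also handles the X chromosome); the pedigree, identifiers, and function names below are hypothetical.

```python
from functools import lru_cache

# Hypothetical pedigree: id -> (father_id, mother_id); None means unknown (founder).
PEDIGREE = {
    "gf": (None, None),
    "gm": (None, None),
    "dad": ("gf", "gm"),
    "aunt": ("gf", "gm"),
    "mom": (None, None),
    "kid": ("dad", "mom"),
}


def kinship_function(pedigree):
    """Return phi(i, j), the autosomal kinship coefficient for a pedigree."""

    @lru_cache(maxsize=None)
    def depth(person):
        # Generation depth: founders are 0, others are 1 + their deepest known parent.
        parents = [p for p in pedigree[person] if p is not None]
        return 0 if not parents else 1 + max(depth(p) for p in parents)

    @lru_cache(maxsize=None)
    def phi(i, j):
        if i is None or j is None:
            return 0.0  # unknown parents are treated as unrelated to everyone
        father, mother = pedigree[i]
        if i == j:
            # Self-kinship: 1/2 plus half the kinship between the two parents.
            return 0.5 * (1.0 + phi(father, mother))
        if depth(i) < depth(j):
            return phi(j, i)  # always recurse through the deeper individual
        # Average the kinship of i's parents with j.
        return 0.5 * (phi(father, j) + phi(mother, j))

    return phi


phi = kinship_function(PEDIGREE)
ids = list(PEDIGREE)
kinship_matrix = [[phi(a, b) for b in ids] for a in ids]
```

Recursing through the deeper individual guarantees that the recursion never expands an ancestor of the other person, which is what keeps the recursive definition valid.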
Mediation analysis attempts to determine whether the relationship between an independent variable (e.g., exposure) and an outcome variable can be explained, at least partially, by an intermediate variable, called a mediator. Most methods for mediation analysis focus on one mediator at a time, although multiple mediators can be jointly analyzed by structural equation models (SEMs) that account for correlations among the mediators. We extend the use of SEMs for the analysis of multiple mediators by creating a sparse group lasso penalized model such that the penalty considers the natural groupings of parameters that determine mediation, as well as encourages sparseness of the model parameters. This provides a way to simultaneously evaluate many mediators and select those that have the most impact, a feature of modern penalized models. Simulations are used to illustrate the benefits and limitations of our approach, and application to a study of DNA methylation and reactive cortisol stress following childhood trauma discovered two novel methylation loci that mediate the association of childhood trauma scores with reactive cortisol stress levels. Our new methods are incorporated into R software called regmed.
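The idea of a sparse group lasso penalty can be sketched in a few lines of Python. The block below uses the generic textbook form of the penalty, a mix of an L1 term and group-wise L2 norms; it is not taken from regmed, whose exact parameterization may differ. Treating each mediator's pair of path coefficients (exposure to mediator, mediator to outcome) as one group is an assumption made for illustration, as are all names and values.

```python
import math


def sparse_group_lasso_penalty(groups, lam, frac_lasso):
    """Generic sparse group lasso penalty.

    groups     : list of coefficient tuples, one tuple per group
    lam        : overall penalty strength (lambda)
    frac_lasso : share of the penalty given to the L1 (lasso) term, in [0, 1]
    """
    # Lasso part: sum of absolute values, encourages individual zeros.
    l1_term = sum(abs(b) for group in groups for b in group)
    # Group part: size-scaled Euclidean norm per group, zeroes whole groups.
    group_term = sum(
        math.sqrt(len(group)) * math.sqrt(sum(b * b for b in group))
        for group in groups
    )
    return lam * (frac_lasso * l1_term + (1.0 - frac_lasso) * group_term)


# Hypothetical example: two mediators, each with (alpha_j, beta_j) path coefficients.
mediator_paths = [(3.0, 4.0), (0.0, 0.0)]
penalty = sparse_group_lasso_penalty(mediator_paths, lam=1.0, frac_lasso=0.5)
```

A mediator whose two coefficients are both zero contributes nothing to either term, which is how a penalty of this kind can drop an entire mediator from the model.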
Genetic studies have shifted to sequencing-based rare-variant discovery after decades of success in identifying common disease variants by Genome-Wide Association Studies using Single Nucleotide Polymorphism chips. Sequencing-based studies require large sample sizes for statistical power and therefore often inadvertently introduce batch effects because samples are typically collected, processed, and sequenced at multiple centers. Conventionally, batch effects are first detected and visualized using Principal Components Analysis and then controlled by including batch covariates in the disease association models. For sequencing-based genetic studies, because all variants included in the association analyses have passed sequencing-related quality control measures, this conventional approach treats every variant as equal and ignores the substantial differences still remaining in variant qualities and characteristics such as genotype quality scores, alternative allele fractions (fraction of reads supporting the alternative allele at a variant position), and sequencing depths. In the Alzheimer's Disease Sequencing Project (ADSP) exome dataset of 9,904 cases and controls, we discovered hidden variant-level differences between sample batches of three sequencing centers and two exome capture kits. Although sequencing centers were included as a covariate in our association models, we observed differences at the variant level in genotype quality and alternative allele fraction between samples processed by different exome capture kits that significantly impacted both the confidence of variant detection and the identification of disease-associated variants. Furthermore, we found that a subset of top disease-risk variants came exclusively from samples processed by one exome capture kit that was more effective at capturing the alternative alleles compared to the other kit.
Our findings highlight the importance of additional variant-level quality control for large sequencing-based genetic studies. More importantly, we demonstrate that automatically filtering out variants with batch differences may lead to false negatives if the batch discordances come largely from quality differences and if the batch-specific variants have better quality.
Genetic pleiotropy is when a single gene influences more than one trait. Detecting pleiotropy and understanding its causes can improve the biological understanding of a gene in multiple ways, yet current multivariate methods to evaluate pleiotropy test the null hypothesis that none of the traits are associated with a variant; departures from the null could be driven by just one associated trait. A formal test of pleiotropy should assume a null hypothesis that one or no traits are associated with a genetic variant. For the special case of two traits, one can construct this null hypothesis based on the intersection-union (IU) test, which rejects the null hypothesis only if the null hypotheses of no association for both traits are rejected. To allow for more than two traits, we developed a new likelihood-ratio test for pleiotropy. We then extended the testing framework to a sequential approach to test the null hypothesis that $k+1$ traits are associated, given that the null of $k$ associated traits was rejected. This provides a formal testing framework to determine the number of traits associated with a genetic variant, while accounting for correlations among the traits. By simulations, we illustrate the type I error rate and power of our new methods; describe how they are influenced by sample size, the number of traits, and the trait correlations; and apply the new methods to multivariate immune phenotypes in response to smallpox vaccination. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice.
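The intersection-union rule for two traits is simple enough to write out. The Python sketch below assumes the per-trait p-values have already been obtained from separate association tests; the function name and the example values are hypothetical. It covers only the two-trait IU rule, not the likelihood-ratio test developed for more than two traits.

```python
def intersection_union_test(p_values, alpha=0.05):
    """Intersection-union test for pleiotropy.

    The composite null ("at most one trait is associated", for two traits)
    is rejected only if every per-trait null of no association is rejected,
    which is equivalent to comparing the largest p-value with alpha.
    """
    p_iu = max(p_values)
    return p_iu, p_iu < alpha


# Hypothetical per-trait p-values for one variant.
p_iu, reject = intersection_union_test([0.001, 0.20])
```

In this example one trait is strongly associated and the other is not, so the IU test does not reject, whereas a test of "no trait is associated" would be driven to reject by the first trait alone.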
Statistical methods to integrate multiple layers of data, from exposures to intermediate traits to outcome variables, are needed to guide interpretation of complex data sets for which variables are likely contributing in a causal pathway from exposure to outcome. Statistical mediation analysis based on structural equation models provides a general modeling framework, yet such models can be difficult to apply to high-dimensional data and are not automated to select the best-fitting model. To overcome these limitations, we developed novel algorithms and software to simultaneously evaluate multiple exposure variables, multiple intermediate traits, and multiple outcome variables. Our penalized mediation models are computationally efficient, and simulations demonstrate that they produce reliable results for large data sets. Application of our methods to a study of vascular disease demonstrates their utility to identify novel direct effects of single-nucleotide polymorphisms (SNPs) on coronary heart disease and peripheral artery disease, while disentangling the effects of SNPs on the intermediate risk factors including lipids, cigarette smoking, systolic blood pressure, and type 2 diabetes.
Triple-negative breast cancer (TNBC) is the most aggressive breast cancer subtype. Patients with TNBC are primarily treated with neoadjuvant chemotherapy (NAC). The response to NAC is prognostic, with reductions in overall survival and disease-free survival rates in those patients who do not achieve a pathological complete response (pCR). Based on this premise, we hypothesized that paired analysis of primary and residual TNBC tumors following NAC could identify unique biomarkers associated with post-NAC recurrence.
We investigated 24 samples from 12 non-LAR TNBC patients with paired pre- and post-NAC data, including four patients with recurrence shortly after surgery (< 24 months) and eight who remained recurrence-free (> 48 months). These tumors were collected from a prospective NAC breast cancer study (BEAUTY) conducted at the Mayo Clinic. Differential expression analysis of pre-NAC biopsies showed minimal gene expression differences between early recurrent and nonrecurrent TNBC tumors; however, post-NAC samples demonstrated significant alterations in expression patterns in response to intervention. Topological-level differences associated with early recurrence were implicated in 251 gene sets, and an independent assessment of microarray gene expression data from the 9 paired non-LAR samples available in the NAC I-SPY1 trial confirmed 56 gene sets. Within these 56 gene sets, 113 genes were observed to be differentially expressed in the I-SPY1 and BEAUTY post-NAC studies. An independent (n = 392) breast cancer dataset with relapse-free survival (RFS) data was used to refine our gene list to a 17-gene signature. A threefold cross-validation analysis of the gene signature with the combined BEAUTY and I-SPY1 data yielded an average AUC of 0.88 for six machine-learning models. Due to the limited number of studies with pre- and post-NAC TNBC tumor data, further validation of the signature is needed.
Analysis of multiomics data from post-NAC TNBC chemoresistant tumors showed down-regulation of mismatch repair and tubulin pathways. Additionally, we identified a 17-gene signature in TNBC associated with post-NAC recurrence enriched with down-regulated immune genes.
Approximately one‐third of patients with metastatic castration‐resistant prostate cancer (CRPC) exhibited primary abiraterone resistance. To identify alternative treatments for abiraterone nonresponders, we performed drug discovery analyses with the L1000 database, using genes differentially expressed between abiraterone responders and nonresponders enrolled in the PROMOTE trial, as identified in tumor biopsies and patient‐derived xenograft (PDX) tumors. This approach identified 3 drugs: the topoisomerase II (TOP2) inhibitor mitoxantrone, the CDK4/6 inhibitor palbociclib, and the pan‐CDK inhibitor PHA‐793887. These drugs significantly suppressed the growth of abiraterone‐resistant cell lines and PDX models. Moreover, we identified 11 genes targeted by all 3 drugs that were associated with worse outcomes in both the PROMOTE and Stand Up To Cancer cohorts. This 11‐gene panel might also function as a set of biomarkers to select the 3 alternative therapies for this subgroup of patients with CRPC, warranting further clinical investigation.
ABSTRACT
Searching for rare genetic variants associated with complex diseases can be facilitated by enriching for diseased carriers of rare variants by sampling cases from pedigrees enriched for disease, possibly with related or unrelated controls. This strategy, however, complicates analyses because of shared genetic ancestry, as well as linkage disequilibrium among genetic markers. To overcome these problems, we developed broad classes of “burden” statistics and kernel statistics, extending commonly used methods for unrelated case‐control data to allow for known pedigree relationships, for autosomes and the X chromosome. Furthermore, by replacing pedigree‐based genetic correlation matrices with estimates of genetic relationships based on large‐scale genomic data, our methods can be used to account for population‐structured data. By simulations, we show that the type I error rates of our developed methods are near the asymptotic nominal levels, allowing rapid computation of P‐values. Our simulations also show that a linear weighted kernel statistic is generally more powerful than a weighted “burden” statistic. Because the proposed statistics are rapid to compute, they can be readily used for large‐scale screening of the association of genomic sequence data with disease status.
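The contrast between a weighted burden statistic and a linear weighted kernel statistic can be shown with a toy Python sketch. It uses the common textbook forms for unrelated subjects and computes only the raw statistic values; the pedigree-based correlation adjustments and the null distributions described above are deliberately left out. The data, weights, and function names are hypothetical.

```python
def variant_scores(phenotypes, genotypes):
    """Per-variant score: sum over subjects of (y_i - mean(y)) * g_ij."""
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    residuals = [y - mean_y for y in phenotypes]
    n_variants = len(genotypes[0])
    return [
        sum(residuals[i] * genotypes[i][j] for i in range(n))
        for j in range(n_variants)
    ]


def burden_statistic(scores, weights):
    """Square of the weighted sum of scores: variants are collapsed first."""
    return sum(w * u for w, u in zip(weights, scores)) ** 2


def linear_kernel_statistic(scores, weights):
    """Sum of squared weighted scores: each variant contributes separately."""
    return sum((w * u) ** 2 for w, u in zip(weights, scores))


# Hypothetical data: 2 cases then 2 controls; variant 1 is seen only in
# cases, variant 2 only in controls (effects in opposite directions).
phenotypes = [1, 1, 0, 0]
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
weights = [1.0, 1.0]

scores = variant_scores(phenotypes, genotypes)
burden = burden_statistic(scores, weights)
kernel = linear_kernel_statistic(scores, weights)
```

In this toy example the two variant scores have opposite signs, so they cancel in the burden statistic but not in the kernel statistic. This is one commonly cited reason a kernel statistic can have more power when risk and protective variants are mixed.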
Summary
When a single gene influences more than one trait, known as pleiotropy, it is important to detect pleiotropy to improve the biological understanding of a gene. This can lead to improved screening, diagnosis, and treatment of diseases. Yet, most current multivariate methods to evaluate pleiotropy test the null hypothesis that none of the traits are associated with a variant; departures from the null could be driven by just one associated trait. A formal test of pleiotropy should assume a null hypothesis that one or fewer traits are associated with a genetic variant. We recently developed statistical methods to analyze pleiotropy for quantitative traits having a multivariate normal distribution. We now extend this approach to traits that can be modeled by generalized linear models, such as analysis of binary, ordinal, or quantitative traits, or a mixture of these types of traits. Based on methods from estimating equations, we developed a new test for pleiotropy. We then extended the testing framework to a sequential approach to test the null hypothesis that $k+1$ traits are associated, given that the null of $k$ associated traits was rejected. This provides a testing framework to determine the number of traits associated with a genetic variant, as well as which traits, while accounting for correlations among the traits. By simulations, we illustrate the Type-I error rate and power of our new methods, describe how they are influenced by sample size, the number of traits, and the trait correlations, and apply the new methods to a genome-wide association study of multivariate traits measuring symptoms of major depression. Our new approach provides a quantitative assessment of pleiotropy, enhancing current analytic practice.
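The control flow of the sequential approach can be outlined in Python. This sketch assumes the stage-wise p-values are supplied by some test of the null "exactly $k$ traits are associated" for $k = 0, 1, 2, \dots$; it shows only the stopping rule, not the estimating-equation test itself, and the function name and inputs are hypothetical.

```python
def estimate_num_associated(sequential_p_values, alpha=0.05):
    """Sequentially test the null of k associated traits for k = 0, 1, 2, ...

    sequential_p_values[k] is the p-value for the null that k traits are
    associated. Testing moves on to k + 1 only after the null of k was
    rejected; the first null that is not rejected gives the estimate.
    """
    k = 0
    for p_value in sequential_p_values:
        if p_value >= alpha:
            break  # cannot reject "k traits are associated": stop here
        k += 1
    return k


# Hypothetical p-values for the nulls k = 0, 1, 2 with three traits.
n_associated = estimate_num_associated([0.0001, 0.003, 0.40])
```

If every supplied null is rejected, the function returns the number of nulls tested, which corresponds to concluding that all traits are associated when one null per trait count was supplied.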