An efficient basket trial design
Cunanan, Kristen M.; Iasonos, Alexia; Shen, Ronglai ...
Statistics in Medicine, 10 May 2017, Volume 36, Issue 10
Journal Article
It is increasingly common clinically for cancer specimens to be examined using techniques that identify somatic mutations. In principle, these mutational profiles can be used to diagnose the tissue of origin, a critical task for the 3% to 5% of tumors that have an unknown primary site. Diagnosis of primary site is also critical for screening tests that employ circulating DNA. However, most mutations observed in any new tumor are very rare, and indeed the preponderance of these may never have been observed in any previously recorded tumor. To create a viable diagnostic tool we need to harness the information content in this “hidden genome” of variants for which no direct information is available. To accomplish this we propose a multilevel meta‐feature regression to extract the critical information from rare variants in the training data in a way that also permits us to extract diagnostic information from any previously unobserved variants in the new tumor sample. A scalable implementation of the model is obtained by combining a high‐dimensional feature screening approach with a group‐lasso penalized maximum likelihood approach based on an equivalent mixed‐effect representation of the multilevel model. We apply the method to The Cancer Genome Atlas whole‐exome sequencing data set, including 3702 tumor samples across seven common cancer sites. Results show that our multilevel approach can harness substantial diagnostic information from the hidden genome.
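The group-lasso penalized likelihood step can be illustrated with a generic proximal-gradient sketch. This is not the authors' implementation: it assumes a binary logistic likelihood (the model described above is multilevel and multi-class), fits no intercept, and the names `group_soft_threshold`, `group_lasso_logistic`, `step` and `n_iter` and the column grouping are hypothetical. The key operation is block soft-thresholding, which shrinks each coefficient group as a unit and sets whole groups exactly to zero.

```python
import numpy as np


def group_soft_threshold(beta, groups, thresh):
    """Proximal operator of thresh * sum_g ||beta_g||_2 (block soft-thresholding).

    Each group is shrunk toward zero as a unit; a group whose Euclidean norm
    does not exceed `thresh` is set exactly to zero.
    """
    out = np.zeros_like(beta)
    for idx in groups:
        norm = np.linalg.norm(beta[idx])
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * beta[idx]
    return out


def group_lasso_logistic(X, y, groups, lam, step=0.1, n_iter=500):
    """Proximal-gradient (ISTA) sketch for
        minimize  -(1/n) * loglik(beta) + lam * sum_g ||beta_g||_2
    using a BINARY logistic likelihood (an assumption for brevity) and no intercept.
    `groups` is a hypothetical grouping of columns, e.g. one group per meta-feature.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        grad = X.T @ (prob - y) / n              # gradient of mean negative log-likelihood
        beta = group_soft_threshold(beta - step * grad, groups, step * lam)
    return beta


# Hypothetical usage on simulated data: columns 0-1 form one group, 2-3 another.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
beta_demo = group_lasso_logistic(X_demo, y_demo, groups=[[0, 1], [2, 3]], lam=0.05)
```

Raising `lam` zeroes out more groups entirely, which is what makes the penalty useful for screening large blocks of meta-feature coefficients at once.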
We have observed that the area under the receiver operating characteristic curve (AUC) is increasingly being used to evaluate whether a novel predictor should be incorporated in a multivariable model to predict risk of disease. Frequently, investigators will approach the issue in two distinct stages: first, by testing whether the new predictor variable is significant in a multivariable regression model; second, by testing differences between the AUC of models with and without the predictor using the same data from which the predictive models were derived. These two steps often lead to discordant conclusions.
We conducted a simulation study in which two predictors, X and X*, were generated as standard normal variables with varying levels of predictive strength, represented by means that differed depending on the binary outcome Y. The data sets were analyzed using logistic regression, and likelihood ratio and Wald tests for the incremental contribution of X* were performed. The patient-specific predictors for each of the models were then used as data for a test comparing the two AUCs. Under the null, the sizes of the likelihood ratio and Wald tests were close to nominal, but the area test was extremely conservative, with test sizes less than 0.006 for all configurations studied. Where X* was associated with outcome, the area test had much lower power than the likelihood ratio and Wald tests.
Evaluation of the statistical significance of a new predictor when there are existing clinical predictors is most appropriately accomplished in the context of a regression model. Although comparison of AUCs is conceptually equivalent to the likelihood ratio and Wald tests, it has vastly inferior statistical properties. Use of both approaches will frequently lead to inconsistent conclusions. Nonetheless, comparison of receiver operating characteristic curves remains a useful descriptive tool for initial evaluation of whether a new predictor might be of clinical relevance.
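One replicate of the simulation design described in this abstract can be sketched in Python with NumPy. The sample size, effect sizes and function names are hypothetical; logistic regression is fit by a hand-written Newton-Raphson routine; and only the likelihood ratio test is computed. The formal test comparing the two AUCs (for example, a DeLong-type test) is omitted, so the sketch reports the two in-sample AUCs instead.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Maximum-likelihood logistic regression by Newton-Raphson.
    X must include an intercept column. Returns (beta, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                           # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information
        beta = beta + np.linalg.solve(hess, grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik


def auc(score, y):
    """Empirical AUC: P(case score > control score), ties counted as 1/2."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))


def simulate_once(n, mu_x, mu_xstar, rng):
    """One replicate: X and X* are N(mu * Y, 1) given a binary outcome Y."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(mu_x * y, 1.0)
    xs = rng.normal(mu_xstar * y, 1.0)
    ones = np.ones(n)
    X0 = np.column_stack([ones, x])        # model without X*
    X1 = np.column_stack([ones, x, xs])    # model with X*
    b0, ll0 = fit_logistic(X0, y)
    b1, ll1 = fit_logistic(X1, y)
    lr = 2.0 * (ll1 - ll0)
    # Chi-square(1) upper tail: P(chi2_1 > lr) = erfc(sqrt(lr / 2))
    p_lr = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return p_lr, auc(X0 @ b0, y), auc(X1 @ b1, y)


# Null configuration (X* unrelated to Y), hypothetical n and means.
rng = np.random.default_rng(0)
results = [simulate_once(500, 1.0, 0.0, rng) for _ in range(100)]
size_lr = np.mean([p < 0.05 for p, _, _ in results])  # estimated LR test size
```

Because both AUCs are computed from the same data used to fit the models, their difference is small under the null, which is consistent with the conservativeness of the area test reported above.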
To date, the vast preponderance of somatic variants observed in the cancer genome have been rare variants, and it is common in practice to encounter variants in a new tumor that have not been observed previously. Here we focus on probability estimation for encountering such hitherto unseen variants. We draw upon statistical methodology that has been developed in other fields of study, notably in species estimation in ecology, and word frequency estimation in computational linguistics. Analysis of whole-exome and targeted panel sequencing data sets reveals substantial variability in variant "richness" between genes that could be harnessed for clinically relevant problems. We quantify the variant-tissue association and show a strong gene-specific, lineage-dependent pattern of encountering new variants. This variability is largely determined by the proportion of observed variants that are rare. Our findings suggest that variants that occur at very low frequencies can harbor important signals that are clinically consequential.
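A classical estimator shared by species estimation and word-frequency estimation is the Good-Turing estimate, under which the probability that the next observation is a previously unseen type is approximately N1/N (distinct variants seen exactly once, divided by all variant observations). The sketch below applies it per gene; the gene and variant labels are hypothetical, and the abstract does not state that this exact estimator was used. It shows concretely why the proportion of rare (singleton) variants drives the chance of encountering new ones.

```python
from collections import Counter


def good_turing_unseen(variant_calls):
    """Good-Turing probability that the next variant observation is a new,
    previously unseen variant: N1 / N, where N1 is the number of distinct
    variants observed exactly once and N is the total number of observations."""
    counts = Counter(variant_calls)
    n_total = sum(counts.values())
    if n_total == 0:
        raise ValueError("at least one observation is required")
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return n_singletons / n_total


def richness(variant_calls):
    """Observed variant richness: the number of distinct variants."""
    return len(set(variant_calls))


# Hypothetical per-gene variant observations pooled across tumors.
observed = {
    "GENE_A": ["v1"] * 8 + ["v2", "v3"],       # dominated by one hotspot
    "GENE_B": ["w1", "w2", "w3", "w3", "w4"],  # mostly rare variants
}
for gene, calls in observed.items():
    print(gene, richness(calls), good_turing_unseen(calls))
# GENE_A 3 0.2
# GENE_B 4 0.6
```

The hotspot-dominated gene has few singletons and so a low estimated chance of a new variant, while the gene whose variants are mostly rare has a high one.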
The Lorenz curve is a graphical tool that is used widely in econometrics. It represents the spread of a probability distribution, and its traditional use has been to characterize population distributions of wealth or income, or more specifically, inequalities in wealth or income. However, its utility in public health research has not been broadly established. The purpose of this article is to explain its special usefulness for characterizing the population distribution of disease risks, and in particular for identifying the precise disease burden that can be predicted to occur in segments of the population that are known to have especially high (or low) risks, a feature that is important for evaluating the yield of screening or other disease prevention initiatives. We demonstrate that, although the Lorenz curve represents the distribution of predicted risks in a population at risk for the disease, in fact it can be estimated from a case–control study conducted in the population without the need for information on absolute risks. We explore two different estimation strategies and compare their statistical properties using simulations. The Lorenz curve is a statistical tool that deserves wider use in public health research.
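Two simple empirical estimators make the idea concrete. The first assumes a cohort with calibrated absolute risks; the second uses only a risk score measured on cases and controls, assuming a rare disease (so controls stand in for the population) and a score that increases with risk. Both are illustrative sketches with hypothetical names, not necessarily the two strategies compared in the article. Here L(p) is the share of cases arising in the fraction p of the population with the lowest risks.

```python
import numpy as np


def lorenz_from_cohort(risks, p_grid):
    """Empirical Lorenz curve from calibrated absolute risks in a population
    sample: L(p) = share of expected cases (sum of risks) arising in the
    fraction p of people with the LOWEST risks."""
    r = np.sort(np.asarray(risks, dtype=float))
    cum_share = np.concatenate([[0.0], np.cumsum(r)]) / r.sum()
    k = np.floor(np.asarray(p_grid, dtype=float) * len(r)).astype(int)
    return cum_share[k]


def lorenz_from_case_control(case_scores, control_scores, p_grid):
    """Case-control estimate needing only a risk score, not absolute risk.
    Assumes a rare disease (controls represent the population) and a score
    that increases with risk: L(p) = fraction of cases whose score is at or
    below the p-th quantile of the control scores."""
    cases = np.asarray(case_scores, dtype=float)
    q = np.quantile(np.asarray(control_scores, dtype=float), p_grid)
    return np.array([np.mean(cases <= qi) for qi in q])


# Hypothetical risks for a six-person population.
risks = [0.01, 0.01, 0.02, 0.02, 0.04, 0.10]
L_half = lorenz_from_cohort(risks, [0.5])
```

For these risks, `L_half` is about 0.2: the lower-risk half of this hypothetical population accounts for about 20% of expected cases. The burden in the highest-risk fraction p of the population is 1 - L(1 - p), here about 80% for p = 0.5.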
Although most hospital-based studies suggest more favorable survival with tumor-infiltrating lymphocytes (TILs) present in primary melanomas, it is uncertain whether TILs provide prognostic information beyond existing melanoma staging definitions. We addressed the issue in an international population-based study of patients with single and multiple primary melanomas.
On the basis of the Genes, Environment and Melanoma (GEM) study, we conducted follow-up of 2,845 patients diagnosed from 1998 to 2003 with 3,330 invasive primary melanomas centrally reviewed for TIL grade (absent, nonbrisk, or brisk). We examined the odds of each TIL grade in relation to clinicopathologic features, as well as survival by TIL grade.
Independent predictors (P < .05) for nonbrisk TIL grade were site, histologic subtype, and Breslow thickness, and for brisk TIL grade, they were age, site, Breslow thickness, and radial growth phase. Nonbrisk and brisk TIL grades were each associated with lower American Joint Committee on Cancer (AJCC) tumor stage compared with TIL absence (P(trend) < .001). Death as a result of melanoma was 30% less with nonbrisk TIL grade (hazard ratio [HR], 0.7; 95% CI, 0.5 to 1.0) and 50% less with brisk TIL grade (HR, 0.5; 95% CI, 0.3 to 0.9) relative to TIL absence, adjusted for age, sex, site, and AJCC tumor stage.
At the population level, higher TIL grade of primary melanoma is associated with a lower risk of death as a result of melanoma independently of tumor characteristics currently used for AJCC tumor stage. We conclude that TIL grade deserves further prospective investigation to determine whether it should be included in future AJCC staging revisions.
Inferring the cancer-type specificities of ultra-rare, genome-wide somatic mutations is an open problem. Traditional statistical methods cannot handle such data due to their ultra-high dimensionality and extreme data sparsity. To harness information in rare mutations, we have recently proposed a formal multilevel multilogistic "hidden genome" model. Through its hierarchical layers, the model condenses information in ultra-rare mutations through meta-features embodying mutation contexts to characterize cancer types. Consistent, scalable point estimation of the model can incorporate tens of millions of variants across thousands of tumors and permit impressive prediction and attribution. However, principled statistical inference is infeasible due to the volume, correlation, and noninterpretability of mutation contexts. In this paper, we propose a novel framework that leverages topic models from computational linguistics to effectuate dimension reduction of mutation contexts, producing interpretable, decorrelated meta-feature topics. We propose an efficient MCMC algorithm for implementation that permits rigorous full Bayesian inference at a scale that is orders of magnitude beyond the capability of existing out-of-the-box inferential high-dimensional multi-class regression methods and software. Applying our model to the Pan Cancer Analysis of Whole Genomes dataset reveals interesting biological insights, including somatic mutational topics associated with UV exposure in skin cancer, aging in colorectal cancer, and strong influence of epigenome organization in liver cancer. Under cross-validation, our model demonstrates highly competitive predictive performance against the black-box methods of random forest and deep learning.
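Topic models of the kind referred to here are commonly fit by MCMC; a standard example is the collapsed Gibbs sampler for latent Dirichlet allocation (LDA). The sketch below treats each tumor as a document and each mutation context (for instance, one of the 96 trinucleotide substitution classes) as a word. It is a generic LDA sampler under these assumptions, with hypothetical names and hyperparameters, and does not reproduce the authors' algorithm, which couples the topics to a multilevel multilogistic regression.

```python
import numpy as np


def lda_gibbs(docs, n_words, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for latent Dirichlet allocation (generic sketch).

    docs : list of integer arrays; docs[d] holds the mutation-context index
           (0..n_words-1) of every mutation in tumor d (hypothetical encoding).
    Returns smoothed (tumor-topic, topic-context) estimates from the final sample.
    """
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))   # mutations in tumor d assigned to topic k
    nkw = np.zeros((n_topics, n_words))     # context w assigned to topic k
    nk = np.zeros(n_topics)                 # total assignments to topic k
    z = [rng.integers(0, n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):          # counts for the random initial state
        for w, k in zip(doc, z[d]):
            ndk[d, k] += 1
            nkw[k, w] += 1
            nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1              # remove the current assignment
                nkw[k, w] -= 1
                nk[k] -= 1
                # Full conditional p(z = k | all other assignments), up to a constant
                prob = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + n_words * beta)
                k = rng.choice(n_topics, p=prob / prob.sum())
                z[d][i] = k                 # add the new assignment back
                ndk[d, k] += 1
                nkw[k, w] += 1
                nk[k] += 1
    theta = (ndk + alpha) / (ndk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (nkw + beta) / (nk[:, None] + n_words * beta)
    return theta, phi


# Two hypothetical tumors using disjoint mutation contexts.
docs = [np.array([0, 1, 0, 1, 0, 1]), np.array([2, 3, 2, 3, 2, 3])]
theta, phi = lda_gibbs(docs, n_words=4, n_topics=2, n_iter=50)
```

Each row of `theta` gives a tumor's topic proportions and each row of `phi` a topic's distribution over mutation contexts; in a model like the one described, such topic proportions would replace the raw, highly correlated contexts as meta-features.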