This article presents a novel algorithm that efficiently computes L1-penalized (lasso) estimates of parameters in high‐dimensional models. The lasso simultaneously performs variable selection and shrinkage, which makes it very useful for finding interpretable prediction rules in high‐dimensional data. The new algorithm combines gradient ascent optimization with the Newton–Raphson algorithm. It is described for a general likelihood function and can be applied in generalized linear models and other models with an L1 penalty. The algorithm is demonstrated in the Cox proportional hazards model, predicting survival of breast cancer patients from gene expression data, and its performance is compared with that of competing approaches. An R package, penalized, implementing the method is available on CRAN.
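The following is a minimal Python sketch of L1-penalized estimation that makes simultaneous selection and shrinkage concrete. It assumes a Gaussian linear model with unit error variance and uses proximal gradient (ISTA) iterations with soft-thresholding. This is a standard but *different* algorithm from the article's gradient ascent/Newton–Raphson combination, and the function names `soft_threshold` and `lasso_ista` are hypothetical, not part of the penalized package.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize 0.5 * ||y - X beta||^2 + lam * ||beta||_1 by proximal gradient (ISTA).

    Equivalent to maximizing a Gaussian log-likelihood (unit error variance)
    minus an L1 penalty; NOT the algorithm of the article or of the penalized package.
    """
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        score = X.T @ (y - X @ beta)  # gradient of the log-likelihood part (ascent direction)
        beta = soft_threshold(beta + step * score, step * lam)
    return beta
```

Whenever a coefficient's gradient-step value falls within ±`step * lam`, it is set exactly to zero (selection). Otherwise it is pulled toward zero by `step * lam` (shrinkage).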
Motivation: Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferred methodology for gene set testing.
Results: We identify some crucial assumptions that are required by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models rely on a crucial and unrealistic assumption of independence between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single-gene testing and gene set testing.
Contact: j.j.goeman@lumc.nl
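A small simulation can reproduce the anti-conservativeness of gene-sampling P-values described in the Results. The Python sketch below uses a hypothetical setup and is not the article's experiment. Under the null of no group difference, a set of equicorrelated genes (correlation `rho`) is tested in two ways. The first is a z-test that treats the gene-wise t-statistics as independent, so genes are the sampling units. The second is a permutation test that shuffles group labels across subjects, so subjects are the sampling units. All names and parameter values are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

def gene_t_stats(x, labels):
    """Pooled two-sample t-statistic for every gene (column) of x; labels is a boolean group mask."""
    a, b = x[labels], x[~labels]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled_var * (1 / na + 1 / nb))

def simulate(reps=200, n=20, m=50, rho=0.3, n_perm=200, alpha=0.05, seed=1):
    """Null rejection rates of a gene-sampling z-test and a subject-permutation test (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n) < n // 2                  # two equal groups, no true difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided normal critical value
    naive_rej = perm_rej = 0
    for _ in range(reps):
        # Equicorrelated genes: a subject-level factor shared by all genes in the set.
        shared = rng.standard_normal((n, 1))
        x = np.sqrt(rho) * shared + np.sqrt(1 - rho) * rng.standard_normal((n, m))
        stat = gene_t_stats(x, labels).mean()
        # Genes as sampling units: mean of m t-statistics treated as independent N(0, 1) draws.
        naive_rej += abs(stat * np.sqrt(m)) > z_crit
        # Subjects as sampling units: permute group labels to obtain the null distribution.
        null = np.array([gene_t_stats(x, rng.permutation(labels)).mean() for _ in range(n_perm)])
        p_value = (1 + np.sum(np.abs(null) >= abs(stat))) / (n_perm + 1)
        perm_rej += p_value <= alpha
    return naive_rej / reps, perm_rej / reps
```

With `rho=0.3` and 50 genes, the gene-sampling test typically rejects far more often than the nominal 5%, while the permutation test stays near 5%. Setting `rho=0.0` brings the two rates much closer together.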
Evaluating the variation in the strength of the effect across studies is a key feature of meta-analyses. This variability is reflected by measures such as τ² or I², but their clinical interpretation is not straightforward. A prediction interval is less complicated: it presents the expected range of true effects in similar studies. We aimed to show the advantages of routinely reporting the prediction interval in meta-analyses.
We show how the prediction interval can help convey the uncertainty about whether an intervention works or not. To evaluate the implications of using this interval to interpret the results, we selected the first meta-analysis per intervention review of the Cochrane Database of Systematic Reviews Issues 2009-2013 with a dichotomous (n=2009) or continuous (n=1254) outcome, and generated 95% prediction intervals for them.
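A 95% prediction interval for a random-effects meta-analysis of k studies is commonly computed as μ̂ ± t_{k−2, 0.975} · √(τ̂² + SE(μ̂)²), where μ̂ is the random-effects pooled estimate. The Python sketch below assumes the DerSimonian–Laird estimator of τ² and requires k ≥ 3. The article's own calculations may have used different estimators, and the function names are hypothetical. SciPy is used only for the t quantile.

```python
import math
from scipy import stats

def dersimonian_laird(y, v):
    """Random-effects pooled estimate, its standard error and tau^2 (DerSimonian-Laird).

    y: study effect estimates (e.g. log odds ratios); v: their within-study variances.
    """
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return mu, se, tau2

def prediction_interval(y, v, level=0.95):
    """Approximate prediction interval for the true effect in a new, similar study."""
    k = len(y)
    if k < 3:
        raise ValueError("need at least 3 studies (t distribution with k - 2 df)")
    mu, se, tau2 = dersimonian_laird(y, v)
    t_crit = stats.t.ppf(0.5 + level / 2, df=k - 2)
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return mu - half_width, mu + half_width
```

For example, `prediction_interval([0.1, 0.3, -0.2, 0.5, 0.2], [0.04, 0.02, 0.05, 0.03, 0.04])` returns an interval wider than the usual confidence interval for μ̂. This is because the interval adds τ̂² to the variance and uses a t quantile with k − 2 degrees of freedom.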
In 72.4% of 479 statistically significant (random-effects p<0.05) meta-analyses in the Cochrane Database 2009-2013 with heterogeneity (I²>0), the 95% prediction interval suggested that the intervention effect could be null or even be in the opposite direction. In 20.3% of those 479 meta-analyses, the prediction interval showed that the effect could be completely opposite to the point estimate of the meta-analysis. We also demonstrate how the prediction interval can be used to calculate the probability that a new trial will show a negative effect and to improve power calculations for a new trial.
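The probability calculation rests on the same approximation as the prediction interval: (θ_new − μ̂)/√(τ̂² + SE(μ̂)²) is treated as t-distributed with k − 2 degrees of freedom, where θ_new is the true effect in a new study. A sketch under that assumption follows. It uses a scale on which 0 means no effect; which side counts as a "negative" effect depends on how the outcome is coded.

$$P(\theta_{\mathrm{new}} < 0) \approx F_{t,\,k-2}\left(\frac{-\hat{\mu}}{\sqrt{\hat{\tau}^2 + \widehat{\mathrm{SE}}(\hat{\mu})^2}}\right)$$

Here $F_{t,\,k-2}$ denotes the cumulative distribution function of the t distribution with k − 2 degrees of freedom.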
The prediction interval reflects the variation in treatment effects over different settings, including what effect is to be expected in future patients, such as the patients a clinician intends to treat. Prediction intervals should be routinely reported to allow more informative inferences in meta-analyses.
Permutation-based true discovery guarantee by sum tests. Vesely, Anna; Finos, Livio; Goeman, Jelle J.
Journal of the Royal Statistical Society. Series B, Statistical methodology, 07/2023, Volume 85, Issue 3.
Journal Article. Peer reviewed. Open access.
Abstract
Sum-based global tests are highly popular in multiple hypothesis testing. In this paper, we propose a general closed testing procedure for sum tests, which provides lower confidence bounds for the proportion of true discoveries (TDP), simultaneously over all subsets of hypotheses. These simultaneous inferences come for free, i.e., without any adjustment of the α-level, whenever a global test is used. Our method allows for an exploratory approach, as simultaneity ensures control of the TDP even when the subset of interest is selected post hoc. It adapts to the unknown joint distribution of the data through permutation testing. Any sum test may be employed, depending on the desired power properties. We present an iterative shortcut for the closed testing procedure, based on the branch and bound algorithm, which converges to the full closed testing results, often after a few iterations; even if it is stopped early, it still controls the TDP. We compare the properties of different choices for the sum test through simulations and then illustrate the feasibility of the method on high-dimensional brain imaging and genomics data.
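For a handful of hypotheses, the closed testing principle behind these bounds can be illustrated by brute force. Consider a selected subset S. The largest number of true null hypotheses in S that is compatible with the data is the maximum of |J ∩ S| over all index sets J whose local test does not reject, and the TDP lower bound is (|S| minus that maximum)/|S|. For brevity, the Python sketch below assumes a one-sided Stouffer sum of z-scores with a normal null as the local test, which requires independent z-scores. The article instead calibrates its sum tests by permutation and uses a branch and bound shortcut rather than enumerating all 2^m sets. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def stouffer_rejects(z, subset, alpha):
    """One-sided Stouffer sum test of the intersection hypothesis indexed by `subset`.

    Assumes the z-scores are independent and standard normal under the null.
    """
    stat = sum(z[i] for i in subset) / sqrt(len(subset))
    return stat > NormalDist().inv_cdf(1 - alpha)

def tdp_lower_bound(z, selected, alpha=0.05):
    """Closed-testing lower confidence bound for the true discovery proportion of `selected`."""
    selected = set(selected)
    max_true_nulls = 0  # the empty intersection hypothesis is never rejected
    for size in range(1, len(z) + 1):
        for J in combinations(range(len(z)), size):
            if not stouffer_rejects(z, J, alpha):
                max_true_nulls = max(max_true_nulls, len(selected.intersection(J)))
    return (len(selected) - max_true_nulls) / len(selected)
```

With z = [4.0, 3.5, 3.0, 0.1, -0.5, 0.2] and S = {0, 1, 2}, the sketch returns 2/3. In other words, at least two thirds of S are true discoveries with 95% confidence. This holds even if S was chosen after looking at the data.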
Identify gene expression profiles associated with osteoarthritis (OA) processes in articular cartilage and determine which pathways change during the disease process.
Genome-wide gene expression was determined in paired samples of OA-affected and preserved cartilage from the same joint using microarray analysis for 33 patients of the RAAK study. Results were replicated in independent samples by RT-qPCR and immunohistochemistry. Profiles were analyzed with the online analysis tools DAVID and STRING to identify enrichment for specific pathways and protein-protein interactions.
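Pathway enrichment of a gene list, such as the analysis performed here with DAVID, is usually based on an over-representation test. Below is a minimal Python sketch of the one-sided hypergeometric (Fisher's exact) version. DAVID itself reports a modified, more conservative variant (the EASE score), which this sketch does not reproduce, and the function name and arguments are hypothetical. Note that this kind of test treats genes as the sampling units.

```python
from math import comb

def enrichment_p_value(total, in_category, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap) for over-representation.

    total: genes on the array; in_category: genes annotated to the pathway;
    selected: differentially expressed genes; overlap: selected genes in the pathway.
    """
    upper = min(in_category, selected)
    tail = sum(comb(in_category, i) * comb(total - in_category, selected - i)
               for i in range(overlap, upper + 1))
    return tail / comb(total, selected)
```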
Among the 1717 genes that were significantly differentially expressed between OA-affected and preserved cartilage, we found significant enrichment for genes involved in skeletal development (e.g. TNFRSF11B and FRZB). In addition, several inflammatory genes, such as CD55, PTGES and TNFAIP6, were among the top genes; these had previously been identified in within-joint analyses as well as in analyses comparing preserved cartilage from OA-affected joints with healthy cartilage. Of note was the strong up-regulation of NGF in OA cartilage. RT-qPCR confirmed differential expression for 18 of the 19 genes with expression changes of 2-fold or higher, and immunohistochemistry of selected genes showed a concordant change in protein expression. Most of these changes were associated with OA severity (Mankin score) but were independent of joint site or sex.
We provide further insights into the ongoing OA pathophysiological processes in cartilage, in particular into the differences between macroscopically intact and OA-affected cartilage, which seem relatively consistent and independent of sex or joint. We suggest that treatment development could benefit from focusing on these similarities in gene expression changes and/or pathways.
Summary: We investigate a class of methods for selective inference that condition on a selection event. Such methods follow a two-stage process. First, a data-driven collection of hypotheses is chosen from some large universe of hypotheses. Subsequently, inference takes place within this data-driven collection, conditioned on the information that was used for the selection. Examples of such methods include basic data splitting as well as modern data-carving methods and post-selection inference methods for lasso coefficients based on the polyhedral lemma. In this article, we take a holistic view of such methods, considering the selection, conditioning and final error control steps together as a single method. From this perspective, we demonstrate that multiple testing methods defined directly on the full universe of hypotheses are always at least as powerful as selective inference methods based on selection and conditioning. This result holds true even when the universe is potentially infinite and only implicitly defined, such as in the case of data splitting. We provide general theory and intuition before investigating in detail several case studies where a shift to a nonselective or unconditional perspective can yield a power gain.
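The simplest method in this class, basic data splitting, can be sketched in Python as follows. The sketch assumes m hypotheses H_j: mean_j ≤ 0 with unit-variance observations. In the first stage, it selects the hypotheses whose first-half z-score exceeds a threshold. In the second stage, it applies a Bonferroni correction over the selected hypotheses using only the second half. Because the halves are independent, conditioning on the selection amounts to discarding the first half. The function name, the threshold and the choice of Bonferroni are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

def split_and_test(x, alpha=0.05, threshold=1.0, seed=0):
    """Data splitting for H_j: mean_j <= 0, assuming unit-variance observations.

    x has shape (n, m); column j holds the observations for hypothesis j.
    Returns the indices of the rejected hypotheses.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    rows = rng.permutation(n)
    first, second = x[rows[: n // 2]], x[rows[n // 2 :]]
    # Stage 1: select hypotheses using the first half only.
    z_first = first.mean(axis=0) * np.sqrt(len(first))
    selected = np.flatnonzero(z_first > threshold)
    if selected.size == 0:
        return selected
    # Stage 2: test the selected hypotheses on the independent second half,
    # with a Bonferroni correction over the number selected.
    z_second = second[:, selected].mean(axis=0) * np.sqrt(len(second))
    p = np.array([1.0 - NormalDist().cdf(z) for z in z_second])
    return selected[p <= alpha / selected.size]
```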
Summary: We introduce a multiple testing procedure that controls the median of the proportion of false discoveries in a flexible way. The procedure requires only a vector of p-values as input and is comparable to the Benjamini–Hochberg method, which controls the mean of the proportion of false discoveries. Unlike the Benjamini–Hochberg procedure, which can be very anti-conservative when α is chosen post hoc, our method allows free choice of one or several values of α after seeing the data. We prove these claims and illustrate them with simulations. The proposed procedure is inspired by a popular estimator of the total number of true hypotheses. We adapt this estimator to provide simultaneous median-unbiased estimators of the proportion of false discoveries, valid for finite samples. This simultaneity allows for the claimed flexibility. Our approach does not assume independence. The time complexity of our method is linear in the number of hypotheses, after sorting the p-values.
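For reference, the Benjamini–Hochberg step-up procedure used as the point of comparison can be sketched in a few lines of Python. Sort the m p-values, find the largest rank k with p_(k) ≤ kα/m, and reject the k hypotheses with the smallest p-values. This is the comparison method, not the article's median-controlling procedure, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank  # step-up: keep the largest rank that meets its threshold
    return sorted(order[:k])
```

Because this is a step-up rule, p-values of 0.03, 0.04 and 0.045 are all rejected at α = 0.05, even though only the largest one meets its own threshold.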
Periconceptional diet may persistently influence DNA methylation levels with phenotypic consequences. However, a comprehensive assessment of the characteristics of prenatal malnutrition-associated differentially methylated regions (P-DMRs) is lacking in humans. Here we report on a genome-scale analysis of differential DNA methylation in whole blood after periconceptional exposure to famine during the Dutch Hunger Winter. We show that P-DMRs preferentially occur at regulatory regions, are characterized by intermediate levels of DNA methylation and map to genes enriched for differential expression during early development. Validation and further exploratory analysis of six P-DMRs highlight the critical role of gestational timing. Interestingly, differential methylation of the P-DMRs extends along pathways related to growth and metabolism. P-DMRs located in INSR and CPT1A have enhancer activity in vitro and differential methylation is associated with birth weight and serum LDL cholesterol. Epigenetic modulation of pathways by prenatal malnutrition may promote an adverse metabolic phenotype in later life.