The matched case-control design, until recently mostly confined to epidemiological studies, is becoming customary in biomedical applications as well. For instance, in omics studies, it is quite common to compare cancer and healthy tissue from the same patient. Furthermore, researchers today routinely collect data from various and variable sources that they wish to relate to the case-control status. This highlights the need to develop and implement statistical methods that can take these tendencies into account. We present the R package penalizedclr, which provides an implementation of the penalized conditional logistic regression model for analyzing matched case-control studies. It allows for different penalties for different blocks of covariates, and it is therefore particularly useful in the presence of multi-source omics data. Both L1 and L2 penalties are implemented. Additionally, the package implements stability selection for variable selection in the considered regression model. The proposed method fills a gap in the available software for fitting high-dimensional conditional logistic regression models accounting for the matched design and block structure of predictors/features. The output consists of a set of selected variables that are significantly associated with case-control status. These variables can then be investigated in terms of functional interpretation or validation in further, more targeted studies.
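The penalized objective described in this abstract can be sketched for the simplest matched design. The following Python snippet is a minimal illustration, not the penalizedclr API: it assumes 1:1 matched pairs, for which each pair contributes log(1 + exp(-beta·(x_case - x_control))) to the negative conditional log-likelihood, and it assumes one L1 penalty per covariate block plus a single shared L2 penalty. The function name, the `blocks` layout, and all numbers are hypothetical.

```python
import math

def penalized_neg_cond_loglik(pair_diffs, beta, blocks, lam_l1, lam_l2=0.0):
    """Penalized negative conditional log-likelihood for 1:1 matched pairs.

    pair_diffs: list of vectors x_case - x_control, one per matched pair.
    beta:       list of regression coefficients.
    blocks:     list of index lists, one per covariate block (hypothetical layout).
    lam_l1:     list with one L1 penalty per block.
    lam_l2:     single L2 penalty shared by all coefficients (an assumption).
    """
    nll = 0.0
    for d in pair_diffs:
        eta = sum(b * x for b, x in zip(beta, d))
        # log(1 + exp(-eta)), written in a numerically stable form.
        nll += max(-eta, 0.0) + math.log1p(math.exp(-abs(eta)))
    # Block-specific lasso penalty: each block has its own weight.
    l1 = sum(lam * sum(abs(beta[j]) for j in idx)
             for lam, idx in zip(lam_l1, blocks))
    # Ridge penalty on all coefficients.
    l2 = lam_l2 * sum(b * b for b in beta)
    return nll + l1 + l2

# Toy input: two matched pairs, three covariates split into two blocks.
diffs = [[1.0, 0.0, -0.5], [0.5, 1.0, 0.0]]
blocks = [[0, 1], [2]]
value = penalized_neg_cond_loglik(diffs, [1.0, 0.0, 0.0], blocks, [0.5, 2.0])
```

A larger L1 weight on a block pushes that block's coefficients toward zero more strongly, which is how block-specific penalties let data sources of different size or quality be treated differently.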
Quantitative genetic analyses require extensive measurements of phenotypic traits, a task that is often not trivial, especially in wild populations. On top of instrumental measurement error, some traits may undergo transient (i.e., nonpersistent) fluctuations that are biologically irrelevant for selection processes. These two sources of variability, which we denote here as measurement error in a broad sense, are possible causes for bias in the estimation of quantitative genetic parameters. We illustrate how in a continuous trait transient effects with a classical measurement error structure may bias estimates of heritability, selection gradients, and the predicted response to selection. We propose strategies to obtain unbiased estimates with the help of repeated measurements taken at an appropriate temporal scale. However, the fact that in quantitative genetic analyses repeated measurements are also used to isolate permanent environmental instead of transient effects requires that the information content of repeated measurements is carefully assessed. To this end, we propose to distinguish “short-term” from “long-term” repeats, where the former capture transient variability and the latter help isolate permanent effects. We show how the inclusion of the corresponding variance components in quantitative genetic models yields unbiased estimates of all quantities of interest, and we illustrate the application of the method to data from a Swiss snow vole population.
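The heritability bias mentioned in this abstract can be illustrated with a small calculation. This is a sketch under stated assumptions, not the authors' model: it assumes heritability is the ratio of additive genetic variance to total phenotypic variance, and that transient effects add an independent variance term to the observed phenotype. The function name and all variance values are hypothetical.

```python
def heritability(v_additive, v_permanent_env, v_residual, v_transient=0.0):
    """Ratio of additive genetic variance to the phenotypic variance.

    Passing v_transient > 0 mimics a naive analysis in which transient
    (measurement-error-like) variance is counted as phenotypic variance.
    """
    v_phenotypic = v_additive + v_permanent_env + v_residual + v_transient
    return v_additive / v_phenotypic

# Illustrative variance components (made up for this sketch).
V_A, V_PE, V_R, V_TRANSIENT = 2.0, 1.0, 1.0, 1.0

h2_corrected = heritability(V_A, V_PE, V_R)            # transient variance separated out
h2_naive = heritability(V_A, V_PE, V_R, V_TRANSIENT)   # transient variance left in
```

Because the transient term only enlarges the denominator, the naive value is smaller than the corrected one. Under these assumptions, estimating the transient variance separately (for example from short-term repeats) is what removes it from the denominator.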
Cancer genomic studies often include data collected from several omics platforms. Each omics data source contributes to the understanding of the underlying biological process via source-specific ("individual") patterns of variability. At the same time, statistical associations and potential interactions among the different data sources can reveal signals from common biological processes that might not be identified by single-source analyses. These common patterns of variability are referred to as "shared" or "joint". In this work, we show how the use of joint and individual components can lead to better predictive models, and to a deeper understanding of the biological process at hand. We identify joint and individual contributions of DNA methylation, miRNA and mRNA expression collected from blood samples in a lung cancer case-control study nested within the Norwegian Women and Cancer (NOWAC) cohort study, and we use such components to build prediction models for case-control and metastatic status. To assess the quality of predictions, we compare models based on simultaneous, integrative analysis of multi-source omics data to a standard non-integrative analysis of each single omics dataset, and to penalized regression models. Additionally, we apply the proposed approach to a breast cancer dataset from The Cancer Genome Atlas. Our results show how an integrative analysis that preserves both components of variation is more appropriate than standard multi-omics analyses that are not based on such a distinction. Both joint and individual components are shown to contribute to a better quality of model predictions, and facilitate the interpretation of the underlying biological processes in lung cancer development. In the presence of multiple omics data sources, we recommend the use of data integration techniques that preserve the joint and individual components across the omics sources.
We show how the inclusion of such components increases the quality of model predictions of clinical outcomes.
Studies on the effects of air pollution and more generally environmental exposures on health require measurements of pollutants, which are affected by measurement error. This is a cause of bias in the estimation of parameters relevant to the study and can lead to inaccurate conclusions when evaluating associations among pollutants, disease risk and biomarkers. Although the presence of measurement error in such studies has been recognized as a potential problem, it is rarely considered in applications and practical solutions are still lacking. In this work, we formulate Bayesian measurement error models and apply them to study the link between air pollution and omic signals. The data we use stem from the "Oxford Street II Study", a randomized crossover trial in which 60 volunteers walked for two hours in a traffic-free area (Hyde Park) and in a busy shopping street (Oxford Street) of London. Metabolomic measurements were made in each individual as well as air pollution measurements, in order to investigate the association between short-term exposure to traffic related air pollution and perturbation of metabolic pathways. We implemented error-corrected models in a classical framework and used the flexibility of Bayesian hierarchical models to account for dependencies among omic signals, as well as among different pollutants. Models were implemented using traditional Markov Chain Monte Carlo (MCMC) simulative methods as well as integrated Laplace approximation. The inclusion of a classical measurement error term resulted in variable estimates of the association between omic signals and traffic related air pollution measurements, where the direction of the bias was not predictable a priori. The models were successful in including and accounting for different correlation structures, both among omic signals and among different pollutant exposures.
In general, more associations were identified when the correlations among omic signals and among pollutants were modeled, and their number increased when a measurement error term was additionally included in the multivariate models (particularly for the associations between metabolomics and NO2).
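The effect of classical measurement error on an association estimate can be illustrated with a short simulation. This is a frequentist sketch of the simplest single-exposure case, not the Bayesian hierarchical models of the study: it assumes an observed exposure W = X + U with independent error U, where the least-squares slope on W is expected to shrink by roughly var(X) / (var(X) + var(U)). In multivariate models with correlated exposures the direction of the bias is not fixed in this way, as the abstract notes. All names and parameter values are hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_eps=0.5, seed=1):
    """Return (slope using the true exposure, slope using the noisy exposure)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]        # true exposure
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]         # exposure measured with error
    y = [beta * xi + rng.gauss(0.0, sd_eps) for xi in x]  # outcome depends on true exposure
    return ols_slope(x, y), ols_slope(w, y)

slope_true, slope_noisy = simulate()
# Attenuation factor expected under the classical error model.
expected_factor = 1.0 ** 2 / (1.0 ** 2 + 1.0 ** 2)
```

With equal exposure and error variances, the slope based on the noisy exposure should come out at about half the slope based on the true exposure.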
Exposure to traffic-related air pollution (TRAP) has been associated with adverse health outcomes but underlying biological mechanisms remain poorly understood. Two randomized crossover trials were used here, the Oxford Street II (London) and the TAPAS II (Barcelona) studies, where volunteers were allocated to high or low air pollution exposures. The two locations represent different exposure scenarios, with Oxford Street characterized by diesel vehicles and Barcelona by normal mixed urban traffic. Levels of five and four pollutants were measured, respectively, using personal exposure monitoring devices. Serum samples were used for metabolomic profiling. The association between TRAP and levels of each metabolic feature was assessed. All pollutant levels were significantly higher at the high pollution sites. 29 and 77 metabolic features were associated with at least one pollutant in the Oxford Street II and TAPAS II studies, respectively, which related to 17 and 30 metabolic compounds. Little overlap was observed across pollutants for metabolic features, suggesting that different pollutants may affect levels of different metabolic features. After observing the annotated compounds, the main pathway suggested in Oxford Street II in association with NO2 was the acyl-carnitine pathway, previously found to be associated with cardio-respiratory disease. No overlap was found between the metabolic features identified in the two studies.
• Two randomized crossover trials were used to assess the relationship between TRAP and metabolic features with MS-based metabolomics (MWAS)
• The locations represent different exposure scenarios, with London characterized by diesel vehicles and Barcelona by normal mixed urban traffic
• Levels of 17 and 30 metabolic compounds associated with different air pollutants in the studies, with little overlap in features across pollutants
• No overlap found between metabolomic features identified in the two studies, possibly due to different levels of single pollutants
• The acyl-carnitine pathway, involved in cardio-respiratory disease, was suggested as a potential pathway in association with NO2 in one study
Lifestyle factors, such as food choices and exposure to chemicals, can alter DNA methylation and lead to changes in gene activity. Two such exposures with pharmacologically active components are coffee and tea consumption. Both coffee and tea have been suggested to play an important role in modulating disease-risk in humans by suppressing tumour progression, decreasing inflammation and influencing estrogen metabolism. These mechanisms may be mediated by changes in DNA methylation. To investigate if DNA methylation in blood is associated with coffee and tea consumption, we performed a genome-wide DNA methylation study for coffee and tea consumption in four European cohorts (N = 3,096). DNA methylation was measured from whole blood at 421,695 CpG sites distributed throughout the genome and analysed in men and women both separately and together in each cohort. Meta-analyses of the results and additional regional-level analyses were performed. After adjusting for multiple testing, the meta-analysis revealed that two individual CpG-sites, mapping to DNAJC16 and TTC17, were differentially methylated in relation to tea consumption in women. No individual sites were associated with tea or coffee consumption in men or in the sex-combined analysis. The regional analysis revealed that 28 regions were differentially methylated in relation to tea consumption in women. These regions contained genes known to interact with estradiol metabolism and cancer. No significant regions were found in the sex-combined and male-only analysis for either tea or coffee consumption.
Single nephrographic phase computed tomography (CT) is no worse than four-phase CT for detecting urothelial carcinoma among patients presenting with visible haematuria. Implementing a simplified CT protocol in the evaluation of these patients will not only decrease radiation exposure for patients, but also enhance the efficiency of radiological services.
There is uncertainty about the utility of multiphase computed tomography (CT) compared with single-phase CT in the routine examination of patients with visible haematuria (VH).
To compare the accuracies of single nephrographic phase (NP) CT and four-phase CT in detecting urothelial carcinoma (UC).
This was a single-centre, prospective, paired, noninferiority study of patients with painless VH referred for CT before cystoscopy between September 2019 and June 2021. Patients were followed up for 1 yr to ascertain UC diagnosis.
All patients underwent four-phase CT (control), from which single NP CT (experimental) was extracted. Both were independently assessed for UC.
The primary outcome was the difference in accuracy between the control and experimental CT using a 7.5% noninferiority limit. Histologically verified UC defined a positive reference standard. Secondary outcomes included differences in sensitivity, specificity, negative (NPV) and positive (PPV) predictive values, and area under the curve (AUC). All results are reported per patient.
Of the 308 patients included, UC was diagnosed in 45 (14.6%). The difference in accuracy between the control and experimental CT was 1.9% (95% confidence interval −2.8 to 6.7), demonstrating noninferiority. Sensitivity was 93.3% versus 91.1%, specificity was 83.7% versus 81.8%, NPV was 98.7% versus 98.2%, PPV was 49.4% versus 46.1%, and AUC was 0.96 versus 0.94 for the control versus experimental CT. Limitations included a low number of UC cases and no definite criteria for selecting a noninferiority limit.
The accuracy of NP CT is not inferior to that of four-phase CT for detecting UC.
This study shows that a computed tomography (CT) examination with only one contrast phase is no worse than a more complex CT examination for detecting cancer in the urinary tract among patients presenting with visible blood in the urine.
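The noninferiority conclusion reported above can be checked with a one-line decision rule. The sketch below assumes the reported difference is control minus experimental accuracy and that noninferiority is declared when the upper bound of its 95% confidence interval lies below the prespecified limit, which is the usual convention but is not spelled out in the abstract. The function name is hypothetical; the numbers are the ones reported in the results.

```python
def is_noninferior(ci_upper, margin):
    """Declare noninferiority when the upper confidence bound of the
    (control - experimental) difference stays below the margin."""
    return ci_upper < margin

# Values reported in the abstract, in percentage points.
difference = 1.9           # control minus experimental accuracy (assumed direction)
ci_lower, ci_upper = -2.8, 6.7
margin = 7.5               # prespecified noninferiority limit

noninferior = is_noninferior(ci_upper, margin)
```

The point estimate itself does not enter the decision: a small observed difference with an interval that crossed the margin would not support noninferiority.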
We consider the problem of variable screening in ultra-high-dimensional generalized linear models (GLMs) of nonpolynomial orders. Since the popular SIS approach is extremely unstable in the presence of contamination and noise, we discuss a new robust screening procedure based on the minimum density power divergence estimator (MDPDE) of the marginal regression coefficients. Our proposed screening procedure performs well under pure and contaminated data scenarios. We provide a theoretical motivation for the use of marginal MDPDEs for variable screening from both population as well as sample aspects; in particular, we prove that the marginal MDPDEs are uniformly consistent leading to the sure screening property of our proposed algorithm. Finally, we propose an appropriate MDPDE-based extension for robust conditional screening in GLMs along with the derivation of its sure screening property. Our proposed methods are illustrated through extensive numerical studies along with an interesting real data application.
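The idea of marginal screening can be sketched with the classical, non-robust baseline that the abstract refers to as SIS. The snippet below is not the MDPDE-based procedure proposed in the paper: it swaps in plain Pearson correlation as the marginal utility, ranks each covariate by the absolute value of its correlation with the response, and keeps the top `keep` covariates. Function names and the toy data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def marginal_screen(columns, y, keep):
    """Indices of the `keep` covariates with the largest |marginal correlation|."""
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

# Toy data: four covariates (stored column-wise) and one response.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # perfectly correlated with y
    [2.0, 1.0, 2.0, 1.0, 2.0],  # uncorrelated with y
    [5.0, 4.0, 3.0, 2.0, 2.0],  # strongly negatively correlated with y
    [1.0, 1.0, 1.0, 1.0, 1.0],  # constant, scored as 0
]
selected = marginal_screen(columns, y, keep=2)
```

A single gross outlier in a covariate or in the response can move a Pearson correlation substantially, which is the kind of instability under contamination that motivates replacing the marginal utility with a robust estimator.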
DNA hypomethylation in certain genes is associated with tobacco exposure but it is unknown whether these methylation changes translate into increased lung cancer risk. In an epigenome-wide study of DNA from pre-diagnostic blood samples from 132 case-control pairs in the NOWAC cohort, we observe that the most significant associations with lung cancer risk are for cg05575921 in AHRR (OR for 1 s.d.=0.37, 95% CI: 0.31-0.54, P-value=3.3 × 10⁻¹¹) and cg03636183 in F2RL3 (OR for 1 s.d.=0.40, 95% CI: 0.31-0.56, P-value=3.9 × 10⁻¹⁰), previously shown to be strongly hypomethylated in smokers. These associations remain significant after adjustment for smoking and are confirmed in an additional 664 case-control pairs tightly matched for smoking from the MCCS, NSHDS and EPIC HD cohorts. The replication and mediation analyses suggest that residual confounding is unlikely to explain the observed associations and that hypomethylation of these CpG sites may mediate the effect of tobacco on lung cancer risk.