Continuous diagnostic tests are often used for discriminating between healthy and diseased populations. For the clinical application of such tests, it is useful to select a cutpoint or discrimination value c that defines positive and negative test results. In general, individuals with a diagnostic test value of c or higher are classified as diseased. Several search strategies have been proposed for choosing optimal cutpoints in diagnostic tests, depending on the underlying reason for this choice. This paper introduces an R package, OptimalCutpoints, for selecting optimal cutpoints in diagnostic tests. It incorporates criteria that take the costs of the different diagnostic decisions into account, as well as the prevalence of the target disease, together with several methods based on measures of diagnostic test accuracy. Moreover, it enables optimal levels to be calculated according to levels of given (categorical) covariates. The numerical output includes the optimal cutpoint values and associated accuracy measures with their confidence intervals, while the graphical output includes the receiver operating characteristic (ROC) and predictive ROC curves. An illustration of the use of OptimalCutpoints is provided, using a real biomedical dataset.
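One widely used accuracy-based criterion of the kind such packages implement is Youden's index, J = sensitivity + specificity - 1, maximized over candidate cutpoints. A minimal sketch in Python (the package itself is written in R; the helper below is a generic illustration of the criterion, not the package's API):

```python
import numpy as np

def youden_cutpoint(values, labels):
    """Select the cutpoint maximizing Youden's J = sensitivity + specificity - 1.

    values: continuous diagnostic test results.
    labels: 1 = diseased, 0 = healthy.
    Individuals with a value >= cutpoint are classified as diseased.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    best_c, best_j = None, -np.inf
    for c in np.unique(values):          # each observed value is a candidate
        pred = values >= c
        sens = np.mean(pred[labels == 1])    # true positive rate
        spec = np.mean(~pred[labels == 0])   # true negative rate
        j = sens + spec - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j
```

Cost- and prevalence-based criteria replace J with a weighted combination of the error rates, but the search over candidate cutpoints is the same.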
Progressive ageing in developed countries entails an increase in multimorbidity. Population-wide predictive models for adverse health outcomes are crucial to address these growing healthcare needs. The main objective of this study is to develop and validate a population-based prognostic model to predict the probability of unplanned hospitalization in the Basque Country, by comparing the performance of a logistic regression model and three families of machine learning models.
Using age, sex, diagnoses and drug prescriptions previously transformed by the Johns Hopkins Adjusted Clinical Groups (ACG) System, we predict the probability of unplanned hospitalization in the Basque Country (2.2 million inhabitants) using several techniques. When dealing with non-deterministic algorithms, comparing a single model per technique is not enough to choose the best approach. Thus, we conduct 40 experiments per family of models - Random Forest, Gradient Boosting Decision Trees and Multilayer Perceptrons - and compare them to Logistic Regression. Models' performance is compared both population-wide and for the 20,000 patients with the highest predicted probabilities, as a hypothetical high-risk group to intervene on.
The best-performing technique is Multilayer Perceptron, followed by Gradient Boosting Decision Trees, Logistic Regression and Random Forest. Multilayer Perceptrons also have the lowest variability, around an order of magnitude less than Random Forests. The median area under the ROC curve, average precision and positive predictive value range from 0.789 to 0.802, 0.237 to 0.257 and 0.485 to 0.511, respectively. The median Brier score is 0.048 for all techniques. There is some overlap between the algorithms: for instance, Gradient Boosting Decision Trees perform better than Logistic Regression more than 75% of the time, but not always.
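The kind of distributional comparison described here, such as the fraction of cross-pairings in which one family's runs outscore another's, can be computed directly from the per-experiment scores. A hedged sketch (a hypothetical helper for illustration, not the study's code):

```python
import numpy as np

def compare_families(scores_a, scores_b):
    """Summarize repeated-experiment scores for two model families.

    scores_a, scores_b: per-experiment metric values (e.g., AUC) from
    repeated runs of two non-deterministic training procedures.
    Returns the medians and the fraction of all cross-pairings in
    which family A outscores family B.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    frac_a_wins = np.mean(a[:, None] > b[None, :])  # all-pairs comparison
    return np.median(a), np.median(b), frac_a_wins
```

With 40 runs per family this gives 1600 pairings; a win fraction above 0.75, as reported for Gradient Boosting versus Logistic Regression, indicates a usually-but-not-always superior family.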
All models have good global performance. The only family that is consistently superior to Logistic Regression is Multilayer Perceptron, which shows very reliable performance with the lowest variability.
Insulin resistance has been associated with metabolic and hemodynamic alterations and higher cardiometabolic risk. The threshold homeostasis model assessment of insulin resistance (HOMA-IR) levels used to define insulin resistance vary widely. The purpose of this study was to describe the influence of age and gender on the estimation of HOMA-IR optimal cut-off values to identify subjects with higher cardiometabolic risk in a general adult population.
The study included 2459 adults (age range 20-92 years, 58.4% women) from a random Spanish population sample. As an accurate indicator of cardiometabolic risk, Metabolic Syndrome (MetS), defined both by International Diabetes Federation criteria and by Adult Treatment Panel III criteria, was used. The effect of age was analyzed separately in individuals with and without diabetes mellitus. ROC regression methodology was used to evaluate the effect of age on HOMA-IR performance in classifying cardiometabolic risk.
In the Spanish population, the threshold value of HOMA-IR drops from 3.46 under the 90th percentile criterion to 2.05 when MetS components are taken into account. In non-diabetic women, but not in men, we found a significant non-linear effect of age on the accuracy of HOMA-IR. In non-diabetic men, the cut-off value was 1.85. All values lie between the 70th and 75th percentiles of HOMA-IR levels in the adult Spanish population.
Establishing the cut-off points of HOMA-IR that define insulin resistance from cardiometabolic risk, instead of from a percentile of the population distribution, would increase its clinical utility in identifying those patients in whom the presence of multiple metabolic risk factors imparts an increased metabolic and cardiovascular risk. In non-diabetic women, the threshold levels must be adjusted for age.
Growth curve studies are typically conducted to evaluate differences between group- or treatment-specific curves. Most analyses focus solely on the growth curves, but it has been argued that the derivative of growth curves can highlight differences between groups that may be masked when considering the raw curves only. Motivated by the desire to estimate derivative curves hierarchically, we introduce a new sequence of quotient differences (empirical derivatives) which, among other things, are well behaved near the boundaries compared with other sequences in the literature. Using the sequence of quotient differences, we develop a Bayesian method to estimate curve derivatives in a multilevel setting (a common scenario in growth studies) and show how the method can be used to estimate individual and group derivative curves and to make comparisons. We apply the new methodology to data collected from a study conducted to explore the effect that radiation-based therapies have on growth in female children diagnosed with acute lymphoblastic leukaemia.
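To fix ideas, a quotient difference is simply a ratio of a response increment to a time increment. The sketch below shows a standard central-difference sequence with one-sided quotients at the two endpoints; it is a generic illustration of empirical derivatives, and the paper's proposed sequence differs precisely in its improved boundary behaviour:

```python
import numpy as np

def empirical_derivative(t, y):
    """Empirical derivative via quotient differences.

    Interior points use central quotients (y[i+1] - y[i-1]) / (t[i+1] - t[i-1]);
    the two boundary points fall back to one-sided quotients.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.empty_like(y)
    d[1:-1] = (y[2:] - y[:-2]) / (t[2:] - t[:-2])  # central quotients
    d[0] = (y[1] - y[0]) / (t[1] - t[0])           # forward quotient
    d[-1] = (y[-1] - y[-2]) / (t[-1] - t[-2])      # backward quotient
    return d
```

The one-sided boundary quotients are biased for curved trajectories, which is exactly the kind of edge effect the paper's alternative sequence is designed to mitigate.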
High throughput phenotyping (HTP) platforms and devices are increasingly used for the characterization of growth and developmental processes for large sets of plant genotypes. Such HTP data require challenging statistical analyses in which longitudinal genetic signals need to be estimated against a background of spatio-temporal noise processes. We propose a two-stage approach for the analysis of such longitudinal HTP data. In a first stage, we correct for design features and spatial trends per time point. In a second stage, we focus on the longitudinal modelling of the spatially corrected data, thereby taking advantage of shared longitudinal features between genotypes and plants within genotypes. We propose a flexible hierarchical three-level P-spline growth curve model, with plants/plots nested in genotypes, and genotypes nested in populations. For selection of genotypes in a plant breeding context, we show how to extract new phenotypes, like growth rates, from the estimated genotypic growth curves and their first-order derivatives. We illustrate our approach on HTP data from the PhenoArch greenhouse platform at INRAE Montpellier and the outdoor Field Phenotyping platform at ETH Zürich.
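The building block of such models is the single-curve P-spline: a B-spline basis fitted with a difference penalty on the coefficients. A minimal single-level sketch (assuming Eilers-Marx-style P-splines; the paper's hierarchical three-level model extends this with nested random effects, which are omitted here):

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, n_knots=10, degree=3, lam=1.0, order=2):
    """Minimal P-spline smoother for one growth curve.

    Cubic B-spline basis on equally spaced knots, with an order-`order`
    difference penalty of weight `lam` on the coefficients. Returns the
    fitted values at x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # equally spaced knots, extended beyond the data range for the basis
    dx = (x.max() - x.min()) / n_knots
    knots = np.arange(x.min() - degree * dx, x.max() + (degree + 1) * dx, dx)
    n_basis = len(knots) - degree - 1
    B = np.column_stack([
        BSpline.basis_element(knots[i:i + degree + 2], extrapolate=False)(x)
        for i in range(n_basis)
    ])
    B = np.nan_to_num(B)  # basis elements are zero outside their support
    D = np.diff(np.eye(n_basis), n=order, axis=0)  # difference penalty matrix
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef
```

Because the order-2 penalty leaves linear trends unpenalized, straight-line growth is reproduced exactly; growth rates can then be read off the fitted spline's first derivative, as the abstract describes for the genotypic curves.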
•General plot-level model for repeated high-throughput field phenotyping measurements.
•Extraction of three main intermediate trait categories for dynamic modelling.
•Seamless processing approach that integrates temporal and spatial modelling.
•Phenomics data processing cheatsheet.
Decision-making in breeding increasingly depends on the ability to capture and predict crop responses to changing environmental factors. Advances in crop modeling as well as high-throughput field phenotyping (HTFP) hold promise to provide such insights. Processing HTFP data is an interdisciplinary task that requires broad knowledge on experimental design, measurement techniques, feature extraction, dynamic trait modeling, and prediction of genotypic values using statistical models. To get an overview of sources of variation in HTFP, we develop a general plot-level model for repeated measurements. Based on this model, we propose a seamless step-wise procedure that allows estimated means and variances to be carried forward from stage to stage. The process builds on the extraction of three intermediate trait categories: (1) timing of key stages, (2) quantities at defined time points or periods, and (3) dose-response curves. In a first stage, these intermediate traits are extracted from low-level traits' time series (e.g., canopy height) using P-splines and the quarter of maximum elongation rate (QMER) method, as well as final height percentiles. In a second and third stage, extracted traits are further processed using a stage-wise linear mixed model analysis. Using a wheat canopy growth simulation to generate canopy height time series, we demonstrate the suitability of the stage-wise process for traits of the first two above-mentioned categories. Results indicate that, for the first stage, the P-spline/QMER method was more robust than the percentile method. In the subsequent two-stage linear mixed model processing, weighting the second and third stages with error variance estimates from the previous stages improved the root mean squared error. We conclude that processing phenomics data in stages represents a feasible approach if estimated means and variances are carried forward from one processing stage to the next.
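The first-stage timing extraction can be illustrated numerically: find when the elongation rate of the (smoothed) canopy height series first reaches a quarter of its maximum. The sketch below is a simplified reading of the QMER idea on an already-smoothed series; the paper applies it to P-spline fits, not raw measurements:

```python
import numpy as np

def qmer_timing(t, height):
    """Timing of growth onset as the first time the elongation rate
    reaches a quarter of its maximum (simplified QMER illustration).

    t: measurement times; height: smoothed canopy height at those times.
    """
    t = np.asarray(t, dtype=float)
    h = np.asarray(height, dtype=float)
    rate = np.gradient(h, t)               # elongation rate dh/dt
    threshold = 0.25 * rate.max()          # quarter of the maximum rate
    first = np.argmax(rate >= threshold)   # first index meeting the threshold
    return t[first]
```

The extracted timings (and analogous quantities at fixed time points) then enter the second- and third-stage mixed model analysis as derived traits.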
P-splines in combination with the QMER method are suitable tools to extract timing of key stages and quantities at defined time points from HTFP data.