Optimal classification trees Bertsimas, Dimitris; Dunn, Jack
Machine learning, 07/2017, Volume 106, Issue 7
Journal Article
Peer reviewed
Open access
State-of-the-art decision tree methods apply heuristics recursively to create each split in isolation, which may not capture well the underlying characteristics of the dataset. The optimal decision tree problem attempts to resolve this by creating the entire decision tree at once to achieve global optimality. In the last 25 years, algorithmic advances in integer optimization coupled with hardware improvements have resulted in an astonishing 800 billion factor speedup in mixed-integer optimization (MIO). Motivated by this speedup, we present optimal classification trees, a novel formulation of the decision tree problem using modern MIO techniques that yields the optimal decision tree for axes-aligned splits. We also show the richness of this MIO formulation by adapting it to give optimal classification trees with hyperplanes that generates optimal decision trees with multivariate splits. Synthetic tests demonstrate that these methods recover the true decision tree more closely than heuristics, refuting the notion that optimal methods overfit the training data. We comprehensively benchmark these methods on a sample of 53 datasets from the UCI machine learning repository. We establish that these MIO methods are practically solvable on real-world datasets with sizes in the 1000s, and give average absolute improvements in out-of-sample accuracy over CART of 1–2 and 3–5% for the univariate and multivariate cases, respectively. Furthermore, we identify that optimal classification trees are likely to outperform CART by 1.2–1.3% in situations where the CART accuracy is high and we have sufficient training data, while the multivariate version outperforms CART by 4–7% when the CART accuracy or dimension of the dataset is low.
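To illustrate the difference between creating each split in isolation and optimizing the whole tree at once, the following is a minimal sketch on a hypothetical XOR-style toy dataset. It is not the MIO formulation from the paper: the "optimal" search here is plain enumeration over depth-2 axis-aligned trees, and the greedy routine chooses its root split by misclassification count, a simplification of CART's impurity criteria. All data and function names are assumptions made for illustration.

```python
# Hypothetical toy data: the label is 1 when exactly one feature exceeds 0.5.
X = [(0.2, 0.2), (0.2, 0.8), (0.8, 0.2), (0.8, 0.8),
     (0.3, 0.1), (0.1, 0.9), (0.9, 0.3), (0.7, 0.9)]
y = [0, 1, 1, 0, 0, 1, 1, 0]
ALL_ROWS = list(range(len(X)))


def leaf_errors(rows):
    """Misclassifications if a leaf predicts the majority class of `rows`."""
    ones = sum(y[i] for i in rows)
    return min(ones, len(rows) - ones)


def partition(rows, feature, threshold):
    """Axis-aligned split: rows below the threshold go left, the rest right."""
    left = [i for i in rows if X[i][feature] < threshold]
    right = [i for i in rows if X[i][feature] >= threshold]
    return left, right


def candidate_splits():
    """(feature, threshold) pairs at midpoints between sorted unique values."""
    splits = []
    for feature in range(len(X[0])):
        values = sorted({row[feature] for row in X})
        splits += [(feature, (a + b) / 2) for a, b in zip(values, values[1:])]
    return splits


SPLITS = candidate_splits()


def split_errors(rows, split):
    """Total leaf errors right after applying one split to `rows`."""
    return sum(leaf_errors(part) for part in partition(rows, *split))


def best_stump_errors(rows):
    """Errors of the best single split on `rows`, or of a leaf if none helps."""
    return min([leaf_errors(rows)] + [split_errors(rows, s) for s in SPLITS])


def greedy_depth2_errors(rows):
    """Fix the root split by its immediate error, then split each child."""
    root = min(SPLITS, key=lambda s: split_errors(rows, s))
    left, right = partition(rows, *root)
    return best_stump_errors(left) + best_stump_errors(right)


def exhaustive_depth2_errors(rows):
    """Try every root split together with the best splits beneath it."""
    return min(
        best_stump_errors(left) + best_stump_errors(right)
        for left, right in (partition(rows, *s) for s in SPLITS)
    )


greedy_errors = greedy_depth2_errors(ALL_ROWS)
optimal_errors = exhaustive_depth2_errors(ALL_ROWS)
print("greedy training errors:", greedy_errors)
print("exhaustive training errors:", optimal_errors)
```

On this toy data no single root split looks informative on its own, so the greedy choice at the root is poor, while the exhaustive search finds a depth-2 tree that separates the classes. Enumeration of this kind does not scale beyond tiny problems, which is the gap the MIO formulation in the abstract is meant to address.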
Starch synthase III plays a key role in starch biosynthesis and is highly expressed in developing wheat grains. To understand the contribution of SSIII to starch and grain properties, we developed wheat ssIIIa mutants in the elite cultivar Cadenza using in silico TILLING in a mutagenized population. SSIIIa protein was undetectable by immunoblot analysis in triple ssIIIa mutants carrying mutations in each homoeologous copy of ssIIIa (A, B and D). Loss of SSIIIa in triple mutants led to significant changes in starch phenotype including smaller A-type granules and altered granule morphology. Starch chain-length distributions of double and triple mutants indicated greater levels of amylose than sibling controls (33.8% of starch in triple mutants, and 29.3% in double mutants vs. 25.5% in sibling controls) and fewer long amylopectin chains. Wholemeal flour of triple mutants had more resistant starch (6.0% vs. 2.9% in sibling controls) and greater levels of non-starch polysaccharides; the grains appeared shrunken and weighed ~11% less than the sibling control, which was partially explained by loss in starch content. Interestingly, our study revealed gene dosage effects which could be useful for fine-tuning starch properties in wheat breeding applications while minimizing impact on grain weight and quality.
Existing machine learning approaches for data-driven predictive maintenance are usually black boxes that claim high predictive power yet cannot be understood by humans. This limits the ability of humans to use these models to derive insights and understanding of the underlying failure mechanisms, and also limits the degree of confidence that can be placed in such a system to perform well on future data. We consider the task of predicting hard drive failure in a data center using recent algorithms for interpretable machine learning. We demonstrate that these methods provide meaningful insights about short- and long-term drive health, while also maintaining high predictive performance. We also show that these analyses still deliver useful insights even when limited historical data is available, enabling their use in situations where data collection has only recently begun.
• Optimal Trees predict long-term and short-term failures for hard drives accurately.
• Discovers how different factors interact and impact drive health non-linearly.
• The tree models are interpretable, transparent and can be validated.
• This method is efficient even in low data availability using censoring.
Highlights
• LLIF is a newer technique that can be applied to many lumbar spine pathologies.
• Transpsoas method may be biomechanically favorable by avoiding ligament disruption.
• Stand-alone MIS LLIF is an option for treating ASD of the lumbar spine.
Five cultivars of bread wheat and spelt and three of emmer were grown in replicate randomised field trials on two sites for two years with 100 and 200 kg nitrogen fertiliser per hectare, reflecting low input and intensive farming systems. Wholemeal flours were analysed for components that are suggested to contribute to a healthy diet. The ranges of all components overlapped between the three cereal types, reflecting the effects of both genotype and environment. Nevertheless, statistically significant differences in the contents of some components were observed. Notably, emmer and spelt had higher contents of protein, iron, zinc, magnesium, choline and glycine betaine, but also of asparagine (the precursor of acrylamide) and raffinose. By contrast, bread wheat had higher contents of the two major types of fibre, arabinoxylan (AX) and β-glucan, than emmer and a higher AX content than spelt. Although such differences in composition may be suggested to result in effects on metabolic parameters and health when studied in isolation, the final effects will depend on the quantity consumed and the composition of the overall diet.
We propose an approach for learning optimal tree-based prescription policies directly from data, combining methods for counterfactual estimation from the causal inference literature with recent advances in training globally-optimal decision trees. The resulting method, Optimal Policy Trees, yields interpretable prescription policies, is highly scalable, and handles both discrete and continuous treatments. We conduct extensive experiments on both synthetic and real-world datasets and demonstrate that these trees offer best-in-class performance across a wide variety of problems.
Optimal survival trees Bertsimas, Dimitris; Dunn, Jack; Gibson, Emma ...
Machine learning, 08/2022, Volume 111, Issue 8
Journal Article
Peer reviewed
Open access
Tree-based models are increasingly popular due to their ability to identify complex relationships that are beyond the scope of parametric models. Survival tree methods adapt these models to allow for the analysis of censored outcomes, which often appear in medical data. We present a new Optimal Survival Trees algorithm that leverages mixed-integer optimization (MIO) and local search techniques to generate globally optimized survival tree models. We demonstrate that the OST algorithm improves on the accuracy of existing survival tree methods, particularly in large datasets.
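As a small illustration of what handling censored outcomes involves, the sketch below computes a Kaplan-Meier survival estimate in plain Python. The abstract does not state how predictions are formed in each leaf, so using Kaplan-Meier here is an assumption (it is a common choice for summarizing the observations that fall into one leaf of a survival tree), and the function name and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)   # subjects still under observation
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Only observed events reduce the estimated survival.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored subjects leave the risk set without counting as events.
        at_risk -= leaving
    return curve


# Made-up follow-up data: the subjects at t=3 (one of two) and t=8 are censored.
km_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in km_curve:
    print(f"t={t}: S(t)={s:.2f}")
```

Censored subjects contribute to the number at risk up to their last follow-up time but never count as events, which is why they cannot simply be dropped or treated as failures.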
Most risk assessment tools assume that the impact of risk factors is linear and cumulative. Using novel machine-learning techniques, we sought to design an interactive, nonlinear risk calculator for Emergency Surgery (ES).
All ES patients in the American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) 2007 to 2013 database were included (derivation cohort). Optimal Classification Trees (OCT) were leveraged to train machine-learning algorithms to predict postoperative mortality, morbidity, and 18 specific complications (eg, sepsis, surgical site infection). Unlike classic heuristics (eg, logistic regression), OCT is adaptive and reboots itself with each variable, thus accounting for nonlinear interactions among variables. An application Predictive OpTimal Trees in Emergency Surgery Risk (POTTER) was then designed as the algorithms' interactive and user-friendly interface. POTTER performance was measured (c-statistic) using the 2014 ACS-NSQIP database (validation cohort) and compared with the American Society of Anesthesiologists (ASA), Emergency Surgery Score (ESS), and ACS-NSQIP calculators' performance.
Based on 382,960 ES patients, comprehensive decision-making algorithms were derived, and POTTER was created where the provider's answer to a question interactively dictates the subsequent question. For any specific patient, the number of questions needed to predict mortality ranged from 4 to 11. The mortality c-statistic was 0.9162, higher than ASA (0.8743), ESS (0.8910), and ACS (0.8975). The morbidity c-statistic was similarly the highest (0.8414).
POTTER is a highly accurate and user-friendly ES risk calculator with the potential to continuously improve accuracy with ongoing machine-learning. POTTER might prove useful as a tool for bedside preoperative counseling of ES patients and families.
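The c-statistic used above to compare calculators can be read as the probability that a randomly chosen patient who had the outcome receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. A minimal sketch of that pairwise definition follows; the labels and scores are made up for illustration and are not POTTER data, and the function name is hypothetical.

```python
def c_statistic(outcomes, risks):
    """Concordance probability for binary outcomes (equals the ROC AUC).

    outcomes : 1 if the patient had the outcome, 0 otherwise
    risks    : predicted risk score for each patient
    """
    with_outcome = [r for r, o in zip(risks, outcomes) if o == 1]
    without_outcome = [r for r, o in zip(risks, outcomes) if o == 0]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for high in with_outcome:
        for low in without_outcome:
            if high > low:
                concordant += 1.0   # pair ranked correctly
            elif high == low:
                concordant += 0.5   # tie counts as half
    return concordant / (len(with_outcome) * len(without_outcome))


# Made-up example: 2 patients with the outcome, 3 without.
example_c = c_statistic([1, 0, 1, 0, 0], [0.9, 0.2, 0.4, 0.5, 0.1])
print(round(example_c, 4))
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking. The double loop is quadratic in the number of patients, so for a cohort of the size described above a rank-based computation would be the practical choice.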