In Germany, as in other countries, around 30% of students demonstrate insufficient spelling skills at the end of primary school, partly owing to the challenge teachers face in managing a variety of students' learning needs. Digital tools using Machine Learning can enable teachers to individualise students' learning. However, there are still no suitable approaches for students who are not yet proficient in spelling.
With the aim of adapting Machine Learning to students of all proficiencies, we investigate how accurately specific spelling errors can be predicted across different skill levels, and what the content-related reasons for incorrect predictions are.
To that end, we developed a web application to record the spelling efforts of N = 685 first- and second-graders in Bavaria, Germany. A total of 18,133 different misspellings were recorded. Using this dataset, we trained six Machine Learning models and compared their performances to predict misspellings.
Comparing all Machine Learning models employed in this work, the Random Forest performed best on average as a predictor of spelling errors. Errors at the syllable- and morpheme-levels were predicted best, and errors at the basic phoneme-grapheme-level were predicted slightly less accurately. Confusions often concerned cases that are considered linguistically ambiguous or occurred in complex error entanglements. The implications of these results are discussed.
Highlights
• We propose a tailored machine learning algorithm to help students learn spelling.
• The machine learning algorithm can classify specific errors with high accuracy.
• The machine learning algorithm can help to individualise spelling classes.
• The approach can be useful for analysing specific spelling deficiencies.
Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time-consuming and irreproducible manual process of trial-and-error to find well-performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods (for example, based on resampling error estimation for supervised machine learning) can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.
This article is categorized under:
Algorithmic Development > Statistics
Technologies > Machine Learning
Technologies > Prediction
After a general introduction to hyperparameter optimization, we review important HPO methods such as grid and random search, evolutionary algorithms, Bayesian optimization, Hyperband, and racing. We include many practical recommendations with respect to performance evaluation, how to combine HPO with ML pipelines, runtime improvements, and parallelization.
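As an illustration of the simplest method reviewed above, random search draws hyperparameter configurations at random from the search space and keeps the best one found. The sketch below is purely illustrative: the toy `validation_error` function and the hyperparameter names are hypothetical stand-ins for a real resampling-based error estimate of an ML model.

```python
import random

# Hypothetical stand-in for a resampling-based validation error of an
# ML model; the quadratic shape is purely illustrative.
def validation_error(learning_rate, n_trees):
    return (learning_rate - 0.1) ** 2 + (n_trees - 300) ** 2 / 1e6

def random_search(n_iter=100, seed=0):
    """Draw configurations uniformly at random and keep the best one
    observed: the simple HPO baseline discussed in the text."""
    rng = random.Random(seed)
    best_config, best_error = None, float("inf")
    for _ in range(n_iter):
        config = {"learning_rate": rng.uniform(0.001, 0.5),
                  "n_trees": rng.randint(50, 1000)}
        err = validation_error(**config)
        if err < best_error:
            best_config, best_error = config, err
    return best_config, best_error

best, err = random_search()
```

Grid search differs only in that the configurations are taken from a fixed Cartesian product of candidate values instead of being sampled; the more advanced methods in the review replace the uniform sampling with an informed proposal mechanism.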
Machine learning for the educational sciences
Hilbert, Sven; Coors, Stefan; Kraus, Elisabeth et al.
Review of Education (Oxford), October 2021, Volume 9, Issue 3
Journal Article
Peer reviewed
Open access
Machine learning (ML) provides a powerful framework for the analysis of high-dimensional datasets by modelling complex relationships, often encountered in modern data with many variables, cases, and potentially non-linear effects. The impact of ML methods on research and practical applications in the educational sciences is still limited, but continuously grows, as larger and more complex datasets become available through massive open online courses (MOOCs) and large-scale investigations. The educational sciences are at a crucial pivot point, because of the anticipated impact ML methods hold for the field. To provide educational researchers with an elaborate introduction to the topic, we provide an instructional summary of the opportunities and challenges of ML for the educational sciences, show how a look at related disciplines can help learning from their experiences, and argue for a philosophical shift in model evaluation. We demonstrate how the overall quality of data analysis in educational research can benefit from these methods and show how ML can play a decisive role in the validation of empirical models. Specifically, we (1) provide an overview of the types of data suitable for ML and (2) give practical advice for the application of ML methods. In each section, we provide analytical examples and reproducible R code. Also, we provide an extensive Appendix on ML-based applications for education. This instructional summary will help educational scientists and practitioners to prepare for the promises and threats that come with the shift towards digitisation and large-scale assessment in education.
Context and implications
Rationale for this study
In 2020, the worldwide SARS‐COV‐2 pandemic forced the educational sciences to perform a rapid paradigm shift with classrooms going online around the world—a hardly novel but now strongly catalysed development. In the context of data‐driven education, this paper demonstrates that the widespread adoption of machine learning techniques is central for the educational sciences and shows how these methods will become crucial tools in the collection and analysis of data and in concrete educational applications. Helping to leverage the opportunities and to avoid the common pitfalls of machine learning, this paper provides educators with the theoretical, conceptual and practical essentials.
Why the new findings matter
The process of teaching and learning is complex, multifaceted and dynamic. This paper contributes a seminal resource on the digitisation of the educational sciences by demonstrating how new machine learning methods can be effectively and reliably used in research, education and practical application.
Implications for educational researchers and policy makers
The progressing digitisation of societies around the globe and the impact of the SARS-COV-2 pandemic have highlighted the vulnerabilities and shortcomings of educational systems. These developments have shown the necessity of effective educational processes that can support sometimes overwhelmed teachers in digitally imparting knowledge, as planned by many governments and policy makers. Educational scientists, corporate partners and stakeholders can make use of machine learning techniques to develop advanced, scalable educational processes that account for the individual needs of learners and that can complement and support existing learning infrastructure. The proper use of machine learning methods can contribute essential applications to the educational sciences, such as (semi-)automated assessments, algorithmic grading, personalised feedback and adaptive learning approaches. However, these promises are strongly tied to an at least basic understanding of the concepts of machine learning and a degree of data literacy, which has to become the standard in education and the educational sciences.
Demonstrating both the promises and the challenges that are inherent to the collection and the analysis of large educational data with machine learning, this paper covers the essential topics that their application requires and provides easy‐to‐follow resources and code to facilitate the process of adoption.
Optimizing a machine learning (ML) pipeline for radiomics analysis involves numerous choices in data set composition, preprocessing, and model selection. Objective identification of the optimal setup is complicated by correlated features, interdependency structures, and a multitude of available ML algorithms. Therefore, we present a radiomics-based benchmarking framework to optimize a comprehensive ML pipeline for the prediction of overall survival. This study is conducted on an image set of patients with hepatic metastases of colorectal cancer, for which radiomics features of the whole liver and of metastases from computed tomography images were calculated. A mixed model approach was used to find the optimal pipeline configuration and to identify the added prognostic value of radiomics features.
In this study, a large-scale ML benchmark pipeline consisting of preprocessing, feature selection, dimensionality reduction, hyperparameter optimization, and training of different models was developed for radiomics-based survival analysis. Portal-venous computed tomography imaging data from a previous prospective randomized trial evaluating radioembolization of liver metastases of colorectal cancer were quantitatively accessible through a radiomics approach. In total, 1218 radiomics features of hepatic metastases and the whole liver were calculated, and 19 clinical parameters (age, sex, laboratory values, and treatment) were available for each patient. Three ML algorithms (a regression model with elastic net regularization, glmnet; a random survival forest, RSF; and a gradient tree-boosting technique, xgboost) were evaluated for 5 combinations of clinical data, tumor radiomics, and whole-liver features. Hyperparameter optimization and model evaluation were optimized toward the integrated Brier score performance metric via nested cross-validation. To address dependency structures in the benchmark setup, a mixed-model approach was developed to compare ML and data configurations and to identify the best-performing model.
Within our radiomics-based benchmark experiment, 60 ML pipeline variations were evaluated on clinical data and radiomics features from 491 patients. Descriptive analysis of the benchmark results showed a preference for RSF-based pipelines, especially for the combination of clinical data with radiomics features. This observation was supported by the quantitative analysis via a linear mixed model approach, computed to differentiate the effect of data sets and pipeline configurations on the resulting performance. This revealed that the RSF pipelines consistently performed similarly to or better than glmnet and xgboost. Further, for the RSF, no pipeline composition performed significantly better with regard to the type of preprocessing or hyperparameter optimization.
Our study introduces a benchmark framework for radiomics-based survival analysis, aimed at identifying the optimal settings with respect to different radiomics data sources and various ML pipeline variations, including preprocessing techniques and learning algorithms. A suitable analysis tool for the benchmark results is provided via a mixed model approach, which showed, for our study on patients with intrahepatic liver metastases, that radiomics features captured the patients' clinical situation in a manner comparable to the information provided solely by clinical parameters. However, we did not observe a relevant additional prognostic value obtained by these radiomics features.
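The nested cross-validation scheme mentioned above (an inner loop selects a hyperparameter configuration, an outer loop yields an unbiased performance estimate) can be sketched generically. Everything below is a simplified illustration, not the study's pipeline: the `fit` and `score` callables, the toy data, and the shrinkage-style configurations are hypothetical stand-ins.

```python
import random

def nested_cv(data, configs, fit, score, outer_k=5, inner_k=3, seed=0):
    """Outer folds estimate performance; on each outer training set an
    inner cross-validation picks the best-scoring configuration."""
    rng = random.Random(seed)
    data = list(data)
    rng.shuffle(data)
    outer_folds = [data[i::outer_k] for i in range(outer_k)]
    outer_scores = []
    for i, test in enumerate(outer_folds):
        train = [x for j, fold in enumerate(outer_folds) if j != i for x in fold]
        inner_folds = [train[j::inner_k] for j in range(inner_k)]

        def inner_score(cfg):
            scores = []
            for j, val in enumerate(inner_folds):
                inner_train = [x for k, fold in enumerate(inner_folds)
                               if k != j for x in fold]
                scores.append(score(fit(inner_train, cfg), val))
            return sum(scores) / len(scores)

        best_cfg = max(configs, key=inner_score)
        outer_scores.append(score(fit(train, best_cfg), test))
    return sum(outer_scores) / len(outer_scores)

# Hypothetical toy task: predict a constant, shrunk toward zero by a
# hyperparameter lam; score is negative mean squared error.
data = [(None, y) for y in range(20)]
configs = [0.0, 0.5, 1.0]

def fit(train, lam):
    mean = sum(y for _, y in train) / len(train)
    return (1 - lam) * mean

def score(model, val):
    return -sum((y - model) ** 2 for _, y in val) / len(val)

est = nested_cv(data, configs, fit, score)
```

The key property is that the configuration choice never sees the outer test fold, so the returned estimate is not optimistically biased by the tuning step.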
Hyperparameter optimization constitutes a large part of typical modern machine learning (ML) workflows. This arises from the fact that ML methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not only interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies from the domains of evolutionary algorithms and Bayesian optimization. We illustrate the utility of multi-objective optimization in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability, and robustness.
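The central object in multi-objective hyperparameter optimization is the Pareto front: the set of configurations not dominated on all objectives by any other configuration. A minimal sketch, assuming two objectives that are both minimized (for example, error rate and prediction time); the candidate values below are made up for illustration.

```python
def pareto_front(points):
    """Keep the non-dominated points; every objective is minimised.
    q dominates p if q differs from p and is at least as good in
    every objective (hence strictly better in at least one)."""
    return [p for p in points
            if not any(q != p and all(q[i] <= p[i] for i in range(len(p)))
                       for q in points)]

# Hypothetical (error rate, prediction time) pairs for candidate configs.
candidates = [(0.10, 5.0), (0.12, 1.0), (0.10, 4.0), (0.20, 0.5), (0.15, 2.0)]
front = pareto_front(candidates)
```

The surveyed evolutionary and Bayesian strategies differ in how they propose new candidates, but all of them ultimately aim to approximate this front rather than a single best configuration.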
The intentional distortion of test results presents a fundamental problem for self-report-based psychiatric assessment, such as screening for depressive symptoms. The first objective of the study was to clarify whether depressed patients, like healthy controls, possess both the cognitive ability and the motivation to deliberately influence the results of commonly used screening measures. The second objective was the construction of a method, derived directly from the test takers' responses, to systematically detect faking behavior. Supervised machine learning algorithms have the potential to empirically learn the implicit interconnections between responses, which shape detectable faking patterns. In a standardized design, faking bad and faking good were experimentally induced in a matched sample of 150 depressed and 150 healthy subjects. Participants completed commonly used questionnaires to detect depressive and associated symptoms. Group differences across experimental conditions were evaluated using linear mixed models. Machine learning algorithms were trained on the test results and compared regarding their capacity to systematically predict distortions in response behavior in two scenarios: (1) differentiation of authentic patient responses from simulated responses of healthy participants; (2) differentiation of authentic patient responses from dissimulated patient responses. Statistically significant convergence of the test scores in both faking conditions suggests that both depressive patients and healthy controls have the cognitive ability as well as the motivational compliance to alter their test results. Evaluation of the algorithmic capability to detect faking behavior yielded predictive accuracies of up to 89%. Implications of the findings, as well as future research objectives, are discussed.
Trial Registration
The study was pre-registered at the German registry for clinical trials (Deutsches Register klinischer Studien, DRKS; DRKS00007708).
In practice, machine learning (ML) workflows require various steps, from data preprocessing, missing value imputation, and model selection to model tuning and model evaluation. Many of these steps rely on human ML experts. AutoML, the field of automating these ML pipelines, tries to help practitioners apply ML off-the-shelf without any expert knowledge. Most modern AutoML systems like auto-sklearn, H2O AutoML, or TPOT aim for high predictive performance, thereby generating ensembles that consist almost exclusively of black-box models. This, in turn, makes interpretation for the layperson more intricate and adds another layer of opacity for users. We propose an AutoML system that constructs an interpretable additive model that can be fitted using a highly scalable componentwise boosting algorithm. Our system provides tools for easy model interpretation, such as visualizing partial effects and pairwise interactions, allows for a straightforward calculation of feature importance, and gives insights into the model complexity required to fit the given task. We introduce the general framework and outline its implementation, autocompboost. To demonstrate the framework's efficacy, we compare autocompboost to other existing systems based on the OpenML AutoML-Benchmark. Despite its restriction to an interpretable model space, our system is competitive in terms of predictive performance on most data sets while being more user-friendly and transparent.
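Componentwise boosting, the fitting algorithm mentioned above, updates in each iteration only the single best-fitting base learner, which yields a sparse, interpretable additive model. The following is a minimal least-squares sketch of the general idea, not the autocompboost implementation; the data and settings are illustrative.

```python
def componentwise_boosting(X, y, n_iter=50, nu=0.1):
    """In each iteration, fit a simple least-squares base learner on
    every single feature, but add only the best-fitting one to the
    model, scaled by the learning rate nu (illustrative sketch)."""
    p = len(X[0])
    coef = [0.0] * p          # additive model: one coefficient per feature
    resid = list(y)
    for _ in range(n_iter):
        best_j, best_b, best_sse = 0, 0.0, float("inf")
        for j in range(p):
            xj = [row[j] for row in X]
            denom = sum(v * v for v in xj) or 1.0
            b = sum(v * r for v, r in zip(xj, resid)) / denom
            sse = sum((r - b * v) ** 2 for v, r in zip(xj, resid))
            if sse < best_sse:
                best_j, best_b, best_sse = j, b, sse
        coef[best_j] += nu * best_b
        resid = [r - nu * best_b * row[best_j] for r, row in zip(resid, X)]
    return coef

# Toy data: the target depends only on the first feature (y = 2 * x0),
# so boosting should concentrate all weight on coef[0].
X = [[i, i % 3] for i in range(1, 21)]
y = [2 * row[0] for row in X]
coef = componentwise_boosting(X, y)
```

Because only one coefficient changes per iteration, features that are never selected keep a coefficient of exactly zero, which is the built-in feature selection that makes the resulting model easy to inspect.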
AutoML systems are currently rising in popularity, as they can build powerful models without human oversight. They often combine techniques from many different sub-fields of machine learning in order to find a model or set of models that optimize a user-supplied criterion, such as predictive performance. The ultimate goal of such systems is to reduce the amount of time spent on menial tasks, or on tasks that can be solved better by algorithms, while leaving decisions that require human intelligence to the end-user. In recent years, the importance of other criteria, such as fairness and interpretability, has become more and more apparent. Current AutoML frameworks either do not allow such secondary criteria to be optimized or only do so by limiting the system's choice of models and preprocessing steps. We propose to directly optimize additional criteria defined by the user in order to guide the search towards an optimal machine learning pipeline. To demonstrate the need for and usefulness of our approach, we provide a simple multi-criteria AutoML system and showcase an exemplary application.
AMLB: an AutoML Benchmark
Gijsbers, Pieter; Bueno, Marcos L. P.; Coors, Stefan et al.
arXiv.org, November 2023
Paper, Journal Article
Open access
Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with a multi-faceted analysis, evaluating model accuracy, its trade-offs with inference time, and framework failures. We also use Bradley-Terry trees to discover subsets of tasks where the relative AutoML framework rankings differ. The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end: from framework installation and resource allocation to in-depth evaluation. The benchmark uses public data sets, can be easily extended with other AutoML frameworks and tasks, and has a website with up-to-date results.