Predictive modelling of academic success and retention has been a key research theme in Learning Analytics. While the initial work on predictive modelling focused on the development of general predictive models, portable across different learning settings, later studies demonstrated the drawbacks of not considering the specificities of course design and disciplinary context. This study builds on the methods and findings of related earlier studies to further explore factors predictive of learners' academic success in blended learning. In doing so, it differentiates itself by (i) relying on a larger and homogeneous course sample (15 courses, 50 course offerings in total), and (ii) considering both internal and external conditions as factors affecting the learning process. We apply mixed-effects linear regression models to examine (i) to what extent indicators of students' online learning behaviour can explain the variability in the final grades, and (ii) to what extent that variability is attributable to the course and students' internal conditions, not captured by the logged data. Having examined different types of behaviour indicators (e.g., indicators of the overall activity level, those indicative of regularity of study, etc.), we found little difference, if any, in their predictive power. Our results further indicate that a low proportion of variance is explained by the behaviour-based indicators, while a significant portion of variability stems from the learners' internal conditions. Hence, when variability in external conditions is largely controlled for (the same institution, discipline, and nominal pedagogical model), students' internal state is the key predictor of their course performance.
• Low portability of trace-based student success predictors across homogenous courses.
• Low predictive power of online learning behaviour indicators in blended courses.
• Activity level and regularity of study indicators have comparable predictive power.
• Students' internal state can explain a large portion of variance in the course grades.
• In similar course settings, study success relates most to students' internal state.
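To make the idea of variance "attributable to the course" concrete, the sketch below estimates an intraclass correlation from a balanced one-way random-effects ANOVA. This is a deliberately simpler stand-in, not the mixed-effects regression models the study applies, and the function name `icc_oneway` and the grade values are hypothetical illustrations rather than data from the paper.

```python
from statistics import mean


def icc_oneway(groups):
    """Estimate ICC(1) from a balanced one-way random-effects ANOVA.

    groups maps a group label (e.g. a course) to a list of numeric
    grades. All groups must have the same size (balanced design).
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("all groups must have the same number of grades")
    n = sizes.pop()   # grades per group
    k = len(groups)   # number of groups
    if k < 2 or n < 2:
        raise ValueError("need at least 2 groups with at least 2 grades each")

    grand_mean = mean(x for values in groups.values() for x in values)
    group_means = {g: mean(values) for g, values in groups.items()}

    # Between-group and within-group mean squares.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means.values()) / (k - 1)
    msw = sum(
        (x - group_means[g]) ** 2 for g, values in groups.items() for x in values
    ) / (k * (n - 1))

    denominator = msb + (n - 1) * msw
    if denominator == 0:
        raise ValueError("no variability in the data")
    # Share of total variance estimated to lie between groups.
    return (msb - msw) / denominator


# Hypothetical grades for three course offerings (illustration only).
grades = {"course_a": [1, 2, 3], "course_b": [4, 5, 6], "course_c": [7, 8, 9]}
print(round(icc_oneway(grades), 3))
```

A value near 1 would suggest that most of the variability lies between groups, while a value near 0 would suggest it lies within them. A mixed-effects model extends this by adding predictors and several grouping levels.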
Can language models automate data wrangling? Jaimovitch-López, Gonzalo; Ferri, Cèsar; Hernández-Orallo, José ...
Machine learning, 06/2023, Volume 112, Issue 6
Journal Article
Peer reviewed
Open access
The automation of data science and other data manipulation processes depends on the integration and formatting of 'messy' data. Data wrangling is an umbrella term for these tedious and time-consuming tasks. Tasks such as transforming dates, units or names expressed in different formats have been challenging for machine learning because (1) users expect to solve them with short cues or few examples, and (2) the problems depend heavily on domain knowledge. Interestingly, large language models today (1) can infer from very few examples or even a short clue in natural language, and (2) can integrate vast amounts of domain knowledge. It is then an important research question to analyse whether language models are a promising approach for data wrangling, especially as their capabilities continue growing. In this paper we apply different variants of the language model Generative Pre-trained Transformer (GPT) to five batteries covering a wide range of data wrangling problems. We compare the effect of prompts and few-shot regimes on their results and how they compare with specialised data wrangling systems and other tools. Our major finding is that they appear to be a powerful tool for a wide range of data wrangling tasks. We provide some guidelines about how they can be integrated into data processing pipelines, provided the users can take advantage of their flexibility and the diversity of tasks to be addressed. However, reliability is still an important issue to overcome.
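As an illustration of the few-shot regime mentioned above, the following sketch assembles a prompt from a handful of input/output pairs and leaves the final output open for a language model to complete. The `Input:`/`Output:` layout, the function name `build_few_shot_prompt`, and the example dates are assumptions made for illustration; the abstract does not specify the prompt format used in the paper, and no model is called here.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Build a few-shot prompt for a data wrangling transformation.

    examples: list of (source, target) string pairs showing the
              desired transformation.
    query: the new input the model should transform.
    instruction: optional short natural-language cue placed first.
    """
    lines = []
    if instruction:
        lines.append(instruction)
        lines.append("")  # blank line between the cue and the examples
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")  # blank line between examples
    # The final output is left empty so the model completes it.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical date-reformatting task (illustration only).
prompt = build_few_shot_prompt(
    examples=[("25/12/2020", "2020-12-25"), ("03/01/2019", "2019-01-03")],
    query="01/02/2021",
    instruction="Convert dates to ISO format.",
)
print(prompt)
```

The resulting string would be sent to whichever language model is being evaluated. Because the abstract flags reliability as an open issue, any completion should be validated before it enters a data pipeline.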
We are moving from the Internet era into the AI era. We are living in a world where we have the opportunity to harness the benefit of data. Digitization will allow us to collect more data than we can imagine, but what are the real potential and challenges we may face when we embark on this data-centric journey? This talk looks at a data-centric world, the potential and challenges that arise when one wishes to use data to solve problems, and how we can leverage data to achieve a sustainable future.
Background and Objective: Stem cell (SC) is a crucial factor of the human organ that is significantly important for clinical solutions. However, consideration of SC in the therapeutic or disease classification process is complex in terms of accurate classification and prediction. To overcome this issue, machine learning (ML) is the most effective technique that is frequently used in cell-based clinical applications for diagnosis, treatment, and disease identification. Recently, it has been applied to SC observation, which is a crucial factor for clinical solutions. Thus, the objective of this review is to present the effectiveness of ML techniques for SC observation from clinical perspectives, together with current challenges and future directions for further improvement.
Methods: In this study, we conducted a short review of ML-based applications in SC investigation and classification for the improvement of clinical solutions. We explored studies from five scientific databases (Web of Science, Google Scholar, Scopus, ScienceDirect, and PubMed) using several keywords related to the objective of our study. After primary and secondary screening, 15 articles were selected, and their findings were summarized in terms of ten aspects (year of publication, focused area, objective, experimented datasets, selected ML classifiers, experimental procedure, classification parameter, overall performance in terms of accuracy, advancements, and limitations), along with their current limitations and future improvement directions.
Key Content and Findings: The majority of existing literature reviews are limited by a focus on specific SC-based investigations, limited evaluation attributes, and a lack of challenges and future improvement suggestions. Also, most of the reviews did not investigate the effectiveness of ML techniques in SC biology. Therefore, in this paper, we investigate existing literature related to the development of clinical solutions considering ML techniques in the area of SC and cell culture processes, and highlight current challenges and future directions.
Conclusions: The majority of studies focused on the disease identification process and implemented convolutional neural network and support vector machine techniques. The prime limitations of the investigated studies relate to the focused area, the investigated SCs, the small number of experimental datasets, and the validation techniques. None of the studies provided complete evidence to determine an optimal ML technique for SC to build classification or predictive models. Therefore, further work is required to develop and improve the existing solutions, including other ML techniques, large datasets, and advanced evaluation processes.
This survey is an updated and improved version of the previous one published in 2013 in this journal with the title "data mining in education". It reviews in a comprehensible and very general way how Educational Data Mining and Learning Analytics have been applied to educational data. In the last decade, this research area has evolved enormously and a wide range of related terms is now used in the literature, such as Academic Analytics, Institutional Analytics, Teaching Analytics, Data‐Driven Education, Data‐Driven Decision‐Making in Education, Big Data in Education, and Educational Data Science. This paper provides the current state of the art by reviewing the main publications, the key milestones, the knowledge discovery cycle, the main educational environments, the specific tools, the freely available datasets, the most used methods, the main objectives, and the future trends in this research area.
This article is categorized under:
Application Areas > Education and Learning
Educational Data Mining and Learning Analytics.