MOOCs (Massive Open Online Courses) usually have high dropout rates. Many articles have proposed predictive models to detect at-risk learners early and alleviate this issue. Nevertheless, existing models do not consider complex high-level variables, such as self-regulated learning (SRL) strategies, which can have an important effect on learners' success. In addition, predictions are often carried out in instructor-paced MOOCs, where contents are released gradually, but not in self-paced MOOCs, where all materials are available from the beginning and users can enroll at any time. For self-paced MOOCs, existing predictive models are limited in how they deal with the flexibility of the course start date, which is learner dependent. Therefore, they need to be adapted to predict with little information shortly after each learner starts engaging with the MOOC. To address these issues, this paper contributes a study of how SRL strategies can be included in predictive models for self-paced MOOCs. In particular, self-reported and event-based SRL strategies are evaluated and compared to measure their effect on dropout prediction. The paper also contributes a new methodology for analyzing self-paced MOOCs through a temporal analysis that reveals how early prediction models can detect learners at risk. Results show that event-based SRL strategies have very high predictive power, although variables related to learners' interactions with exercises remain the best predictors. That is, event-based SRL strategies can be useful for prediction when, for example, exercise-interaction variables are not available. Furthermore, results show that this methodology achieves powerful early predictions from about 25 to 33% of the theoretical course duration.
The proposed methodology presents a new approach to predicting dropout in self-paced MOOCs, considering complex variables that go beyond the classic trace data directly captured by MOOC platforms.
•Event-based self-regulated learning (SRL) strategies are good predictors of dropout.
•Self-reported SRL strategies are not useful for predicting dropout.
•A new approach is proposed for prediction in self-paced MOOCs.
•It is possible to achieve powerful predictions from 25 to 33% of the course duration.
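The learner-relative temporal analysis described above can be sketched in a few lines: each learner's activity log is re-indexed from that learner's own start date, and features are computed only over the first fraction of the theoretical course duration. The function and field names below (`early_features`, `time`, `type`) are hypothetical illustrations, not the paper's implementation:

```python
from datetime import datetime, timedelta

def early_features(learner_events, start, course_days=60, fraction=0.25):
    """Keep only events in the first `fraction` of the theoretical course
    duration, measured from this learner's own start date, and count them
    per event type (e.g. video, exercise, forum)."""
    cutoff = start + timedelta(days=course_days * fraction)
    window = [e for e in learner_events if start <= e["time"] < cutoff]
    feats = {}
    for e in window:
        feats[e["type"]] = feats.get(e["type"], 0) + 1
    return feats

# Hypothetical learner who enrolled on March 1st of some year.
start = datetime(2023, 3, 1)
events = [
    {"time": start + timedelta(days=2), "type": "video"},
    {"time": start + timedelta(days=5), "type": "exercise"},
    {"time": start + timedelta(days=40), "type": "exercise"},  # outside the 25% window
]
print(early_features(events, start))  # {'video': 1, 'exercise': 1}
```

Because the window is anchored to each learner's individual start date rather than a global course calendar, the same feature extraction works for enrollments at any time.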
In clinical practice, with its time constraints, a frequent conclusion is that asking about the ability to smell may suffice to detect olfactory problems. To address this question systematically, 6049 subjects were asked how well they could perceive odors, with five possible responses. Participants presented at a University Department of Otorhinolaryngology, where olfactory testing was part of the routine investigation performed in patients receiving surgery at the clinic (for various reasons). According to an odor identification test, 1227 subjects had functional anosmia and 3113 were labeled with normosmia. Measures of laboratory test performance were used to assess how well self-estimates captured the olfactory diagnosis. Ratings of olfactory function as absent or impaired provided the diagnosis of anosmia at a balanced accuracy of 79%, whereas ratings of good or excellent indicated normosmia at a balanced accuracy of 64.6%. The number of incorrect judgments of anosmia increased with age, whereas false negative self-estimates of normosmia became rarer with increasing age. The subject's sex was irrelevant in this context. Thus, when asking the question "How well can you smell odors?" with standardized responses, fairly accurate information can be obtained about whether or not a subject can smell. However, this must be weighed against the almost 30% (355 subjects) of anosmic patients who judged their ability to smell as at least "average." Thus, olfactory testing using reliable and validated tests appears indispensable.
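Balanced accuracy, the metric reported above, is the mean of sensitivity and specificity, which makes it robust to the unequal group sizes in this cohort (1227 anosmic vs. 3113 normosmic subjects). A minimal sketch with illustrative counts (not the study's actual confusion matrix):

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity (true positive rate) and specificity
    (true negative rate); insensitive to class imbalance."""
    sensitivity = tp / (tp + fn)  # e.g. anosmic subjects correctly self-rated as impaired
    specificity = tn / (tn + fp)  # e.g. normosmic subjects correctly self-rated as good
    return (sensitivity + specificity) / 2

# Illustrative counts only, chosen for readability:
print(balanced_accuracy(tp=90, fn=10, tn=60, fp=40))  # 0.75
```

With 90% sensitivity and 60% specificity, balanced accuracy is 75%, regardless of how many subjects fall in each group; plain accuracy on the same data would instead be pulled toward the larger group.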
When discussing research in physics and in science more generally, it is common to ascribe equal importance to the three components of the scientific trinity: theoretical, experimental, and computational studies. This review will explore the future of modern turbulence theory by tracing its history, which began in earnest with Kolmogorov’s 1941 analysis of the turbulence cascade and inertial range [A.N. Kolmogorov, Dokl. Akad. Nauk SSSR 30, 299 (1941); 32, 19 (1941)]. The 80th anniversary of Kolmogorov’s landmark study is a welcome opportunity to survey the achievements and evaluate the future of the theoretical approach to turbulence research. Over the years, turbulence theories have been critically important in laying the foundation of our understanding of the nature of turbulent flows. In particular, the Direct Interaction Approximation (DIA) [R.H. Kraichnan, J. Fluid Mech. 5, 497 (1959)] and its subsequent development, known as the statistical closure approach, can be identified as perhaps the most profound single advancement. The remarkable success of statistical closure has furnished a platform to study such essential concepts as the energy transfer process, interacting scales, and the roles of straining and sweeping motions. More recently, the quasi-Lagrangian formulation of V. L’vov and I. Procaccia and Kraichnan’s solvable passive scalar model provided powerful ways to explore another fundamental aspect of turbulent flows: intermittency and the associated anomalous scaling exponents. In the meantime, the theory of fluid equilibria has been developed to describe the large-scale structures that can emerge at late times from the turbulent cascades of two-dimensional and geophysical flows. And yet, despite all these successes, analytical treatments suffer from mathematical complexities. As a result, the utility of theoretical approaches has been limited to relatively idealized flows.
On the other hand, in recent decades, computational abilities and experimental facilities have reached an unprecedented scale. Looking beyond the horizon, the imminent deployment of exascale supercomputers will generate complete datasets of the entire flow field of key benchmark flows, allowing researchers to extract additional measurements of fully developed, complex turbulent flow fields far beyond those available from the statistical closure theories. Other developments that could influence the future course of turbulence theories include the advancement of machine learning, artificial intelligence, and data science; likely disruptions arising from the advent of quantum computation; and the increasingly prominent role of turbulence research in providing more accurate data for climate science. Turbulence theorists can leverage these developments by asking the right questions and developing advanced, sophisticated frameworks able to predict and correlate vast amounts of data from the other two components of the trinity.
In the United States, state and local agencies administering government assistance programs have in their administrative data a powerful resource for policy analysis to inform evaluation and guide improvement of their programs. Understanding different aspects of their administrative data quality is critical for agencies to conduct such analyses and to improve their data for future use. However, state and local agencies often lack the resources and training for staff to conduct rigorous evaluations of data quality. We describe our efforts in developing tools that can be used to assess data quality as well as the challenges encountered in constructing these tools. The toolkit focuses on critical dimensions of quality for analyzing an administrative dataset, including checks on data accuracy, the completeness of the records, and the comparability of the data over time and among subgroups of interest. State and local administrative databases often include a longitudinal component which our toolkit also aims to exploit to help evaluate data quality. In addition, we incorporate data visualization to draw attention to sets of records or variables that contain outliers or for which quality may be a concern. While we seek to develop general tools for common data quality analyses, most administrative datasets have particularities that can benefit from a customized analysis building on our toolkit.
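Two of the quality dimensions mentioned above, completeness of records and outlier detection, can be illustrated with short generic checks. The function names and thresholds below are hypothetical sketches, not the toolkit's actual interface:

```python
import statistics

def completeness(records, field):
    """Share of records with a non-missing value for `field`."""
    present = sum(1 for r in records if r.get(field) not in (None, ""))
    return present / len(records)

def flag_outliers(values, z=3.0):
    """Flag values more than `z` standard deviations from the mean,
    a simple first-pass screen before visual inspection."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if sd and abs(v - mean) / sd > z]

records = [{"benefit": 120}, {"benefit": 130}, {"benefit": None}]
print(completeness(records, "benefit"))        # 2 of 3 records filled in
print(flag_outliers([1, 1, 1, 1, 100], z=1.5))  # [100]
```

In practice such checks would typically be run per reporting period and per subgroup, so that a sudden drop in completeness or a burst of outliers points to a specific batch of records worth reviewing.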
As one of the most desired skills for contemporary education and careers, problem-solving is fundamental and critical in game-based learning research. However, students' implicit and self-controlled learning processes in games make it difficult to understand their problem-solving behaviors. Observational and qualitative methods, such as interviews and exams, fail to capture students' in-process difficulties. By integrating data mining techniques, this study explored students' problem-solving processes in a puzzle-based game. First, we applied the Continuous Hidden Markov Model to identify students' problem-solving phases and the transition probabilities between these phases. Second, we employed sequence mining techniques to investigate the problem-solving patterns and strategies that facilitate students' problem-solving processes. The results suggested that most students were stuck in certain phases, with only a few able to transition to systematic phases by applying efficient strategies. At the beginning of the puzzle, the most popular strategy was testing one dimension of the solution at each attempt. In contrast, the other two strategies (removing or adding untested dimensions one by one) played pivotal roles in promoting transitions to higher problem-solving phases. The findings of this study shed light on when, how, and why students advanced their problem-solving processes. The combination of the Continuous Hidden Markov Model and sequence mining techniques holds considerable promise for uncovering students' problem-solving processes, which can inform future scaffolds and interventions to support students’ personalized learning in game-based learning environments.
•Data mining approaches to understand students' gameplay behaviors.
•Uncovering students' problem-solving processes in game-based learning environments.
•Exploring students' problem-solving phases and strategies from their complex learning actions.
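The phase-identification step can be illustrated with a toy hidden Markov model: given a sequence of observed action types, Viterbi decoding recovers the most likely sequence of latent problem-solving phases. The study uses a Continuous HMM over real-valued features; the discrete version, state names, and probabilities below are a simplified, hypothetical analogue:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM
    (illustrative stand-in for the study's Continuous HMM)."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for o in obs[1:]:
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][o], p) for p in states
            )
            V[-1][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Hypothetical phases and action types, not the study's fitted model:
states = ["stuck", "systematic"]
start_p = {"stuck": 0.8, "systematic": 0.2}
trans_p = {"stuck": {"stuck": 0.7, "systematic": 0.3},
           "systematic": {"stuck": 0.2, "systematic": 0.8}}
emit_p = {"stuck": {"random_click": 0.9, "targeted_test": 0.1},
          "systematic": {"random_click": 0.2, "targeted_test": 0.8}}
obs = ["random_click", "random_click", "targeted_test", "targeted_test"]
print(viterbi(obs, states, start_p, trans_p, emit_p))
```

Here the decoder infers a transition from a "stuck" phase to a "systematic" phase once the learner switches from random clicking to targeted testing, which is exactly the kind of phase-transition evidence the abstract describes.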
Based on a literature review, we present a framework for structuring the application of graph theory in the library domain. Our goal is to provide both researchers and libraries with a standard tool to classify scientific work, at the same time allowing for the identification of previously underrepresented areas where future research might be productive. To achieve this, we compile graph theoretical approaches from the literature to consolidate the components of our framework on a solid basis. The extendable framework consists of multiple facets grouped into five categories whose elements can be arbitrarily combined. Libraries can benefit from these facets by using them as a point of reference for the (meta)data they offer. Further work on formally defining the framework’s categories as well as on integration of other graph-related research areas not discussed in this article (e.g. knowledge graphs) would be desirable and helpful in the future.
Data science can be incorporated into every stage of a scientific study. Here we describe how data science can be used to generate hypotheses, to design experiments, to perform experiments, and to analyse data. We also present our vision for how data science techniques will be an integral part of the laboratory of the future.
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, and the design of new molecules. For each of these tasks, the machine needs to read and write fluently in a chemical language. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings: most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELF-referencing embedded strings (SELFIES). SELFIES has since simplified and enabled numerous new applications in chemistry. In this perspective, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete future projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages, and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
Artificial intelligence for the discovery of new functional molecules can bring enormous societal and technological progress. Here, one crucial question is how to write molecules such that computers can easily process them. In this perspective, we analyze SELFIES, a relatively young method for representing molecules in a computer. Since its invention 2 years ago, SELFIES has simplified and enabled numerous workflows for artificial intelligence (AI) in chemistry and materials science.
We take an in-depth look into the future of Selfies and molecular string representations. We detail 16 new future research directions, ranging from new AI applications in chemistry, to the development of robust languages for large chemical domains, to questions about the readability of different chemical languages for humans and machines. Thereby, we hope to open a myriad of exciting doors with consequences in materials science and beyond.
This community paper discusses SELFIES, a relatively new representation for molecules in the computer. SELFIES was developed to overcome critical issues concerning the robustness of previously state-of-the-art representations in artificial intelligence applications. We overview the history of molecular string representations and the applications of SELFIES within the last 2 years. We point out 16 concrete future research directions that will hopefully inspire the community and push ideas of robust representations in the realm of artificial intelligence and machine learning.
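The core robustness idea, that every token sequence decodes to some valid object because the decoder clips out-of-range references instead of failing, can be shown with a toy grammar. The tokens and rules below are emphatically not the real SELFIES derivation rules, just a sketch of the design principle:

```python
# Toy illustration of "robustness by construction": a branch token asks for
# N following tokens, but the decoder only takes what is actually available,
# so no random token sequence can ever fail to decode.
import random

TOKENS = ["C", "N", "O", "branch1", "branch2"]

def decode(tokens):
    atoms, i = [], 0
    while i < len(tokens):
        t = tokens[i]
        if t.startswith("branch"):
            want = int(t[-1])
            take = min(want, len(tokens) - i - 1)  # clip to what remains
            branch = [x for x in tokens[i + 1 : i + 1 + take]
                      if not x.startswith("branch")]
            if atoms:  # a leading branch with nothing to attach to is dropped
                atoms.append("(" + "".join(branch) + ")")
            i += 1 + take
        else:
            atoms.append(t)
            i += 1
    return "".join(atoms)

print(decode(["C", "branch2", "O", "N", "C"]))  # C(ON)C

random.seed(0)
for _ in range(100):
    seq = [random.choice(TOKENS) for _ in range(8)]
    assert isinstance(decode(seq), str)  # every random sequence decodes
```

In SMILES-like languages, by contrast, a random symbol string usually has no chemical interpretation at all; shifting the validity constraints into the decoding rules is what gives SELFIES its 100% robustness guarantee.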
The mental lexicon is a complex cognitive system representing information about the words/concepts that one knows. Over decades, psychological experiments have shown that conceptual associations across multiple, interactive cognitive levels can greatly influence word acquisition, storage, and processing. How can semantic, phonological, syntactic, and other types of conceptual associations be mapped within a coherent mathematical framework to study how the mental lexicon works? Here we review cognitive multilayer networks as a promising quantitative and interpretative framework for investigating the mental lexicon. Cognitive multilayer networks can map multiple types of information at once, thus capturing how different layers of associations might co-exist within the mental lexicon and influence cognitive processing. This review starts with a gentle introduction to the structure and formalism of multilayer networks. We then discuss quantitative mechanisms of psychological phenomena that could not be observed in single-layer networks and were only unveiled by combining multiple layers of the lexicon: (i) multiplex viability highlights language kernels and facilitative effects of knowledge processing in healthy and clinical populations; (ii) multilayer community detection enables contextual meaning reconstruction depending on psycholinguistic features; (iii) layer analysis can reveal latent interactions of mediation, suppression, and facilitation for lexical access. By outlining novel quantitative perspectives where multilayer networks can shed light on cognitive knowledge representations, including in next-generation brain/mind models, we discuss key limitations and promising directions for cutting-edge future research.
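Point (i), multiplex viability, can be sketched as iterative pruning: keep only the nodes that remain in the largest connected component of every layer simultaneously. The toy word layers below are hypothetical, and this is a simplified reading of viability, not the exact algorithm from the literature:

```python
from collections import defaultdict

def components(nodes, edges):
    """Connected components of the subgraph induced on `nodes`."""
    adj = defaultdict(set)
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def viable_core(nodes, layers):
    """Iteratively keep nodes in the largest connected component
    of *every* layer (simplified multiplex-viability sketch)."""
    core = set(nodes)
    changed = True
    while changed and core:
        changed = False
        for edges in layers:
            biggest = max(components(core, edges), key=len, default=set())
            if biggest != core:
                core, changed = biggest, True
    return core

# Hypothetical toy lexicon: one semantic and one phonological layer.
nodes = {"cat", "dog", "pet", "sun", "moon"}
semantic = [("cat", "dog"), ("dog", "pet"), ("sun", "moon")]
phonological = [("cat", "dog"), ("cat", "pet"), ("sun", "pet")]
print(sorted(viable_core(nodes, [semantic, phonological])))
```

In this toy example "sun" and "moon" are connected semantically but not phonologically, so they drop out of the viable core, mirroring how multiplex viability isolates a kernel of words that stay reachable across all association types.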