The ability to regulate one's own learning is essential for success in online courses. Recent efforts have used clickstream data to create timely, fine-grained, and comprehensive measures of self-regulated learning (SRL) in online courses in an attempt to shed light on the process of SRL and to improve the identification of students who lack SRL skills and are at risk of low achievement. However, key questions remain: to what extent do these clickstream measures correspond to traditional self-reported measures about specific SRL constructs? Do these clickstream measures provide more information than existing self-reported measures in predicting course performance? This study used the clickstream data collected from a learning management system to measure two aspects of SRL: time management and effort regulation. We found that the clickstream measures were significantly associated with students' self-reported time management and effort regulation after the course. In addition, these clickstream measures significantly improved predictions of students' performance in the current and subsequent courses over predictions based on self-reported measures alone. These results provide evidence for the validity of the clickstream measures and guide the use of clickstream data to understand the process of SRL and identify students who might not be well served by taking classes online.
•Clickstream data from a learning management system was used to measure two constructs of SRL: time management and effort regulation.
•These clickstream measures were significantly associated with students' self-reported time management and effort regulation after the course.
•These clickstream measures could significantly improve the prediction of students' course performance over self-reported measures.
The article contributes both conceptually and methodologically to the study of online news consumption by introducing new approaches to measuring user information behaviour and proposing a typology of users based on their click behaviour. Using as a case study two online outlets of large national newspapers, it employs computational approaches to detect patterns in time- and content-based user interactions with news content based on clickstream data. The analysis of interactions detects several distinct timelines of news consumption and scrutinises how users switch between news topics during reading sessions. Using clustering analysis, the article then identifies several types of news readers (e.g. samplers, gourmets) and examines their news diets. The results point out the limited variation in topical composition of the news diets between different types of readers and the tendency of these diets to align with the news supply patterns (i.e. the average distribution of topics covered by the outlet).
Self-regulated learning (SRL) refers to how learners steer their own learning. Supporting SRL has been shown to enhance the use of SRL strategies and learning performance in computer-based learning environments. However, little is known about supporting SRL in Massive Open Online Courses (MOOCs). In this study, weekly SRL prompts were embedded as videos in a MOOC. We employed a sequential pattern mining algorithm, Sequential Pattern Discovery using Equivalence classes (cSPADE), on gathered log data to explore whether differences exist between learners who viewed the SRL-prompt videos and those who did not. Results showed that SRL-prompt viewers interacted with more course activities and completed these activities in a more similar sequential pattern than non-viewers. Also, SRL-prompt viewers tended to follow the course structure, which has been identified as a behavioral characteristic of students who scored higher on SRL (i.e., comprehensive learners) in previous research. Based on the results, implications for supporting SRL in MOOCs are discussed.
•We examined learners' use of self-regulated learning (SRL) prompts in a MOOC.
•Using sequential pattern mining, sequences of learner activities were examined.
•Students who viewed more prompts interacted with more course elements.
•Viewers of SRL-prompts better follow the course structure than non-viewers.
•Exploring sequences of learner activities potentially informs the design of MOOCs.
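To make the idea of comparing sequential patterns between two groups of learners concrete, the following is a minimal sketch, not the cSPADE algorithm used in the study. It only computes the support of one candidate pattern, meaning the share of learner activity sequences that contain the pattern's items in order (not necessarily adjacent). The function names, activity labels, and example logs are all hypothetical.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` occur in `sequence` in order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern, sequences):
    """Share of sequences that contain `pattern` as an ordered subsequence."""
    if not sequences:
        return 0.0
    hits = sum(is_subsequence(pattern, seq) for seq in sequences)
    return hits / len(sequences)


# Hypothetical activity logs, one list per learner (illustrative data only).
prompt_viewers = [
    ["video", "quiz", "forum"],
    ["video", "quiz"],
    ["intro", "video", "forum", "quiz"],
]
non_viewers = [
    ["quiz", "video"],
    ["forum"],
    ["video", "quiz"],
]

pattern = ("video", "quiz")  # "watched a video, later took a quiz"
print(support(pattern, prompt_viewers))
print(support(pattern, non_viewers))
```

A full sequential pattern miner would enumerate all patterns above a minimum support rather than scoring a single hand-picked one; the sketch only shows the quantity such a miner counts.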
•We propose a parallel depth-first search with dynamic load balancing.
•We propose a parallel algorithm called PCompact-SPADE for mining weighted frequent clickstream patterns.
•We experiment on various datasets to illustrate the algorithm’s performance and scalability.
In the Internet age, analyzing the behavior of online users can help webstore owners understand customers’ interests. Insights from such analysis can be used to improve both user experience and website design. A prominent task for online behavior analysis is clickstream mining, which consists of identifying customer browsing patterns that reveal how users interact with websites. Recently, this task was extended to consider weights to find more impactful patterns. However, most algorithms for mining weighted clickstream patterns are serial algorithms, which are executed sequentially from start to end on a single thread. In real life, data is often very large, and serial algorithms can have long runtimes as they do not fully take advantage of the parallelism capabilities of modern multi-core CPUs. To address this limitation, this paper presents two parallel algorithms named DPCompact-SPADE (Depth load balancing Parallel Compact-SPADE) and APCompact-SPADE (Adaptive Parallel Compact-SPADE) for weighted clickstream pattern mining. Experiments on various datasets show that the proposed parallel algorithms are efficient and outperform state-of-the-art serial algorithms in terms of runtime, memory consumption, and scalability.
Research on the relationship between the digital traces of students in Learning Management Systems (LMS) and their academic performance has traditionally been an area of interest in the field of learning analytics. Aiming at achieving high interpretability and generalizability, this study reviews past research, defines a new categorization scheme for interactions in LMS and investigates the relationships between clickstream data of students’ activity and course performance, measured as final grade. The results of the multiple regression analysis of Moodle log data collected from three courses of diverse nature using various classifications of interactions suggest that the new categorization, the CILC (Classification of Interactions based on the Learning Cycle), improves the explanatory power when compared to previous classifications. The analysis suggests that the predictive ability of the models could depend on the delivery mode, with predictions improving as the delivery mode transitions from face-to-face learning to online learning. This finding highlights the need for context-specific considerations about the learning process. Compared to previous research, the analysis also reveals nuances that suggest that the relationships may depend on the instructional design. Finally, the findings also seem to support the notion that increased data quantity and quality may improve predictive models. In summary, the study contributes valuable insights into the interplay between LMS interactions, course delivery mode, instructional design, and academic performance, and advocates for a balanced exploration of white-box and black-box modeling approaches in learning analytics research.
•We define a new categorization for LMS clickstream data based on the learning cycle.
•The new categorization outperforms previous ones in explaining student performance.
•The need for interpretability justifies white-box and black-box models’ coexistence.
•The predictive power of LMS interaction data varies with course delivery mode.
•The instructional design might affect the interaction-performance link.
We present a novel method for predicting the evolution of a student's grade in massive open online courses (MOOCs). Performance prediction is particularly challenging in MOOC settings due to per-student assessment response sparsity and the need for personalized models. Our method overcomes these challenges by incorporating another, richer form of data collected from each student (lecture video-watching clickstreams) into the machine learning feature set, and using that to train a time series neural network that learns from both prior performance and clickstream data. Through evaluation on two MOOC datasets, we find that our algorithm outperforms a baseline of average past performance by more than 60% on average, and a lasso regression baseline by more than 15%. Moreover, the gains are higher when the student has answered fewer questions, underscoring the algorithm's ability to provide instructors with early detection of struggling and/or advanced students. We also show that despite these gains, when taken alone, none of the behavioral features are particularly correlated with performance, emphasizing the need to consider their combined effect and nonlinear predictors. Finally, we discuss how course instructors can use these predictive learning analytics to stage student interventions.
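As a point of reference for the "average past performance" baseline mentioned above, the sketch below shows one plausible reading of it: the prediction for a student's next question is the running mean of that student's earlier scores. This is an assumption about how such a baseline could be computed, not the authors' implementation; the function name and the sample scores are hypothetical.

```python
def average_past_performance(scores):
    """Predict each score as the mean of all earlier scores for one student.

    Returns a list aligned with `scores`; the first entry is None because
    no history exists before the first answered question.
    """
    predictions = []
    running_total = 0.0
    for answered_so_far, score in enumerate(scores):
        if answered_so_far == 0:
            predictions.append(None)
        else:
            predictions.append(running_total / answered_so_far)
        running_total += score
    return predictions


# Hypothetical per-question scores (1 = correct, 0 = incorrect).
print(average_past_performance([1, 0, 1, 1]))
```

With few answered questions the running mean rests on very little history, which is consistent with the abstract's observation that richer clickstream features help most early on.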
The big data stored in massive open online course (MOOC) platforms pose a challenge for the Learning Analytics field: analyzing the learning behavior of learners and predicting their respective performance, especially from video lecture data. Because most learners view the same online lecture videos, these data make it possible to conduct a comprehensive analysis of such behaviors and explore various learning patterns in MOOC video interactions. This paper aims at presenting a visual analysis, which enables course instructors and education experts to analyze clickstream data that were generated by learner interaction with course videos. It also aims at predicting learner performance, a vital decision-making problem, so that learners' issues can be addressed and the educational process improved. This paper uses a long short-term memory network (LSTM) on implicit features extracted from video-clickstream data to predict learners' performance and enable instructors to take measures for timely intervention. Results show that the accuracy of the proposed model ranges from 89% to 95% across course weeks. The proposed LSTM model outperforms the baseline deep learning models, a gated recurrent unit (GRU) network and a simple recurrent neural network, reaching an accuracy of 90.30% in the “Mining of Massive Datasets” course and 89% in the “Automata Theory” course.
This study was motivated by a need to understand the extent to which behavioral indicators of engagement from digital log data are associated with various student learning outcomes above and beyond self-reported levels of engagement, and whether the strength of these associations varies depending on the type of learning outcome. Student learning was assessed by way of four distinct learning outcomes that varied according to stakes (low- vs. high-stakes) and span (one-time vs. aggregated). Participants included high school students between 14 and 18 years of age enrolled in an AP Statistics course (N = 320, M age = 16.76 years, SD age = 0.85; 60.2% female) who had consented to use an online assessment system over the course of an academic year that was designed to provide personalized performance reports. While largely uncorrelated with self-report measures, certain process data variables were significantly correlated with learning outcomes. In particular, students’ frequency of score report checking, an indication of feedback-seeking behavior, while uncorrelated with self-reported student engagement, was associated with all learning outcomes. Other behaviors, such as the number of log-in sessions and the duration of sessions, were not. These findings suggest that process data from online assessment systems can help broaden and deepen our understanding of student behavior above and beyond self-report. That said, given that the volume and complexity of process data can make it challenging to mine and interpret, researchers must consider theory when identifying process data variables that are critical to the understanding of constructs of interest.
•We studied the extent to which process data from an assessment system predicted learning.
•We considered learning outcomes differing by low-/high-stakes and one-time/aggregate.
•Clicks to results page, an indication of feedback-seeking, predicted all outcomes.
•Number of log-in sessions and average session duration did not predict learning.
•Process data variables were largely uncorrelated with self-reported engagement.
Recently, there has been a growing interest in sequential pattern mining in data mining, with a particular focus on clickstream pattern mining. These areas hold the potential for discovering valuable patterns. However, traditional mining algorithms in these domains often assume that databases are static, simplifying the mining process. In reality, databases are updated incrementally over time, rendering a portion of the previous results invalid. This necessitates rerunning algorithms on updated databases to obtain accurate frequent patterns. As database size increases, this approach can become time-consuming and affect performance. To tackle this issue, we propose PSB-CUP to mine frequent clickstream patterns in an incremental update manner. PSB-CUP employs the concept of search borders to reduce the search space and the information retained in memory. Furthermore, an IDList generation method called “partial imbalance join” is proposed to reconstruct possibly missing information during the incremental process. This join method, however, requires extra information to be cached in exchange for speed. We then improve this technique by introducing “recursive imbalance join”, removing the need for extra cached data in the PSB-CUP+ algorithm. The experimental results show that our proposed algorithms are efficient for incremental clickstream pattern mining.
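The motivation for incremental mining can be illustrated with a deliberately naive sketch. It is not PSB-CUP: it uses no search borders, IDLists, or imbalance joins, and it only handles newly appended sessions rather than updates to existing ones. It keeps counts for every ordered pair of clicks so that a new batch can be folded in without rescanning earlier data, and it shows how a pattern that was infrequent before an update can become frequent afterwards. The class name and the sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


class IncrementalPairCounter:
    """Maintain support counts of ordered click pairs across appended batches."""

    def __init__(self):
        self.counts = Counter()   # (earlier_click, later_click) -> sessions containing it
        self.n_sessions = 0

    def add_batch(self, sessions):
        """Fold new sessions into the counts; earlier batches are not rescanned."""
        for session in sessions:
            # A set ensures each pair is counted once per session.
            pairs = {
                (session[i], session[j])
                for i, j in combinations(range(len(session)), 2)
            }
            self.counts.update(pairs)
            self.n_sessions += 1

    def frequent(self, min_support):
        """Return {pair: support} for pairs whose support meets the threshold."""
        if self.n_sessions == 0:
            return {}
        return {
            pair: count / self.n_sessions
            for pair, count in self.counts.items()
            if count / self.n_sessions >= min_support
        }


# Hypothetical click sessions (illustrative data only).
counter = IncrementalPairCounter()
counter.add_batch([["home", "cart"], ["home", "search"]])
print(counter.frequent(0.6))   # no pair reaches the threshold yet
counter.add_batch([["home", "search", "cart"]])
print(counter.frequent(0.6))   # some pairs become frequent after the update
```

Storing counts for every pair is what makes this version simple and also what makes it impractical at scale; reducing the retained information while still recovering patterns that were previously infrequent is the problem the abstract's search borders and join methods address.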