New high-frequency, automated data collection and analysis algorithms could offer new insights into complex learning processes, especially for tasks in which students have opportunities to generate unique open-ended artifacts such as computer programs. These approaches should be particularly useful because the need for scalable project-based and student-centered learning is growing considerably. In this article, we present studies of how students learn computer programming, based on data drawn from 154,000 code snapshots of computer programs under development by approximately 370 students enrolled in an introductory undergraduate programming course. We use methods from machine learning to discover patterns in the data and to predict final exam grades. We begin with a set of exploratory experiments that use fully automated techniques to investigate how much students change their programming behavior across all assignments in the course. The results show that change in students' programming patterns is only weakly predictive of course performance. We then home in on a single assignment, mapping students' learning trajectories and automatically identifying productive and unproductive (sink) states within them. Results show that our process-based metric predicts final exam performance better than midterm grades do. We conclude with recommendations about the use of such methods for assessment, real-time feedback, and course improvement.
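A toy sketch of the sink-state idea (not the paper's actual model; the states, counts, and function names below are hypothetical): treat coarse snapshot categories as states of a Markov chain estimated from transition counts, and flag as sinks those states from which the probability of reaching the solved state stays low.

```python
import numpy as np

# Hypothetical transition counts between three coarse snapshot states:
# rows/cols: 0 = start, 1 = partial attempt, 2 = solved (absorbing).
counts = np.array([
    [ 5, 10,  2],
    [ 3, 40,  8],
    [ 0,  0, 50],
])
P = counts / counts.sum(axis=1, keepdims=True)  # row-normalise to probabilities

def reach_prob(P, start, solved, steps):
    """Probability of having reached `solved` within `steps` transitions
    from `start` (exact here, because `solved` is absorbing)."""
    return np.linalg.matrix_power(P, steps)[start, solved]

# A state whose reach probability stays low even after many steps behaves
# like a sink; in this toy chain both non-solved states eventually escape.
for s in (0, 1):
    print(s, reach_prob(P, s, 2, steps=10))
```

With real snapshot data, the state space and transition counts would come from clustering student programs, and the sink threshold is a modelling choice.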
Digitising the vision test Piech, Chris; Malik, Ali; Topol, Eric J
The Lancet (British edition), 10/2021, Volume 398, Issue 10308
Journal Article
Peer reviewed
Digital vision tests can be self-administered with a smartphone, tablet, or computer, making strides towards future at-home vision testing. StAT also allows a doctor to encode prior beliefs about the person's visual ability. The test can be run indefinitely until a certain confidence threshold has been reached for the final result, and the algorithm incorporates the fact that people “slip” and give mistaken answers when doing a digital test. EJT is supported by the US National Institutes of Health/National Center for Advancing Translational Sciences grant UL1TR001114.
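A minimal sketch of an adaptive test of this flavour (assumed mechanics, not the published StAT algorithm; all parameter values are illustrative): Bayesian updating over a discretised ability parameter with a slip term in the likelihood, stopping once enough posterior mass concentrates around the current estimate.

```python
import numpy as np

def simulate_adaptive_test(true_ability=0.6, slip=0.05, threshold=0.95, seed=0):
    """Toy Bayesian adaptive test: estimate an ability parameter in (0, 1)
    from right/wrong answers, allowing for 'slips' (mistaken answers), and
    stop once 95% of posterior mass lies within 0.1 of the current estimate."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.01, 0.99, 99)          # candidate ability values
    posterior = np.ones_like(grid) / len(grid)  # uniform prior; a doctor's prior could go here

    def p_correct(ability, difficulty):
        p = 1.0 / (1.0 + np.exp(-10.0 * (ability - difficulty)))
        return (1 - slip) * p + slip * (1 - p)  # slip floor/ceiling on accuracy

    for trial in range(1000):
        estimate = grid[np.argmax(posterior)]
        # present an item matched to the current ability estimate
        correct = rng.random() < p_correct(true_ability, estimate)
        likelihood = p_correct(grid, estimate)
        posterior *= likelihood if correct else (1 - likelihood)
        posterior /= posterior.sum()
        estimate = grid[np.argmax(posterior)]
        if posterior[np.abs(grid - estimate) <= 0.1].sum() >= threshold:
            break
    return estimate, trial + 1

estimate, n_trials = simulate_adaptive_test()
```

The slip term keeps any single wrong answer from ruling out high ability, which is why the test can simply run until the confidence threshold is met.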
Providing consistent, individualized feedback to teachers is essential for improving instruction but can be prohibitively resource-intensive in most educational contexts. We develop M-Powering Teachers, an automated tool based on natural language processing that gives teachers feedback on their uptake of student contributions, a high-leverage dialogic teaching practice that makes students feel heard. We conduct a randomized controlled trial in an online computer science course (N = 1,136 instructors) to evaluate the effectiveness of our tool. We find that M-Powering Teachers improves instructors’ uptake of student contributions by 13% and present suggestive evidence that it also improves students’ satisfaction with the course and assignment completion. These results demonstrate the promise of M-Powering Teachers to complement existing efforts in teachers’ professional development.
To develop and evaluate an automated, portable algorithm to differentiate active corneal ulcers from healed scars using only external photographs.
A convolutional neural network was trained and tested using photographs of corneal ulcers and scars.
De-identified photographs of corneal ulcers were obtained from the Steroids for Corneal Ulcers Trial (SCUT), Mycotic Ulcer Treatment Trial (MUTT), and Byers Eye Institute at Stanford University.
Photographs of corneal ulcers (n = 1313) and scars (n = 1132) from the SCUT and MUTT were used to train a convolutional neural network (CNN). The CNN was tested on 2 different patient populations from eye clinics in India (n = 200) and the Byers Eye Institute at Stanford University (n = 101). Accuracy was evaluated against gold standard clinical classifications. Feature importances for the trained model were visualized using gradient-weighted class activation mapping.
Accuracy of the CNN was assessed via F1 score. The area under the receiver operating characteristic (ROC) curve (AUC) was used to measure the trade-off between sensitivity and specificity across classification thresholds.
The CNN correctly classified 115 of 123 active ulcers and 65 of 77 scars in patients with corneal ulcers from India (F1 score, 92.0% [95% CI, 88.2%-95.8%]; sensitivity, 93.5% [95% CI, 89.1%-97.9%]; specificity, 84.42% [95% CI, 79.42%-89.42%]; ROC AUC, 0.9731). The CNN correctly classified 43 of 55 active ulcers and 42 of 46 scars in patients with corneal ulcers from Northern California (F1 score, 84.3% [95% CI, 77.2%-91.4%]; sensitivity, 78.2% [95% CI, 67.3%-89.1%]; specificity, 91.3% [95% CI, 85.8%-96.8%]; ROC AUC, 0.9474). The CNN visualizations correlated with clinically relevant features such as corneal infiltrate, hypopyon, and conjunctival injection.
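The reported point estimates can be reproduced directly from the classification counts; for the India test set, 115 of 123 ulcers and 65 of 77 scars correct yield exactly the published sensitivity, specificity, and F1:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and F1 from confusion-matrix counts
    (positives = active ulcers, negatives = scars)."""
    sensitivity = tp / (tp + fn)   # fraction of ulcers caught
    specificity = tn / (tn + fp)   # fraction of scars correctly cleared
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

# India test set: 115 of 123 active ulcers and 65 of 77 scars correct
sens, spec, f1 = diagnostic_metrics(tp=115, fn=8, tn=65, fp=12)
print(f"sensitivity={sens:.1%}  specificity={spec:.2%}  F1={f1:.1%}")
# sensitivity=93.5%  specificity=84.42%  F1=92.0%
```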
The CNN classified corneal ulcers and scars with high accuracy and generalized to patient populations outside of its training data. The CNN focused on clinically relevant features when it made a diagnosis. The CNN demonstrated potential as an inexpensive diagnostic approach that may aid triage in communities with limited access to eye care.
Deconstructing disengagement Kizilcec, René F.; Piech, Chris; Schneider, Emily
Proceedings of the Third International Conference on Learning Analytics and Knowledge, 04/2013
Conference Proceeding
As MOOCs grow in popularity, the relatively low completion rates of learners have been a central criticism. This focus on completion rates, however, reflects a monolithic view of disengagement that does not allow MOOC designers to target interventions or develop adaptive course features for particular subpopulations of learners. To address this, we present a simple, scalable, and informative classification method that identifies a small number of longitudinal engagement trajectories in MOOCs. Learners are classified based on their patterns of interaction with video lectures and assessments, the primary features of most MOOCs to date.
In an analysis of three computer science MOOCs, the classifier consistently identifies four prototypical trajectories of engagement. Most notable are learners who stay engaged through the course without taking assessments. These trajectories also provide a useful framework for comparing learner engagement across course structures and instructional approaches. We compare learners in each trajectory and course across demographics, forum participation, video access, and reports of overall experience. These results inform a discussion of future interventions, research, and design directions for MOOCs. Potential improvements to the classification mechanism are also discussed, including the introduction of more fine-grained analytics.
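A toy sketch of trajectory classification in this spirit (simple k-means on weekly engagement sequences; the states, group names, and data below are made up, not the paper's):

```python
import numpy as np

# Toy data: each learner is a sequence of weekly engagement states
# (0 = absent, 1 = watched videos only, 2 = watched videos and took the assessment).
n_weeks = 8
completing  = np.tile([2], (50, n_weeks))                      # engaged throughout, with assessments
auditing    = np.tile([1], (50, n_weeks))                      # watch videos, skip assessments
disengaging = np.tile([2] * 3 + [0] * (n_weeks - 3), (50, 1))  # active early, then drop off
learners = np.vstack([completing, auditing, disengaging]).astype(float)

def kmeans(X, k, iters=20):
    # initialise with k distinct rows (safe here because the toy data is noise-free)
    centers = np.unique(X, axis=0)[:k].astype(float)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels

labels = kmeans(learners, k=3)  # recovers the three prototypical trajectories
```

Real MOOC data would add noise and require choosing k; the auditing cluster here is the analogue of the "engaged without assessments" trajectory highlighted above.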
The more frequent collection of response time data is leading to an increased need for an understanding of how such data can be included in measurement models. Models for response time have been advanced, but relatively few large-scale empirical investigations have been conducted. We take advantage of a large data set from the adaptive NWEA MAP Growth Reading Assessment to shed light on emergent features of response time behavior. We identify two behaviors in particular. The first, response acceleration, is a reduction in response time for responses that occur later in the assessment. Such reductions are heterogeneous as a function of estimated ability (lower ability estimates are associated with larger acceleration), and reductions in response time lead to lower-than-expected accuracy for lower ability students. The second is within-person variation in the association between time usage and accuracy. Idiosyncratic within-person changes in response time have inconsistent implications for accuracy: in some cases additional response time predicts higher accuracy, but in other cases it predicts declines in accuracy. These findings have implications for models that incorporate response time and accuracy. Our approach may be useful in other studies of adaptive testing data.
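One simple way to quantify response acceleration for a single respondent, sketched here as an assumption rather than the paper's exact measure, is the slope of log response time against item position:

```python
import numpy as np

def response_acceleration(times):
    """Slope of log response time against item position for one respondent;
    a negative slope means responses get faster as the assessment progresses."""
    positions = np.arange(len(times))
    return np.polyfit(positions, np.log(times), 1)[0]

# Hypothetical respondent whose response times shrink from ~30s to ~10s over 40 items
print(response_acceleration(np.linspace(30, 10, 40)))  # negative: accelerating
```

Computing this per student makes it possible to ask whether acceleration varies with estimated ability, as the abstract reports.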
The speed–accuracy trade-off (SAT) suggests that time constraints reduce response accuracy. Its relevance in observational settings, where response time (RT) may not be constrained but respondent speed may still vary, is unclear. Using 29 data sets from cognitive tasks, we apply a flexible method for identifying the SAT (validated in extensive simulation studies) to probe whether the SAT holds. We find inconsistent relationships between time and accuracy: marginal increases in an individual's time use do not necessarily predict increases in accuracy. The speed–accuracy relationship may also depend on the underlying difficulty of the interaction. We further analyze items and individuals; of particular interest is the observation that respondents who exhibit more within-person variation in response speed are typically of lower ability. We also find that RT is typically a weak predictor of response accuracy. Our findings document a range of empirical phenomena that should inform future modeling of RTs collected in observational settings.
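A minimal sketch of a within-person time-accuracy association (hypothetical data; not the paper's identification method): correlate log RT with correctness for each respondent, and note that the sign can go either way, which is exactly the inconsistency described above.

```python
import numpy as np

def rt_accuracy_assoc(times, correct):
    """Within-person association between log response time and accuracy:
    the correlation between log RT and a 0/1 correctness vector."""
    return np.corrcoef(np.log(times), correct)[0, 1]

# Two hypothetical respondents with identical response times:
times = [5, 20, 6, 25, 4, 30, 7, 22]
deliberate  = [0, 1, 0, 1, 0, 1, 0, 1]  # extra time pays off in accuracy
floundering = [1, 0, 1, 0, 1, 0, 1, 0]  # extra time signals being stuck
print(rt_accuracy_assoc(times, deliberate), rt_accuracy_assoc(times, floundering))
```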
We study the problem of minimizing the delay between when an issue comes up in a course and when instructors get feedback about it. The widespread practice of obtaining midterm and end-of-term feedback from students is suboptimal in this regard, especially for large courses: it over-samples at specific points in the course and can be biased by factors irrelevant to the teaching process. As a solution, we release High Resolution Course Feedback (HRCF), an open-source student feedback mechanism built on a surprisingly simple idea: survey each student on random weeks exactly twice per term. Despite this simplicity, when deployed to 31 courses totaling 6,835 students, HRCF detected meaningful mood changes in courses and significantly improved the timeliness of feedback without asking students for more work than common practice requires. Interviews with instructors revealed that HRCF provided constructive, useful feedback about their courses early enough to be acted upon, which would have been unobtainable through other survey methods. We also explore the possibility of using Large Language Models to flexibly and intuitively organize large volumes of student feedback at scale and discuss how HRCF can be further improved.
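The core sampling idea is easy to sketch (a toy version, not the released HRCF implementation; the function name and parameters are ours):

```python
import random

def assign_survey_weeks(student_ids, n_weeks=10, seed=0):
    """Survey each student on exactly two distinct, uniformly random weeks,
    so every week of the term hears from roughly 2/n_weeks of the class
    while each student still fills out only two surveys total."""
    rng = random.Random(seed)
    return {s: sorted(rng.sample(range(1, n_weeks + 1), 2)) for s in student_ids}

schedule = assign_survey_weeks(range(100))
```

Because the two weeks are drawn independently per student, every week of the term gets a fresh, roughly unbiased sample of respondents, which is what allows week-to-week mood changes to be detected.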
Are there structures underlying student work that are universal across every open-ended task? We demonstrate that, across many subjects and assignment types, the probability distribution underlying student-generated open-ended work is close to Zipf’s Law. Inferring this latent structure for classroom assignments can help learning analytics researchers, instruction designers, and educators understand the landscape of student approaches, assess the complexity of assignments, and prioritise pedagogical attention. However, typical classrooms are far too small to witness even the contour of the Zipfian pattern, and it is generally impossible to perform inference for Zipf’s law from so few samples. We formalise this difficult task as the Zipf Inference Challenge: (1) infer the ordering of student-generated works by their underlying probabilities, and (2) estimate the shape parameter of the underlying distribution in a typical-sized classroom. Our key insight in addressing this challenge is to leverage the densities of the student response landscapes represented by semantic similarity. We show that our “Semantic Density Estimation” method infers the latent Zipf shape and the probability ordering of student responses far more accurately on real-world education datasets.
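A naive baseline for the shape-estimation half of the challenge, included only to make the task concrete (this is the standard regression-on-ranks estimator, not the paper's Semantic Density Estimation):

```python
import numpy as np

def zipf_shape_from_counts(counts):
    """Estimate the Zipf shape parameter s by regressing log frequency on
    log rank; a naive baseline that needs far more samples than a typical
    classroom provides, which is what motivates density-based alternatives."""
    freqs = np.sort(np.asarray(counts, dtype=float))[::-1]
    ranks = np.arange(1, len(freqs) + 1)
    return -np.polyfit(np.log(ranks), np.log(freqs), 1)[0]

# Frequencies drawn exactly from Zipf's law with s = 1 (f_r proportional to 1/r)
print(zipf_shape_from_counts([1200 / r for r in range(1, 51)]))
```

With exact Zipfian frequencies this recovers s, but with a classroom-sized sample most responses appear zero or one times and the rank-frequency plot degenerates, which is the regime the paper targets.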