Keeping blood glucose (BG) levels stable and within the normal range is crucial for preventing potential long-term health complications when managing a chronic disease like Type 1 diabetes (T1D), as well as body weight. Accurately forecasting blood glucose levels is therefore important for clinicians and for specific users, such as patients with Type 1 diabetes. In recent years, Continuous Glucose Monitoring (CGM) devices have been developed and are now in use. However, the ability to forecast future blood glucose values is essential for better management. Previous studies proposed documenting food intake to enhance forecasting accuracy. Unfortunately, these methods require participants to manually record daily activities such as food intake, drinking, and exercise, which yields somewhat inaccurate data and is hard to maintain over time. To reduce the burden on participants, improve the accuracy of BG level predictions, and optimize training and prediction times, this study proposes a framework that continuously tracks participants’ movements using a smartwatch. The framework analyzes sensor data and allows users to document their activities. We developed a model incorporating BG data, smartwatch sensor data, and user-documented activities. This model was applied to a dataset we collected from a dozen participants. Our results indicate that documented activities did not enhance BG level predictions. However, using smartwatch sensors, such as heart rate and step detector data, in addition to blood glucose measurements from the last sixty minutes, significantly improved the predictions.
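As a rough illustration of the input described in this abstract (BG readings from the last sixty minutes combined with smartwatch heart rate and step data), the sketch below builds one feature vector per prediction time. The 5-minute sampling interval, the function name `window_features`, and the choice of mean heart rate and total steps as summaries are all assumptions made for illustration; the abstract does not specify them.

```python
def window_features(bg, heart_rate, steps, t, window=12):
    """Build a feature vector for time index t from the preceding samples.

    Assumes all three series are aligned and sampled every 5 minutes,
    so window=12 covers the last sixty minutes (an assumption).
    """
    if t < window:
        raise ValueError("not enough history before index t")
    bg_w = bg[t - window:t]          # raw BG readings in the window
    hr_w = heart_rate[t - window:t]  # heart-rate samples in the window
    st_w = steps[t - window:t]       # step counts in the window
    # Hypothetical summaries: mean heart rate and total steps.
    return list(bg_w) + [sum(hr_w) / window, sum(st_w)]


# Toy data: 20 aligned samples per series.
bg = list(range(100, 120))
heart_rate = [70] * 20
steps = [10] * 20

features = window_features(bg, heart_rate, steps, t=12)
```

Any regression model could then be trained on such vectors against the BG value at a chosen horizon after `t`.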
Professional bicycle racing is a popular sport that has attracted significant attention in recent years. The evolution and ubiquitous use of sensors allow cyclists to measure many metrics, including power, heart rate, speed, cadence, and more, in training and racing. In this paper we explore, for the first time, the assignment of a subset of a team's cyclists to an upcoming race. We introduce RaceFit, a model that recommends cyclists for participation in an upcoming race based on recent workouts and past assignments. RaceFit consists of binary classifiers that are trained on pairs of a cyclist and a race, described by their relevant properties (features), such as the cyclist's demographic properties and features extracted from his workout data from recent weeks, as well as additional properties of the race, such as its distance, elevation gain, and more. Two main approaches are introduced: recommending for each stage of a race and aggregating the stage-level recommendations to the race, or recommending for the entire race. Model training is based on a binary label that represents the participation of a cyclist in a race (or in a stage) in past events. We evaluated RaceFit rigorously on a large dataset of cyclists and race data from three pro-cycling teams, achieving up to 80% precision@i. The first experiment showed that using TP or STRAVA data yields the same performance. The best-performing configuration used a 5-week time window with imputation and the CatBoost classifier. Regardless of the parameters, however, the model always outperformed the baselines, in which cyclists are assigned based on their popularity in historical data. Additionally, we present the top-ranked predictive features.
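The evaluation metric can be sketched as follows, assuming that precision@i follows the usual precision-at-k definition: the fraction of the top-i recommended cyclists who actually took part in the race. The function name, the toy scores, and the cyclist identifiers are hypothetical and not taken from the paper.

```python
def precision_at_k(ranked_cyclists, actual_participants, k):
    """Fraction of the top-k recommended cyclists who actually raced."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_cyclists[:k]
    hits = sum(1 for cyclist in top_k if cyclist in actual_participants)
    return hits / k


# Hypothetical classifier scores for one race (higher = more likely to race).
scores = {"A": 0.9, "B": 0.2, "C": 0.75, "D": 0.6, "E": 0.1}
ranked = sorted(scores, key=scores.get, reverse=True)

actual = {"A", "D", "E"}  # cyclists who really participated
p3 = precision_at_k(ranked, actual, 3)
```

Here the top three recommendations are A, C, and D, of which A and D raced, so `p3` is 2/3.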
We propose a new pruning constraint when mining frequent temporal patterns to be used as classification and prediction features, the Semantic Adjacency Criterion (SAC), which filters out temporal patterns that contain potentially semantically contradictory components, exploiting each medical domain’s knowledge. We have defined three SAC versions and tested them within three medical domains (oncology, hepatitis, diabetes) and a frequent-temporal-pattern discovery framework. Previously, we had shown that using SAC enhances the repeatability of discovering the same temporal patterns in similar proportions in different patient groups within the same clinical domain. Here, we focused on SAC’s computational implications for pattern discovery, and for classification and prediction, using the discovered patterns as features, by four different machine-learning methods: Random Forests, Naïve Bayes, SVM, and Logistic Regression. Using SAC resulted in a significant reduction, across all medical domains and classification methods, of up to 97% in the number of discovered temporal patterns, and in the runtime of the discovery process, of up to 98%. Nevertheless, the highly reduced set of only semantically transparent patterns, when used as features, resulted in classification and prediction models whose performance was at least as good as the models resulting from using the complete temporal-pattern set.
Biomedical data, in particular electronic medical records data, include a large number of variables sampled in irregular fashion, often including both time points and time intervals, thus posing several challenges for analysis and data mining. Classification of multivariate time series data is a challenging task, but is often necessary for medical care or research. Increasingly, temporal abstraction, in which a series of raw-data time points is abstracted into a set of symbolic time intervals, is being used for classification of multivariate time series. In this paper, we introduce a novel supervised discretization method, geared towards enhancement of classification accuracy, which determines the cutoffs that will best discriminate among classes through the distribution of their states. We present a framework for classification of multivariate time series, which implements three phases: (1) application of a temporal-abstraction process that transforms a series of raw time-stamped data points into a series of symbolic time intervals (based on either unsupervised or supervised temporal abstraction); (2) mining these time intervals to discover frequent temporal-interval relation patterns (TIRPs), using versions of Allen’s 13 temporal relations; (3) using the patterns as features to induce a classifier. We evaluated the framework, focusing on the comparison of three versions of the new, supervised, temporal discretization for classification (TD4C) method, each relying on a different symbolic-state distribution-distance measure among outcome classes, to several commonly used unsupervised methods, on real datasets in the domains of diabetes, intensive care, and infectious hepatitis. Using only three abstract temporal relations resulted in a better classification performance than using Allen’s seven relations, especially when using three symbolic states per variable. The same held when using the horizontal support and mean duration as the TIRP feature representation, rather than a binary (existence) representation. The classification performance when using the three versions of TD4C was superior to the performance when using the unsupervised (EWD, SAX, and KB) discretization methods.
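Phase (1), temporal abstraction, can be illustrated with the simplest of the unsupervised baselines named above, equal-width discretization (EWD). The sketch below is a minimal illustration and not the TD4C method: it derives equal-width cutoffs, maps each time-stamped value to a state, and merges consecutive same-state points into symbolic time intervals. The function names and the `max_gap` parameter (the largest time gap still bridged within one interval) are assumptions introduced here.

```python
def equal_width_cutoffs(values, n_states):
    """Cutoffs that split the value range into n_states equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_states
    return [lo + width * i for i in range(1, n_states)]


def to_state(value, cutoffs):
    """State index = number of cutoffs the value reaches or exceeds."""
    return sum(value >= c for c in cutoffs)


def abstract_series(points, cutoffs, max_gap):
    """Turn time-sorted (time, value) points into (start, end, state) intervals.

    Consecutive points with the same state are merged as long as the gap
    between them does not exceed max_gap (a hypothetical parameter).
    """
    intervals = []
    for t, v in points:
        state = to_state(v, cutoffs)
        if intervals and intervals[-1][2] == state and t - intervals[-1][1] <= max_gap:
            start, _, _ = intervals[-1]
            intervals[-1] = (start, t, state)  # extend the open interval
        else:
            intervals.append((t, t, state))    # start a new interval
    return intervals


# Toy glucose-like series: (minute, value).
points = [(0, 90), (5, 95), (10, 150), (15, 160), (20, 100), (40, 98)]
cutoffs = equal_width_cutoffs([v for _, v in points], n_states=3)
intervals = abstract_series(points, cutoffs, max_gap=10)
```

A supervised method such as TD4C would replace `equal_width_cutoffs` with cutoffs chosen to separate the outcome classes; the interval-merging step would stay the same.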
Multivariate temporal data analysis - a review. Moskovitch, Robert. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, January/February 2022, Volume 12, Issue 1. Journal Article. Peer reviewed.
ABSTRACT
With the information technology revolution, especially the adoption of the Internet of Things, longitudinal data in many domains have become more available and accessible for secondary analysis. Such data provide meaningful opportunities to understand processes in many domains over time, but also pose challenges. A main challenge is the heterogeneity of the temporal variables due to the different types of data, whether a measurement or an event, and the type of sampling: fixed or irregular. Other variables can also be events that may or may not have duration. In this review, we discuss the various types of temporal data and the relevant analysis methods. We start with fixed-frequency variables, with forecasting and time series methods, and proceed to sequential data and sequential pattern mining, and to time intervals mining for events having various durations. The use of various deep learning-based architectures for temporal data is also discussed. We then present the challenge of heterogeneous multivariate temporal data analysis and discuss various options to deal with it, focusing on the increasingly used option of transforming the data into symbolic time intervals through temporal abstraction and the use of time intervals-related patterns discovery for temporal knowledge discovery, clustering, classification, prediction, and more. Finally, we discuss the overview of the field, and areas in which more studies and contributions are needed.
This article is categorized under:
Algorithmic Development > Spatial and Temporal Data Mining
The increase of monitoring devices increases the availability of multivariate longitudinal data, which provides significant opportunities in understanding how things evolve in various domains. Nevertheless, with opportunities come also challenges.
Mining frequent Time Intervals-Related Patterns (TIRPs) from series of symbolic time intervals offers a comprehensive framework for heterogeneous, multivariate temporal data analysis in various application domains. While gaining a growing interest in recent decades, the efficient mining of frequent TIRPs is still a high computational challenge which has also not yet been investigated in its full complexity. The majority of previous methods discover only the first instances of the TIRPs within each series of symbolic time intervals, whereas their re-occurring instances are ignored. This eventually results in an incomplete discovery of frequent TIRPs, a problem that lies also in the challenge of mining only the frequent closed TIRPs, which was only recently investigated for the first time. In this paper, we introduce TIRPClo, an efficient algorithm for the complete mining of either the entire set of frequent TIRPs, or only the frequent closed TIRPs. The algorithm proposes a non-ambiguous sequential representation of symbolic time intervals series through the intervals’ end-points, as well as a memory-efficient index and a novel method for data projection, due to which it is the first algorithm to guarantee a complete discovery of frequent closed TIRPs. The experimental evaluation conducted on eleven real-world and four synthetic datasets demonstrates that TIRPClo is up to 10 times faster when mining the entire set of frequent TIRPs, and up to more than 100 times faster when mining only the frequent closed TIRPs compared to four state-of-the-art methods, while also reporting lower memory measurements.
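The idea of representing a series of symbolic time intervals through the intervals' end-points can be sketched in a few lines. This is a toy illustration only and not TIRPClo's actual representation: the `+`/`-` token suffixes and the function name are assumptions, and the sketch does not handle the ambiguity that arises when the same symbol occurs more than once in a series, which is part of what the paper's non-ambiguous representation addresses.

```python
from itertools import groupby


def endpoint_sequence(intervals):
    """Convert (start, end, symbol) intervals into a time-ordered sequence
    of end-point groups; end-points sharing a timestamp form one group."""
    events = []
    for start, end, symbol in intervals:
        events.append((start, symbol + "+"))  # interval start-point
        events.append((end, symbol + "-"))    # interval end-point
    events.sort()  # by time, then by token for a deterministic order
    return [
        sorted(token for _, token in group)
        for _, group in groupby(events, key=lambda e: e[0])
    ]


# Three toy intervals: A overlaps B, A meets C, B and C finish together.
stis = [(1, 5, "A"), (3, 8, "B"), (5, 8, "C")]
seq = endpoint_sequence(stis)
```

For these intervals `seq` is `[['A+'], ['B+'], ['A-', 'C+'], ['B-', 'C-']]`, so temporal relations between intervals can be read off from the order and co-occurrence of end-points.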
Classification of multivariate time series data, often including both time points and intervals at variable frequencies, is a challenging task. We introduce the KarmaLegoSification (KLS) framework for classification of multivariate time series, which implements three phases: (1) application of a temporal abstraction process that transforms a series of raw time-stamped data points into a series of symbolic time intervals; (2) mining these symbolic time intervals to discover frequent time-interval-related patterns (TIRPs), using Allen’s temporal relations; and (3) using the TIRPs as features to induce a classifier. To efficiently detect multiple TIRPs (features) in a single entity to be classified, we introduce a new algorithm, SingleKarmaLego, which can be shown to be superior for that purpose over a Sequential TIRPs Detection algorithm. We evaluated the KLS framework on datasets in the domains of diabetes, intensive care, and infectious hepatitis, assessing the effects of the various settings of the KLS framework. Discretization using Symbolic Aggregate approXimation (SAX) led to better performance than using the equal-width discretization (EWD); knowledge-based cut-off definitions when available were superior to both. Using three abstract temporal relations was superior to using the seven core temporal relations. Using an epsilon value larger than zero tended to result in a slightly better accuracy when using the SAX discretization method, but resulted in a reduced accuracy when using EWD, and overall, does not seem beneficial. No feature selection method we tried proved useful. Regarding feature (TIRP) representation, mean duration performed better than horizontal support, which in turn performed better than the default Binary (existence) representation method.
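The distinction between the seven core temporal relations and three abstract ones can be made concrete with a small sketch. The function below names the Allen relation between two intervals that are already ordered (the first starts no later than the second, and on equal starts ends no earlier), assuming positive durations and an epsilon of zero. The grouping into BEFORE, OVERLAP, and CONTAIN is one plausible mapping and is an assumption here, since the abstract does not spell it out.

```python
def allen_relation(a, b):
    """Relation of interval a to interval b, each given as (start, end).

    Assumes positive durations and that a is ordered first:
    a starts no later than b, and if both start together, a ends no earlier.
    """
    a_start, a_end = a
    b_start, b_end = b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "started-by"   # same start, a ends later
    if a_end == b_end:
        return "finished-by"  # a starts earlier, same end
    if a_end > b_end:
        return "contains"
    return "overlaps"


# Assumed grouping of the seven relations into three abstract ones.
ABSTRACT_RELATION = {
    "before": "BEFORE", "meets": "BEFORE",
    "overlaps": "OVERLAP",
    "contains": "CONTAIN", "started-by": "CONTAIN",
    "finished-by": "CONTAIN", "equals": "CONTAIN",
}


def abstract_relation(a, b):
    """Map the core relation between a and b to its abstract group."""
    return ABSTRACT_RELATION[allen_relation(a, b)]
```

With fewer relations, small timing differences between patients map to the same pattern, which is one possible reason the coarser setting can generalize better as features.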
Symbolic Time Intervals (STIs) describe events having a non-zero time duration, which occur in a wide range of application domains. In this paper, we target the challenge of STIs series classification (STIC), which refers to the categorization of series of STIs. In recent years several advancements have been made in STIC, all of which are based on either distance metrics or feature-based traditional classifiers, mostly relying on hand-engineering of features. Due to the high computational cost of either distance calculation or feature extraction, most methods also have limited potential to scale. We introduce INSTINCT, a novel deep learning-based framework for STIC, which 1) proposes an almost fully information-preserving transformation of raw STIs series into real matrices, and 2) presents a novel ensemble of deep inception-based convolutional neural networks for their classification. The evaluation is applied to six real-world STIC benchmark datasets and demonstrates that INSTINCT significantly improves accuracy over seven state-of-the-art methods, as well as over three deep learning-based baselines. In addition, a comprehensive architecture study of INSTINCT is conducted, as well as a scalability analysis, reporting an overall time complexity which is linear in each of the main properties of the input STIs series.
• Novel deep learning-based framework for Symbolic Time Intervals series classification.
• Representation of raw Symbolic Time Intervals series as real matrices.
• Inception-based networks ensemble for Symbolic Time Intervals series classification.
• Improved classification accuracy over state-of-the-art.
• Linear time complexity.
We introduce an algorithm, called KarmaLego, for the discovery of frequent symbolic time interval-related patterns (TIRPs). The mined symbolic time intervals can be part of the input, or can be generated by a temporal-abstraction process from raw time-stamped data. The algorithm includes a data structure for TIRP-candidate generation and a novel method for efficient candidate-TIRP generation, by exploiting the transitivity property of Allen’s temporal relations. Additionally, since the non-ambiguous definition of TIRPs does not specify the duration of the time intervals, we propose to pre-cluster the time intervals based on their duration to decrease the variance of the supporting instances. Our experimental comparison of the KarmaLego algorithm’s runtime performance with several existing state-of-the-art time intervals pattern mining methods demonstrated a significant speed-up, especially with large datasets and low levels of minimal vertical support. Furthermore, pre-clustering by time interval duration led to an increase in the homogeneity of the duration of the discovered TIRP’s supporting instances’ time intervals components, accompanied, however, by a corresponding decrease in the number of discovered TIRPs.
Develop a new method for continuous prediction that utilizes a single temporal pattern ending with an event of interest and its multiple instances detected in the temporal data.
Use temporal abstraction to transform time series, instantaneous events, and time intervals into a uniform representation using symbolic time intervals (STIs). Introduce a new approach to event prediction using a single time intervals-related pattern (TIRP), which can learn models to predict whether and when an event of interest will occur, based on multiple instances of a pattern that end with the event.
The proposed methods achieved an average improvement of 5% AUROC over LSTM-FCN, the best-performing of the evaluated baseline models (RawXGB, Resnet, LSTM-FCN, and ROCKET), which were applied to real-life datasets.
The proposed methods for predicting events continuously have the potential to be used in a wide range of real-world and real-time applications in diverse domains with heterogeneous multivariate temporal data. For example, it could be used to predict panic attacks early using wearable devices or to predict complications early in intensive care unit patients.