In recent years, the availability of streaming time series data has grown enormously, and learning from real-time data has received increasing attention over the last decade. Online learning must cope with changes in the distribution of the data while extracting useful information from data streams. Such changes, caused by hidden data contexts that are not known to the learning algorithm, are known as concept drift. A classifier classifies incoming instances using past training instances of the data stream, so its accuracy deteriorates under concept drift. Traditional classifiers are not designed to learn patterns in a non-stationary data distribution. For any real-time use, the classifier needs to detect concept drift and adapt over time. In real-time scenarios, we also have to deal with semi-supervised and unsupervised data, which provide few or no labels. The motivation behind this paper is to present a survey built around a broad categorization of concept drift detectors, with their key points, limitations, and advantages. Finally, the article suggests research trends, research challenges, and future work. The adaptive mechanisms are also covered in this survey.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
Action recognition is a challenging research area in which several convolutional neural network (CNN) based methods have recently been presented. However, such methods are inefficient for real-time online data stream processing with satisfactory accuracy. Therefore, in this paper we propose an efficient and optimized CNN-based system to process, in real time, data streams acquired from the visual sensor of a non-stationary surveillance environment. Firstly, frame-level deep features are extracted using a pre-trained CNN model. Next, an optimized deep autoencoder (DAE) is introduced to learn temporal changes of the actions in the surveillance stream. Furthermore, a non-linear learning approach, a quadratic SVM, is trained for the classification of human actions. Finally, an iterative fine-tuning process is added in the testing phase that can update the parameters of the trained model using the newly accumulated data of the non-stationary environment. Experiments are conducted on benchmark datasets, and the results show that our system performs better in terms of accuracy and running time than state-of-the-art methods. We believe that our proposed system is a suitable candidate for action recognition in surveillance data streams of non-stationary environments.
•Action recognition in online data stream acquired from non-stationary surveillance.
•Efficient CNN model is used for frame-level representation.
•An optimized deep autoencoder is presented for learning sequences and squeezing high-dimensional features.
•Investigated a non-linear learning approach for action recognition.
•Iterative fine-tuning of the trained recognition model for newly accumulated data.
Many stream clustering algorithms have been proposed recently to mine data streams generated at high speeds from hardware platforms and software applications. Density-based methods are widely used because they can handle outliers and capture clusters of arbitrary shapes. However, it is still hard to effectively identify multi-density clusters with ambiguous boundaries in a data stream. To address these limitations, this paper introduces a data stream clustering algorithm called TWStream, based on the three-way decision theory. It is a two-stage clustering algorithm based on density. In the online stage, an augmented $k$nn graph is maintained incrementally to accelerate the update of the $k$nn graph. In the offline stage, TWStream introduces the concept of boundary confidence to detect cluster boundaries efficiently and reveal potential cores of clusters. It integrates the skewness and sparsity of the data distribution, as well as the evolving trend of the stream. In the next step, a micro-cluster-based three-way clustering strategy is applied to reconstruct latent clusters. It improves the clustering quality of boundary-ambiguous clusters in a stream using a mutual reachability-based clustering approach and a three-way assignment approach. The proposed algorithm is compared with 9 competitors on 15 data streams. Experimental results show TWStream achieves competitive performance, verifying its effectiveness. The source code of TWStream is available at https://github.com/Du-Team/TWStream.
•The MVRL method learns a fused sparse affinity matrix across multiple views.
•The MVRL method captures the global and local structures of data objects.
•The complementary information is explored by exploiting affinity matrices.
•The upper bound of computational cost is determined by closed-form solutions.
•The dynamic set transfers previously learned knowledge to the arrival data objects.
Data stream clustering provides valuable insights into the evolving patterns of long sequences of continuously generated data objects. Most existing clustering methods focus on single-view data streams. In this paper, we propose a multi-view representation learning (MVRL) method for multi-view clustering of data streams. We first introduce an integrated representation learning model to learn a fused sparse affinity matrix across multiple views for spectral clustering. Motivated by the optimization procedure of the integrated representation learning model, we propose three consecutive stages: collaborative representation, the construction of individual global affinity matrices using a mapping function, and the calculation of a fused sparse affinity matrix using Euclidean projection. These stages allow the effective capture of the global and local structures of high-dimensional data objects. Moreover, each stage has a closed-form solution, which determines the upper bound of the computational cost and memory consumption. We then employ the construction residuals of the collaborative representation to adaptively update a dynamic set, which is used to preserve the representative data objects. The dynamic set efficiently transfers previously learned useful knowledge to the arriving data objects. Extensive experimental results on multi-view data stream datasets demonstrate the effectiveness of the proposed MVRL method.
Multi-label data streams are sequences of multi-label instances arriving over time to a multi-label classifier. The properties of the stream may continuously change due to concept drift. Therefore, algorithms must constantly adapt to the new data distributions. In this paper we propose a novel ensemble method for multi-label drifting streams named Adaptive Ensemble of Self-Adjusting Nearest Neighbor Subspaces (AESAKNNS). It leverages a self-adjusting kNN as a base classifier with the advantages of ensembles to adapt to concept drift in the multi-label environment. To promote diverse knowledge within the ensemble, each base classifier is given a unique subset of features and samples to train on. These samples are distributed to classifiers in a probabilistic manner that follows a Poisson distribution as in online bagging. Accompanying these mechanisms, a collection of ADWIN detectors monitor each classifier for the occurrence of a concept drift on the subspace. Upon detection, the algorithm automatically trains additional classifiers in the background to attempt to capture new concepts on new subspaces of features. The dynamic classifier selection chooses the most accurate classifiers from the active and background ensembles to replace the current ensemble. Our experimental study compares the proposed approach with 30 other classifiers, including problem transformation, algorithm adaptation, kNNs, and ensembles on 30 diverse multi-label datasets and 12 performance metrics. Results, validated using non-parametric statistical analysis, support the better performance of the AESAKNNS and highlight the contribution of its components in improving the performance of the ensemble.
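The Poisson-based sample distribution mentioned in the abstract above can be illustrated with a minimal sketch of generic online bagging. This is not the AESAKNNS implementation: the base learner `MajorityClassLearner`, the helper names, and the choice of rate 1 for the Poisson distribution are assumptions made for illustration only.

```python
import math
import random


def poisson1(rng):
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


class MajorityClassLearner:
    """Hypothetical stand-in base learner: predicts the most frequent label seen."""

    def __init__(self):
        self.counts = {}

    def learn_one(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

    def predict_one(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None


def online_bagging_update(ensemble, x, y, rng):
    """Show the incoming instance to each member k ~ Poisson(1) times.

    Returns the list of per-member weights k that were drawn.
    """
    weights = []
    for learner in ensemble:
        k = poisson1(rng)
        for _ in range(k):
            learner.learn_one(x, y)
        weights.append(k)
    return weights
```

Because each member draws its own weight, some members skip an instance entirely (k = 0) while others see it several times, which approximates bootstrap resampling without storing the stream.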
This work aims to connect two rarely combined research directions, i.e., non-stationary data stream classification and data analysis with skewed class distributions. We propose a novel framework employing stratified bagging for training base classifiers to integrate data preprocessing and dynamic ensemble selection methods for imbalanced data stream classification. The proposed approach has been evaluated in computer experiments carried out on 135 artificially generated data streams with various imbalance ratios, label noise levels, and types of concept drift, as well as on two selected real streams. Four preprocessing techniques and two dynamic selection methods, used at both the bagging classifier and base estimator levels, were considered. Experimental results showed that, for highly imbalanced data streams, dynamic ensemble selection coupled with data preprocessing could outperform online and chunk-based state-of-the-art methods.
•Dynamic classifier selection for non-stationary imbalanced data stream.
•Forming classifier ensemble based on stratified bagging.
•Employing oversampling and undersampling techniques to prepare DSEL.
•Experiments show the effectiveness of preprocessed DES for difficult data streams.
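One common reading of stratified bagging is to draw the bootstrap sample separately within each class, so that every bag keeps the class proportions of the original data chunk. The sketch below follows that reading; whether the paper uses exactly this scheme is an assumption, and the function name `stratified_bootstrap` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_bootstrap(labels, rng):
    """Return indices of a bootstrap sample drawn separately within each class.

    Sampling with replacement inside each class keeps the per-class counts
    (and therefore the imbalance ratio) identical to the original chunk.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    sample = []
    for indices in by_class.values():
        sample.extend(rng.choices(indices, k=len(indices)))
    return sample
```

With plain bagging on a highly imbalanced chunk, a bag can end up with no minority instances at all; sampling per class rules that out.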
28. Wilcoxon Rank Sum Test Drift Detector
Barros, Roberto Souto Maior de; Hidalgo, Juan Isidro González; Cabral, Danilo Rafael de Lima
Neurocomputing (Amsterdam), 01/2018, Volume 275
Journal Article, Peer reviewed
Online learning regards extracting information from large quantities of data (streams) usually affected by changes in the distribution (concept drift). Drift detectors are software that estimate the positions of these changes to substitute the base learner and ultimately improve accuracy. Statistical Test of Equal Proportions (STEPD) is a simple, well-known, efficient detector which uses a hypothesis test between two proportions to signal the concept drifts. However, despite identifying the existing drifts close to their correct positions, STEPD tends to identify many false positives. This article examines the application of the Wilcoxon rank sum statistical test for concept drift detection, proposing WSTD. Experiments run in the MOA framework using four artificial dataset generators, with abrupt and gradual drift versions of three sizes, as well as seven real-world datasets, suggest WSTD improves the detections of STEPD and other methods as well as their accuracies in many scenarios.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UL, UM, UPCLJ, UPUK, ZRSKP
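The idea of comparing recent and older prediction outcomes with a Wilcoxon rank sum test, as described in the WSTD abstract above, can be sketched as follows. This is not the authors' detector: the window size of 30, the significance level of 0.003, the tie-corrected normal approximation without continuity correction, and the function names are all assumptions chosen for illustration.

```python
import math


def rank_sum_p_value(older, recent):
    """Two-sided p-value of the Wilcoxon rank sum test.

    Uses midranks for ties and the tie-corrected normal approximation.
    """
    n1, n2 = len(older), len(recent)
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = sorted((v, g) for g, sample in enumerate((older, recent)) for v in sample)
    rank_sum_older, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_older += midrank * sum(1 for _, g in pooled[i:j] if g == 0)
        i = j
    mean = n1 * (n + 1) / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0  # all observations identical: no evidence of change
    z = (rank_sum_older - mean) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def drift_detected(errors, window=30, alpha=0.003):
    """Compare the last `window` outcomes (1 = misclassified) with all older ones."""
    if len(errors) <= window:
        return False
    return rank_sum_p_value(errors[:-window], errors[-window:]) < alpha
```

For example, a stream whose error rate jumps from about 10% to above 80% in the most recent window yields a very small p-value and is flagged, while a stream with an unchanged error rate is not.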
Learning from non-stationary data streams is challenging due to their infinite length and evolving nature. Existing works often concentrate on the concept-drift problem in data streams. Concept evolution, in which novel classes emerge in data streams, has gained growing attention recently due to its practical value in many real-world applications. Designing a robust learning model on data streams that handles concept drift, concept evolution, and outliers simultaneously is therefore of significant importance. To this end, we propose a new data stream classification approach, called EMC, which dynamically learns Evolving Micro-Clusters to examine both concept drift and evolution. Specifically, to capture time-changing concepts, EMC dynamically maintains a set of online micro-clusters and learns their importance with error-based representative learning. Building upon the evolving micro-clusters, a novel class detector is introduced based on a local density perspective, which allows handling data streams with complex class distributions. In addition, EMC can distinguish concept drift and evolution from noisy instances. Extensive experiments on both synthetic and real-world data sets show that our method has good classification and novel class detection performance compared to state-of-the-art algorithms.
Recently, skyline query processing over data streams has gained a lot of attention, especially from the database community, owing to its unique challenges. Skyline queries aim at pruning the search space of a potentially large multi-dimensional set of objects by keeping only those objects that are not dominated by any other object, that is, objects for which no other object is at least as good in every dimension and strictly better in at least one. Although an abundance of skyline query processing techniques have been proposed, there is a lack of a Systematic Literature Review (SLR) on current research works pertinent to skyline query processing over data streams. To address this, this paper provides a comparative study of the state-of-the-art approaches over the period between 2000 and 2022, with the main aim of helping readers understand the key issues that are essential to consider when processing skyline queries over streaming data. Seven digital databases were reviewed in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) procedures. After applying both the inclusion and exclusion criteria, 23 primary papers were further examined. The results show that the identified skyline approaches are driven by the need to expedite skyline query processing, mainly because data streams are time varying (time sensitive), continuous, real time, volatile, and unrepeatable. Although these skyline approaches are tailor-made for data streams with a common aim, their solutions vary to suit the various aspects being considered, which include the type of skyline query, type of streaming data, type of sliding window, query processing technique, indexing technique, and the data stream environment employed. In this paper, a comprehensive taxonomy is developed along with the key aspects of each reported approach, while several open issues and challenges related to the topic are highlighted as recommendations for future research directions.
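The notion of a skyline over a sliding window can be made concrete with a naive sketch. It assumes that smaller values are better in every dimension and recomputes the skyline from scratch for each window; the function names and the count-based window are illustrative assumptions, not taken from any of the surveyed approaches, which use indexing and incremental maintenance to avoid this quadratic cost.

```python
from collections import deque


def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere.

    Assumes smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that are not dominated by any other point (naive O(n^2))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def sliding_window_skylines(stream, window_size):
    """Yield the skyline of the most recent `window_size` points after each arrival."""
    window = deque(maxlen=window_size)
    for point in stream:
        window.append(point)
        yield skyline(list(window))
```

With hotels given as (price, distance) pairs, `skyline([(50, 8), (60, 3), (80, 2), (70, 5), (90, 9)])` keeps `(50, 8)`, `(60, 3)`, and `(80, 2)`: `(70, 5)` is dominated by `(60, 3)`, and `(90, 9)` by `(50, 8)`.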