This paper presents a new density-based clustering algorithm, ST-DBSCAN, which is based on DBSCAN. We propose three marginal extensions to DBSCAN related to the identification of (i) core objects, (ii) noise objects, and (iii) adjacent clusters. In contrast to existing density-based clustering algorithms, our algorithm can discover clusters according to non-spatial, spatial and temporal values of the objects. In this paper, we also present a spatial–temporal data warehouse system designed for storing and clustering a wide range of spatial–temporal data. We show an implementation of our algorithm using this data warehouse and present the data mining results.
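The core-object extension can be illustrated with a two-radius neighborhood: an object is a core object only if enough neighbors are close in both space and time. The sketch below is illustrative, not the paper's exact definition (the reduction to one spatial and one temporal radius, and all names, are assumptions):

```python
import math

def retrieve_neighbors(idx, points, eps_spatial, eps_temporal):
    """Neighbors of points[idx] that lie within BOTH the spatial and the
    temporal radius. Each point is an (x, y, t) tuple."""
    x, y, t = points[idx]
    out = []
    for j, (px, py, pt) in enumerate(points):
        if j == idx:
            continue
        if math.hypot(px - x, py - y) <= eps_spatial and abs(pt - t) <= eps_temporal:
            out.append(j)
    return out

def is_core(idx, points, eps_spatial, eps_temporal, min_pts):
    """Core-object test: at least min_pts neighbors in the joint neighborhood."""
    return len(retrieve_neighbors(idx, points, eps_spatial, eps_temporal)) >= min_pts
```

With such a neighborhood predicate, the rest of the DBSCAN expansion loop carries over unchanged; only the neighborhood query differs.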
•Working with unique datasets of EV charging and smart meter load demand.
•Distribution networks are not a homogeneous group and have more capability to accommodate EVs than previously suggested.
•Spatial and temporal diversity of EV charging demand alleviates the impacts on networks.
•An extensive recharging infrastructure could enable connection of additional EVs on constrained distribution networks.
•Electric utilities could increase network capability to accommodate EVs by investing in recharging infrastructure.
This work uses a probabilistic method to combine two unique datasets of real-world electric vehicle charging profiles and residential smart meter load demand. The data were used to study the impact of the uptake of Electric Vehicles (EVs) on electricity distribution networks. Two real networks representing an urban and a rural area, and a generic network representative of a heavily loaded UK distribution network, were used. The findings show that distribution networks are not a homogeneous group, with varying capability to accommodate EVs, and that this capability is greater than previous studies have suggested. Consideration of the spatial and temporal diversity of EV charging demand has been demonstrated to reduce the estimated impacts on the distribution networks. It is suggested that distribution network operators could collaborate with new market players, such as charging infrastructure operators, to support the roll-out of an extensive charging infrastructure in a way that makes the network more robust, creates more opportunities for demand side management, and reduces planning uncertainties associated with the stochastic nature of EV charging demand.
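One building block of such a probabilistic study can be sketched as a single Monte Carlo draw: sample measured charging profiles, attach them to the feeder's aggregate household load, and read off the peak. This is a minimal illustration under assumed simplifications (no spatial allocation or start-time modeling, which the paper's method would include); all names are hypothetical:

```python
import numpy as np

def feeder_peak_kw(household_loads, ev_profiles, n_ev, rng):
    """One Monte Carlo draw of feeder peak demand.

    household_loads: (n_households, n_timesteps) metered demand
    ev_profiles:     (n_profiles, n_timesteps) measured charging profiles
    n_ev:            number of EVs to attach, sampled with replacement
    """
    base = household_loads.sum(axis=0)                # aggregate load per time step
    picks = rng.integers(0, len(ev_profiles), n_ev)   # random profile per EV
    return float((base + ev_profiles[picks].sum(axis=0)).max())
```

Repeating the draw many times yields a distribution of peaks from which exceedance probabilities for network limits can be estimated.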
Mortality prevention in the elderly T2D population having Chronic Kidney Disease (CKD) may be possible through risk assessment and predictive modeling. In this study we investigate the ability to predict mortality using heterogeneous Electronic Health Records data. Temporal abstraction is employed to transform the heterogeneous multivariate temporal data into a uniform representation of symbolic time intervals, from which frequent Time Intervals Related Patterns (TIRPs) are then discovered. In this study a novel representation of the TIRPs is introduced, which enables incorporating them in Deep Learning networks. We describe the use of iTirps and bTirps, in which the TIRPs are represented over time by an integer and a binary vector, respectively. While a bTirp represents whether a TIRP's instance was present, an iTirp represents how many instances were present. While the framework showed encouraging results, a major challenge is often the large number of TIRPs, which may cause the models to under-perform. We introduce a novel TIRP selection method, called TIRP Ranking Criteria (TRC), which consists of the TIRP's metrics, such as the differences in its recurrences, its frequencies, and the average duration difference between the classes. Additionally, we introduce an advanced version, called TRC Redundant TIRP Removal (TRC-RTR), in which TIRPs that highly correlate are candidates for removal. The selected subset of iTirps/bTirps is then fed into a Deep Learning architecture such as a Recurrent Neural Network or a Convolutional Neural Network. Furthermore, a predictive committee is utilized in which raw data and iTirp data are both used as input. Our results show that iTirp-based models that use a subset of iTirps selected by the TRC-RTR method outperform models that use raw data or the full set of discovered iTirps.
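The selection step can be sketched as a rank-then-prune procedure. The score below uses only one of the TRC metrics (per-class frequency difference) for brevity, and the correlation threshold is an illustrative choice, not the paper's:

```python
import numpy as np

def trc_scores(freq_pos, freq_neg):
    """Toy TRC score: absolute difference of a TIRP's per-class frequencies
    (the full TRC combines several metrics, e.g. recurrence and duration)."""
    return np.abs(np.asarray(freq_pos) - np.asarray(freq_neg))

def trc_rtr(features, scores, corr_thresh=0.9):
    """TRC-RTR sketch: walk TIRP columns in descending score order and drop any
    column that correlates too highly with an already-kept one."""
    kept = []
    for i in np.argsort(-scores):
        if all(abs(np.corrcoef(features[:, i], features[:, j])[0, 1]) < corr_thresh
               for j in kept):
            kept.append(int(i))
    return kept
```

The surviving columns would then form the iTirp/bTirp input matrix for the downstream RNN or CNN.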
With the fast development of various positioning techniques such as the Global Positioning System (GPS), mobile devices and remote sensing, spatio-temporal data has become increasingly available. Mining valuable knowledge from spatio-temporal data is critically important to many real-world applications including human mobility understanding, smart transportation, urban planning, public safety, health care and environmental management. As the number, volume and resolution of spatio-temporal datasets increase rapidly, traditional data mining methods, especially statistics-based methods, are becoming overwhelmed. Recently, deep learning models such as the recurrent neural network (RNN) and convolutional neural network (CNN) have achieved remarkable success in many domains, and are also widely applied in various spatio-temporal data mining (STDM) tasks such as predictive learning, anomaly detection and classification. In this paper, we provide a comprehensive review of recent progress in applying deep learning techniques to STDM. We first categorize spatio-temporal data into five types, and then briefly introduce the deep learning models that are widely used in STDM. Next, we classify the existing literature by type of spatio-temporal data, data mining task, and deep learning model, followed by the applications of deep learning for STDM in different domains.
Vecchia's approximate likelihood for Gaussian process parameters depends on how the observations are ordered, which has been cited as a deficiency. This article takes the alternative standpoint that the ordering can be tuned to sharpen the approximations. Indeed, the first part of the article includes a systematic study of how ordering affects the accuracy of Vecchia's approximation. We demonstrate the surprising result that random orderings can give dramatically sharper approximations than default coordinate-based orderings. Additional ordering schemes are described and analyzed numerically, including orderings capable of improving on random orderings. The second contribution of this article is a new automatic method for grouping calculations of components of the approximation. The grouping methods simultaneously improve approximation accuracy and reduce computational burden. In common settings, reordering combined with grouping reduces Kullback-Leibler divergence from the target model by more than a factor of 60 compared to ungrouped approximations with default ordering. The claims are supported by theory and numerical results with comparisons to other approximations, including tapered covariances and stochastic partial differential equations. Computational details are provided, including the use of the approximations for prediction and conditional simulation. An application to space-time satellite data is presented.
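The ordering-dependence is easiest to see in a direct implementation of Vecchia's approximation: under a chosen ordering, each observation conditions only on its m nearest previously-ordered neighbours. The sketch below (with an assumed exponential covariance; not the article's code) makes the ordering an explicit argument:

```python
import numpy as np

def exp_cov(X, scale=1.0):
    """Exponential covariance on locations X (illustrative kernel choice)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.exp(-d / scale)

def vecchia_loglik(y, X, order, m):
    """Vecchia's approximate Gaussian log-likelihood. With m = n - 1 every
    observation conditions on all predecessors, so the chain rule is exact
    for any ordering; smaller m introduces the ordering-dependent error."""
    Xo, yo = X[order], y[order]
    K = exp_cov(Xo)
    ll = 0.0
    for i in range(len(yo)):
        prev = np.arange(i)
        if prev.size:
            d = np.linalg.norm(Xo[prev] - Xo[i], axis=1)
            c = prev[np.argsort(d)[:m]]          # m nearest previously-ordered points
            w = np.linalg.solve(K[np.ix_(c, c)], K[c, i])
            mu, var = w @ yo[c], K[i, i] - K[c, i] @ w
        else:
            mu, var = 0.0, K[i, i]
        ll += -0.5 * (np.log(2 * np.pi * var) + (yo[i] - mu) ** 2 / var)
    return ll
```

Comparing `vecchia_loglik` under coordinate-based versus random permutations of `order` for small m is exactly the kind of experiment the first part of the article systematizes.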
Air quality has drawn much attention in recent years because it seriously affects people's health. Nowadays, monitoring stations in a city can provide real-time air quality, but people also strongly desire air quality prediction, which is a challenging problem as it depends on several complicated factors, such as weather patterns and spatial-temporal dependencies of air quality. In this paper, we design a data-driven approach that utilizes historical air quality and meteorological data to predict air quality in the future. We propose a deep spatial-temporal ensemble (STE) model which is comprised of three components. The first component is an ensemble method with a weather-pattern-based partitioning strategy. It trains multiple individual models and combines them dynamically. The second one discovers spatial correlation by analyzing Granger causalities among stations and generating spatial data as relative stations and relative areas. The last one is a temporal predictor based on deep LSTM to learn both long-term and short-term dependencies of air quality. We evaluate our model with data from 35 monitoring stations in Beijing, China. The experiments show that each component of our model contributes to the improvement in prediction accuracy and that the model is superior to baselines.
The paper describes a new type of evolving connectionist systems (ECOS) called evolving spatio-temporal data machines (eSTDM), based on neuromorphic, brain-like information processing principles. These are multi-modular computer systems designed to deal with large and fast spatio/spectro-temporal data using spiking neural networks (SNN) as their major processing modules. ECOS, and eSTDM in particular, can learn incrementally from data streams, can include 'on the fly' new input variables, new output class labels or regression outputs, can continuously adapt their structure and functionality, and can be visualised and interpreted for new knowledge discovery and for a better understanding of the data and the processes that generated them. eSTDM can be used for early event prediction due to the ability of the SNN to spike early, before the whole input vectors they were trained on are presented. A framework for building eSTDM, called NeuCube, is presented along with a design methodology for building eSTDM using it. The implementation of this framework in MATLAB, Java, and PyNN (Python) is presented; the latter facilitates the use of neuromorphic hardware platforms to run eSTDM. Selected examples are given of eSTDM for pattern recognition and early event prediction on EEG data, fMRI data, multisensory seismic data, ecological data, climate data, and audio-visual data. Future directions are discussed, including extension of the NeuCube framework for building neurogenetic eSTDM and new applications of eSTDM.
•A two-layer LSTM model for diagnosis of Parkinson's disease (PD) is proposed.
•Our model better describes temporal sequential gait data.
•Our method outperforms other methods in the literature for PD severity rating.
When diagnosing Parkinson's disease (PD), medical specialists normally assess several clinical manifestations of the PD patient and rate a severity level according to established criteria. This rating process depends highly on doctors' expertise, which is subjective and inefficient. In this paper, we propose a machine learning based method to automatically rate PD severity from gait information, in particular, the sequential data of Vertical Ground Reaction Force (VGRF) recorded by foot sensors. We developed a two-channel model that combines Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) to learn the spatio-temporal patterns behind the gait data. The model was trained and tested on three public VGRF datasets. Our proposed method outperforms existing ones in terms of prediction accuracy of PD severity levels. We believe the quantitative evaluation provided by our method will benefit clinical diagnosis of Parkinson's disease.
Reliable traffic prediction is critical to improve the safety, stability, and efficiency of intelligent transportation systems. However, traffic prediction is a very challenging problem because traffic data are a typical type of spatio-temporal data, which simultaneously show correlation and heterogeneity both in space and time. Most existing works capture only partial properties of traffic data and even assume that the effect of correlation on traffic prediction is globally invariable, resulting in inadequate modeling and unsatisfactory prediction performance. In this paper, we propose a novel end-to-end deep learning model, called ST-3DNet, for traffic raster data prediction. ST-3DNet introduces 3D convolutions to automatically capture the correlations of traffic data in both the spatial and temporal dimensions. A novel recalibration (Rc) block is proposed to explicitly quantify the differences in the contributions of the correlations in space. Considering two kinds of temporal properties of traffic data, i.e., local patterns and long-term patterns, ST-3DNet employs two components consisting of 3D convolutions and Rc blocks to model the two kinds of patterns, respectively, and then aggregates them in a weighted way for the final prediction. The experiments on several real-world traffic datasets, viz., traffic congestion data and crowd flows data, demonstrate that our ST-3DNet outperforms the state-of-the-art baselines.
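The two key operations can be sketched in plain numpy: a 3D convolution mixes a spatio-temporal neighbourhood of the traffic raster into each output value, and an Rc-style recalibration reweights every spatial location by a learned scalar. This is a minimal single-kernel illustration, not the ST-3DNet architecture:

```python
import numpy as np

def conv3d_valid(x, k):
    """'Valid' 3D cross-correlation over a (time, height, width) traffic raster,
    showing how one 3D kernel jointly covers spatial and temporal neighbours."""
    T, H, W = x.shape
    t, h, w = k.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for a in range(out.shape[0]):
        for b in range(out.shape[1]):
            for c in range(out.shape[2]):
                out[a, b, c] = np.sum(x[a:a + t, b:b + h, c:c + w] * k)
    return out

def recalibrate(maps, spatial_w):
    """Rc-style recalibration sketch: scale every spatial location of the
    feature maps by a learned per-location weight, making each location's
    contribution explicit."""
    return maps * spatial_w[None, :, :]
```

In the full model such blocks are stacked in two branches (local and long-term patterns) whose outputs are combined by learned weights.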
Predicting flows (e.g., the traffic of vehicles, crowds, and bikes), consisting of the in-out traffic at a node and transitions between different nodes, in a spatio-temporal network plays an important role in transportation systems. However, this is a very challenging problem, affected by multiple complex factors, such as the spatial correlation between different locations, temporal correlation among different time intervals, and external factors (like events and weather). In addition, the flow at a node (called node flow) and transitions between nodes (edge flow) mutually influence each other. To address these issues, we propose a multitask deep-learning framework that simultaneously predicts the node flow and edge flow throughout a spatio-temporal network. Based on fully convolutional networks, our approach designs two sophisticated models for predicting node flow and edge flow, respectively. These two models are connected by coupling the latent representations of their middle layers, and are trained together. External factors are also integrated into the framework through a gating fusion mechanism. In the edge flow prediction model, we employ an embedding component to deal with the sparse transitions between nodes. We evaluate our method on taxicab data from Beijing and New York City. Experimental results show the advantages of our method over 11 baselines, such as ConvLSTM, CNN, and Markov Random Field.