One key issue in text mining and natural language processing is how to effectively represent documents using numerical vectors. One classical model is the Bag-of-Words (BoW). In a BoW-based vector representation of a document, each element denotes the normalized number of occurrences of a basis term in the document. To count the occurrences of a basis term, BoW conducts exact word matching, which can be regarded as a hard mapping from words to the basis term. The BoW representation suffers from intrinsic extreme sparsity, high dimensionality, and an inability to capture high-level semantic meaning behind text data. To address these issues, we propose a new document representation method named fuzzy Bag-of-Words (FBoW) in this paper. FBoW adopts a fuzzy mapping based on the semantic correlation among words, quantified by cosine similarity between word embeddings. Since word semantic matching instead of exact word string matching is used, FBoW can encode more semantics into the numerical representation. In addition, we propose to use word clusters instead of individual words as basis terms and develop fuzzy Bag-of-WordClusters (FBoWC) models. Three variants under the FBoWC framework are proposed based on three different similarity measures between word clusters and words, named <inline-formula><tex-math notation="LaTeX">\text{FBoWC}_{\rm mean}</tex-math></inline-formula>, <inline-formula><tex-math notation="LaTeX">\text{FBoWC}_{\rm max}</tex-math></inline-formula>, and <inline-formula><tex-math notation="LaTeX">\text{FBoWC}_{\rm min}</tex-math></inline-formula>, respectively. Document representations learned by the proposed FBoW and FBoWC are dense and able to encode high-level semantics. The task of document categorization is used to evaluate the representations learned by the proposed FBoW and FBoWC methods.
The results on seven real-world document classification datasets, in comparison with six document representation learning methods, show that our FBoW and FBoWC methods achieve the highest classification accuracies.
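As a concrete illustration of the fuzzy mapping, the sketch below builds an FBoW vector with NumPy. The function names, the similarity threshold, and the toy setup are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def fbow_vector(doc_tokens, basis_terms, embeddings, threshold=0.0):
    """Fuzzy Bag-of-Words: each basis term receives a soft count based on
    cosine similarity between word embeddings, rather than exact matching."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    vec = np.zeros(len(basis_terms))
    for tok in doc_tokens:
        if tok not in embeddings:
            continue
        for j, term in enumerate(basis_terms):
            sim = cos(embeddings[tok], embeddings[term])
            if sim > threshold:      # ignore weak/negative correlations
                vec[j] += sim        # soft (fuzzy) membership instead of a 0/1 count
    total = vec.sum()
    return vec / total if total > 0 else vec  # normalize as in classical BoW
```

A word such as "cat" then contributes weight to a semantically close basis term such as "kitten" even though the strings never match, which is exactly what hard BoW matching cannot do.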
In modern manufacturing systems and industries, increasing research effort has been devoted to developing effective machine health monitoring systems. Among the various machine health monitoring approaches, data-driven methods are gaining in popularity due to the development of advanced sensing and data analytic techniques. However, given the noise, varying lengths, and irregular sampling behind sensory data, this kind of sequential data cannot be fed into classification and regression models directly. Therefore, previous work has focused on feature extraction/fusion methods that require expensive human labor and high-quality expert knowledge. Inspired by deep learning methods developed in the last few years, which redefine representation learning from raw data, a deep neural network structure named Convolutional Bi-directional Long Short-Term Memory networks (CBLSTM) has been designed here to operate directly on raw sensory data. CBLSTM first uses a CNN to extract local features that are robust and informative from the sequential input. Then, a bi-directional LSTM is introduced to encode temporal information. Long Short-Term Memory networks (LSTMs) are able to capture long-term dependencies and model sequential data, and the bi-directional structure enables the capture of both past and future contexts. Stacked fully-connected layers and a linear regression layer are built on top of the bi-directional LSTMs to predict the target value. Here, a real-life tool wear test is introduced, and our proposed CBLSTM is able to predict the actual tool wear based on raw sensory data. The experimental results show that our model outperforms several state-of-the-art baseline methods.
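To make the pipeline shape concrete, the minimal NumPy sketch below runs raw multichannel sensor data through a 1-D convolution with max-pooling and then a bidirectional recurrence; a plain tanh recurrence stands in for the LSTM cells, weights are random, and all names and dimensions are illustrative rather than the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_features(x, filters, pool=2):
    """Local feature extraction: valid 1-D convolution + ReLU + max-pooling.
    x: (T, d_in) sequence; filters: (n_f, k, d_in)."""
    n_f, k, _ = filters.shape
    T = x.shape[0] - k + 1
    out = np.zeros((T, n_f))
    for t in range(T):
        window = x[t:t + k]  # (k, d_in) local receptive field
        out[t] = np.maximum(0.0, np.tensordot(filters, window, axes=([1, 2], [0, 1])))
    T2 = (T // pool) * pool  # non-overlapping max-pooling along time
    return out[:T2].reshape(-1, pool, n_f).max(axis=1)

def birnn_encode(h, Wf, Wb, U):
    """Bidirectional recurrence (a plain tanh RNN stands in for the LSTM):
    one forward and one backward pass over time, summaries concatenated."""
    T = h.shape[0]
    fwd = np.zeros((T, U.shape[0])); bwd = np.zeros_like(fwd)
    s = np.zeros(U.shape[0])
    for t in range(T):                    # past context
        s = np.tanh(Wf @ h[t] + U @ s); fwd[t] = s
    s = np.zeros(U.shape[0])
    for t in reversed(range(T)):          # future context
        s = np.tanh(Wb @ h[t] + U @ s); bwd[t] = s
    return np.concatenate([fwd[-1], bwd[0]])

# toy end-to-end pass: raw sensory sequence -> scalar tool-wear prediction
x = rng.standard_normal((32, 3))                # 32 time steps, 3 sensor channels
filters = 0.1 * rng.standard_normal((4, 5, 3))  # 4 conv filters of width 5
h = conv1d_features(x, filters)
Wf = 0.1 * rng.standard_normal((6, 4)); Wb = 0.1 * rng.standard_normal((6, 4))
U = 0.1 * rng.standard_normal((6, 6))
code = birnn_encode(h, Wf, Wb, U)
w_out = 0.1 * rng.standard_normal(12)           # linear regression layer
prediction = float(w_out @ code)
```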
In modern industries, machine health monitoring systems (MHMS) have been widely applied with the goal of realizing predictive maintenance, including failure tracking, downtime reduction, and asset preservation. In the era of big machinery data, data-driven MHMS have achieved remarkable results in the detection of faults after the occurrence of certain failures (diagnosis) and in the prediction of future working conditions and the remaining useful life (prognosis). The numerical representation of raw sensory data is the cornerstone of various successful MHMS. Conventional methods are labor-intensive, as they usually depend on handcrafted features that require expert knowledge. Inspired by the success of deep learning methods that redefine representation learning from raw data, we propose local feature-based gated recurrent unit (LFGRU) networks. This is a hybrid approach that combines handcrafted feature design with automatic feature learning for machine health monitoring. First, features are extracted from windows of the input time series. Then, an enhanced bidirectional GRU network is designed and applied to the generated sequence of local features to learn the representation. A supervised learning layer is finally trained to predict the machine condition. Experiments on three machine health monitoring tasks (tool wear prediction, gearbox fault diagnosis, and incipient bearing fault detection) verify the effectiveness and generalization of the proposed LFGRU.
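The first, handcrafted stage of this hybrid idea can be sketched as follows: slide a window over the raw signal and compute a few statistics per window, turning one long noisy series into a short sequence of feature vectors for a recurrent network. The particular statistics here are common vibration-analysis choices, not necessarily the paper's exact feature set.

```python
import numpy as np

def local_features(signal, win, hop):
    """Windowed local-feature extraction: one small feature vector per window.
    The resulting (n_windows, n_features) array is what a GRU would consume."""
    feats = []
    for start in range(0, len(signal) - win + 1, hop):
        w = signal[start:start + win]
        feats.append([
            w.mean(),                   # DC level
            w.std(),                    # spread / energy
            np.sqrt(np.mean(w ** 2)),   # RMS, common in vibration analysis
            np.abs(w).max(),            # peak amplitude
        ])
    return np.array(feats)

# toy noisy sensor trace
sig = np.sin(np.linspace(0, 20 * np.pi, 1000)) \
      + 0.1 * np.random.default_rng(1).standard_normal(1000)
F = local_features(sig, win=100, hop=50)  # overlapping windows via hop < win
```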
•We conduct a detailed review of the applications of recent deep learning models to machine health monitoring tasks and provide our own insights into these models.
•Practical studies of conventional machine learning models and deep learning models on a challenging tool wear prediction task are given. The related data and code have been made publicly available.
•We present current deep learning work on machine health monitoring in a well-organized way to help researchers follow this topic, and we discuss future directions in this research area.
Since 2006, deep learning (DL) has become a rapidly growing research direction, redefining state-of-the-art performance in a wide range of areas such as object recognition, image segmentation, speech recognition, and machine translation. In modern manufacturing systems, data-driven machine health monitoring is gaining in popularity due to the widespread deployment of low-cost sensors and their connection to the Internet. Meanwhile, deep learning provides useful tools for processing and analyzing these big machinery data. The main purpose of this paper is to review and summarize the emerging research on deep learning for machine health monitoring. After a brief introduction to deep learning techniques, the applications of deep learning in machine health monitoring systems are reviewed mainly from the following aspects: Auto-encoders (AE) and their variants, Restricted Boltzmann Machines and their variants including the Deep Belief Network (DBN) and Deep Boltzmann Machines (DBM), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN). In addition, an experimental study on the performance of these approaches has been conducted, and the data and code have been made available online. Finally, some new trends in DL-based machine health monitoring methods are discussed.
As a side effect of increasingly popular social media, cyberbullying has emerged as a serious problem afflicting children, adolescents, and young adults. Machine learning techniques make automatic detection of bullying messages in social media possible, which could help to construct a healthy and safe social media environment. In this research area, one critical issue is robust and discriminative numerical representation learning of text messages. In this paper, we propose a new representation learning method to tackle this problem. Our method, named semantic-enhanced marginalized denoising auto-encoder (smSDA), is developed via semantic extension of the popular deep learning model, the stacked denoising autoencoder (SDA). The semantic extension consists of semantic dropout noise and sparsity constraints, where the semantic dropout noise is designed based on domain knowledge and the word embedding technique. Our proposed method is able to exploit the hidden feature structure of bullying information and learn a robust and discriminative representation of text. Comprehensive experiments on two public cyberbullying corpora (Twitter and MySpace) are conducted, and the results show that our proposed approach outperforms other baseline text representation learning methods.
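The corruption step of such a semantic dropout scheme can be sketched as below: features flagged by domain knowledge as bullying-indicative are dropped with a higher probability, so a denoising autoencoder must reconstruct them from correlated context words. The dropout probabilities and index-based flagging are our illustrative choices, not the paper's exact design.

```python
import numpy as np

def semantic_dropout(X, bully_idx, p_normal=0.3, p_bully=0.8, seed=0):
    """Semantic dropout noise for a denoising autoencoder: columns listed in
    bully_idx (bullying-indicative features) are zeroed with probability
    p_bully, all other columns with the lower probability p_normal."""
    rng = np.random.default_rng(seed)
    p = np.full(X.shape[1], p_normal)
    p[bully_idx] = p_bully            # drop bullying features more aggressively
    mask = rng.random(X.shape) >= p   # keep each entry with probability 1 - p
    return X * mask
```

Training then pairs the corrupted matrix with the clean one, forcing the learned representation to encode the correlations between bullying words and their surrounding context.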
•A relation extractor is developed to extract all relation candidates from sentences.
•Features drawn from lexical semantic resources may improve performance.
•Feature selection may enhance performance and reduce computational complexity.
Extracting causal relations underlying natural language is an important issue in knowledge discovery. Most previous studies of causal relation extraction focus on simple cases such as causal relations between two noun phrases indicated by fixed verbs or prepositions. For more complicated causal relations, such as causal relations between clauses, previously developed algorithms may not work. To solve this problem, this paper develops a system that is able to extract causal relations in multi-level language expressions, such as words, phrases, and clauses, without fixed relators. The information extraction system is composed of a multi-level relation extractor and an ensemble-based relation classifier. It can extract more subtypes of causal relations than previous work because the extraction domain is expanded in terms of both syntactic expressions and semantic meanings. In addition, the proposed method outperforms previously developed methods because extended features based on lexical semantic resources are explored. Experiments show that our system achieves an accuracy of 88.69% and an F-score of 0.6637 on a dataset of 300 sentences.
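A toy version of the candidate-extraction stage can be sketched with cue patterns at several levels (verb, preposition, clause connective), each emitting a (cause, effect) pair for a downstream classifier to accept or reject. The cue list and regular expressions here are purely illustrative, not the paper's actual extraction grammar.

```python
import re

# Each entry: (pattern producing (arg1, arg2) groups, expression level).
CUES = [
    (re.compile(r"(.+?)\s+(?:causes?|caused|leads? to)\s+(.+)", re.I), "verb"),
    (re.compile(r"(.+?)\s+because of\s+(.+)", re.I), "preposition"),
    (re.compile(r"because\s+(.+?),\s*(.+)", re.I), "clause"),
]

def extract_candidates(sentence):
    """Return (cause-ish span, effect-ish span, level) candidate triples.
    A real system would follow this with an ensemble relation classifier."""
    out = []
    for pat, level in CUES:
        m = pat.match(sentence)
        if m:
            out.append((m.group(1).strip(), m.group(2).strip(), level))
    return out
```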
Buildings consume a substantial amount of energy; hence, building energy efficiency has attracted a great deal of attention in recent years. A key factor in achieving this objective is occupancy information, which directly impacts energy-related building control systems. In this paper, we leverage environmental sensors, which are nonintrusive and cost-effective, for building occupancy estimation. Our approach relies on feature engineering and learning. Conventional feature engineering requires one to manually extract relevant features without a clear guideline. This blind feature extraction is labor-intensive and may miss significant implicit features. To address this issue, we propose a convolutional deep bidirectional long short-term memory (CDBLSTM) approach that contains a convolutional network and a deep structure to automatically learn significant features from the sensory data without human intervention. Moreover, the long short-term memory networks are able to capture temporal dependencies in the data, and the bidirectional structure can take the past and future contexts into consideration for the final identification of occupancy. We have conducted real experiments to evaluate the performance of our proposed CDBLSTM approach. Instead of estimating the exact number of occupants, we attempt to identify the range of occupants, i.e., zero, low, medium, or high, which is adequate for most building control systems. The experimental results indicate the effectiveness of our proposed approach compared with state-of-the-art methods.
In recent years, deep compositional models have emerged as a popular technique for representation learning of sentences in computational linguistics and natural language processing. These models normally train various forms of neural networks on top of pretrained word embeddings using a task-specific corpus. However, most of these works neglect the multisense nature of words in the pretrained word embeddings. In this paper, we introduce topic models to enrich the word embeddings for multiple senses of words. The integration of the topic model with various semantic compositional processes leads to topic-aware convolutional neural networks and topic-aware long short-term memory networks. Different from previous multisense word embedding models that assign multiple independent and sense-specific embeddings to each word, our proposed models are lightweight and have flexible frameworks that regard word sense as the composition of two parts: a general sense derived from a large corpus and a topic-specific sense derived from a task-specific corpus. In addition, our proposed models focus on semantic composition instead of word understanding. With the help of topic models, we can integrate the topic-specific sense at the word level before composition and at the sentence level after composition. Comprehensive experiments on five public sentence classification datasets are conducted, and the results show that our proposed topic-aware deep compositional models produce competitive or better performance than other text representation learning methods.
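The two-part view of word sense can be sketched as below: a word's embedding is composed from its general, corpus-level vector plus a topic-specific vector weighted by the topic posterior from a topic model. The additive combination and all names are an illustrative simplification of the paper's models.

```python
import numpy as np

def topic_aware_embedding(word, topic_post, general, topic_vecs):
    """Compose a word sense from two parts: a general sense from a large
    corpus (general[word]) and a topic-specific sense, i.e. topic vectors
    weighted by the posterior p(topic | word, document) from a topic model."""
    topical = sum(p * topic_vecs[k] for k, p in enumerate(topic_post))
    return general[word] + topical
```

A polysemous word such as "bank" would thus receive different final embeddings in a finance-topic document and a river-topic document, while sharing the same general vector.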
•The knowledge-oriented convolutional neural network (K-CNN) performs better than a CNN.
•Word filters capture linguistic clues of causal relations and alleviate overfitting.
•Word filter selection and clustering improve the performance of K-CNN.
•Semantic features improve precision and recall for complex causal relations.
•Combining knowledge and data improves the performance of the deep learning model.
Causal relation extraction is a challenging yet very important task for Natural Language Processing (NLP). Existing approaches to this task are either rule-based (non-statistical) or machine-learning-based (statistical). For rule-based methods, extensive manual work is required to construct handcrafted patterns, yet precision and recall remain low due to the complexity of causal relation expressions in natural language. For machine-learning-based methods, current approaches either rely on sophisticated feature engineering, which is error-prone, or on large amounts of labeled data, which is impractical for the causal relation extraction problem. To address these issues, we propose a Knowledge-oriented Convolutional Neural Network (K-CNN) for causal relation extraction in this paper. K-CNN consists of a knowledge-oriented channel that incorporates human prior knowledge to capture the linguistic clues of causal relationships, and a data-oriented channel that learns other important features of causal relations from the data. The convolutional filters in the knowledge-oriented channel are automatically generated from lexical knowledge bases such as WordNet and FrameNet. We propose filter selection and clustering techniques to reduce dimensionality and improve the performance of K-CNN. Furthermore, additional semantic features that are useful for identifying causal relations are created. Three datasets have been used to evaluate the ability of K-CNN to effectively extract causal relations from text, and the model outperforms current state-of-the-art models for relation extraction.
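The key idea of generating filters from knowledge rather than learning them can be sketched as follows: build width-1 word filters from embeddings of causal cue words, then take each filter's max-over-time cosine response on a sentence. The cue words, toy embeddings, and function names are illustrative assumptions, not the paper's actual filter construction.

```python
import numpy as np

def knowledge_filters(cue_words, embeddings):
    """Knowledge-oriented channel (sketch): instead of learning filters from
    data, stack unit-normalized embeddings of causal cue words drawn from
    lexical resources such as WordNet/FrameNet as width-1 filters."""
    return np.stack([embeddings[w] / np.linalg.norm(embeddings[w])
                     for w in cue_words])

def channel_response(sentence, filters, embeddings):
    """Max-over-time pooled cosine response of each word filter: how strongly
    the sentence's best-matching word resembles each causal cue."""
    E = np.stack([embeddings[w] / np.linalg.norm(embeddings[w])
                  for w in sentence])
    return (E @ filters.T).max(axis=0)  # shape: (n_filters,)
```

A sentence containing a near-synonym of a cue word then activates that filter strongly even if the cue itself never appears, which is what lets the knowledge channel generalize without labeled data.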
Model-Based Online Learning With Kernels. Li, Guoqi; Wen, Changyun; Li, Zheng Guo; et al. IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 3, March 2013. Journal Article.
New optimization models and algorithms for online learning with kernels (OLK) in classification, regression, and novelty detection are proposed in a reproducing kernel Hilbert space. Unlike the stochastic gradient descent algorithm, called the naive online regularized risk minimization algorithm (NORMA), the OLK algorithms are obtained by solving a constrained optimization problem based on the proposed models. By exploiting the techniques of the Lagrange dual problem, as in Vapnik's support vector machine (SVM), the solution of the optimization problem can be obtained iteratively, and the iteration process is similar to that of NORMA. This further strengthens the foundation of OLK and enriches the research area of SVM. We also apply the obtained OLK algorithms to problems in classification, regression, and novelty detection, including real-time background subtraction, to show their effectiveness. The experimental results on both classification and regression illustrate that the accuracy of the OLK algorithms is comparable with that of traditional SVM-based algorithms, such as SVM and least squares SVM (LS-SVM), and with that of state-of-the-art algorithms such as the kernel recursive least squares (KRLS) method and the projectron method, while it is slightly higher than that of NORMA. On the other hand, the computational cost of the OLK algorithms is comparable with or slightly lower than that of existing online methods, such as the above-mentioned NORMA, KRLS, and projectron methods, but much lower than that of SVM-based algorithms. In addition, unlike SVM and LS-SVM, the OLK algorithms can be applied to non-stationary problems. The applicability of OLK to novelty detection is also illustrated by simulation results.
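For intuition about online learning in an RKHS, the sketch below implements the simpler NORMA-style stochastic-gradient baseline that the paper compares against (the OLK algorithms instead solve a constrained optimization at each step). Hyperparameters and the regression setting are illustrative choices.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

class OnlineKernelRegressor:
    """NORMA-style online kernel learner: the hypothesis is a kernel
    expansion f(x) = sum_i alpha_i k(c_i, x); each step shrinks old
    coefficients (regularization) and adds the new point as a center."""
    def __init__(self, eta=0.5, lam=0.01, gamma=1.0):
        self.eta, self.lam, self.gamma = eta, lam, gamma
        self.centers, self.alphas = [], []

    def predict(self, x):
        return sum(a * rbf(c, x, self.gamma)
                   for c, a in zip(self.centers, self.alphas))

    def update(self, x, y):
        err = self.predict(x) - y               # squared-loss gradient signal
        self.alphas = [(1 - self.eta * self.lam) * a for a in self.alphas]
        self.centers.append(x)
        self.alphas.append(-self.eta * err)     # new center, gradient step
```

Note that the expansion grows by one center per example; bounding this growth is one of the practical concerns that methods such as the projectron address.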