Bearing faults are a leading cause of motor damage and economic loss. Fast and accurate identification of bearing faults is valuable for preventing damage to the whole equipment and for keeping industrial processes running without interruption. Vibration signals from a running motor can be used to diagnose bearing health. This study proposes a bearing-fault detection method based on two types of neural networks applied to motor vibration data. The proposed method uses an autoencoder neural network to construct a new motor vibration feature and a feed-forward neural network for the final detection. The constructed feature enhances prediction performance by focusing on a fault type that is difficult to detect. We conducted experiments on the CWRU bearing datasets. The experimental study shows that the proposed method improves the performance of the feed-forward neural network and outperforms the other machine learning algorithms.
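The two-stage pipeline described above can be sketched minimally as follows. This is an illustration only, assuming a single-hidden-layer autoencoder trained by plain gradient descent on random toy data; the study itself uses CWRU vibration segments and its own architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for windowed motor vibration data (real input: CWRU segments).
X = rng.normal(size=(200, 32))

# One-hidden-layer autoencoder: the 8-dimensional code becomes the new feature.
d, h, lr = X.shape[1], 8, 0.02
W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)
W2 = rng.normal(scale=0.1, size=(h, d)); b2 = np.zeros(d)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

first_loss = last_loss = None
for step in range(500):
    Z = sigmoid(X @ W1 + b1)            # encoder activation (the learned code)
    Xhat = Z @ W2 + b2                  # linear decoder reconstruction
    err = Xhat - X
    last_loss = float((err ** 2).mean())
    if step == 0:
        first_loss = last_loss
    # Backpropagate the mean-squared reconstruction error.
    gW2 = Z.T @ err / len(X); gb2 = err.mean(axis=0)
    dZ = (err @ W2.T) * Z * (1.0 - Z)
    gW1 = X.T @ dZ / len(X); gb1 = dZ.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Augment the raw input with the learned code; a feed-forward classifier
# would then be trained on `features` for the final fault detection.
code = sigmoid(X @ W1 + b1)
features = np.hstack([X, code])
```

A downstream classifier sees both the raw window and the compressed code, which is the sense in which the constructed feature augments rather than replaces the input.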
Named Entity Recognition (NER) in the healthcare domain involves identifying and categorizing diseases, drugs, and symptoms for biosurveillance, extracting their related properties and activities, and identifying adverse drug events appearing in texts. These tasks are important challenges in healthcare. Analyzing user messages in social media networks such as Twitter can provide opportunities to detect and manage public health events. Twitter provides a broad range of short messages that contain interesting information for information extraction. In this paper, we present a Health-Related Named Entity Recognition (HNER) task using a healthcare-domain ontology that can recognize health-related entities in large numbers of user messages from Twitter. For this task, we employ a deep learning architecture based on a recurrent neural network (RNN) with little feature engineering. To achieve our goal, we collected a large number of Twitter messages containing health-related information and detected biomedical entities from the Unified Medical Language System (UMLS). A bidirectional long short-term memory (BiLSTM) model learned rich context information, and a convolutional neural network (CNN) was used to produce character-level features. The conditional random field (CRF) model predicted a sequence of labels corresponding to a sequence of inputs, and the Viterbi algorithm was used to detect health-related entities in Twitter messages. We provide comprehensive results giving valuable insights for identifying medical entities in Twitter for various applications. The BiLSTM-CRF model achieved a precision of 93.99%, recall of 73.31%, and F1-score of 81.77% for disease or syndrome HNER; a precision of 90.83%, recall of 81.98%, and F1-score of 87.52% for sign or symptom HNER; and a precision of 94.85%, recall of 73.47%, and F1-score of 84.51% for pharmacologic substance named entities.
The ontology-based manual annotation results show that it is possible to perform high-quality annotation despite the complexity of medical terminology and the lack of context in tweets.
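The CRF decoding step mentioned above — finding the most likely label sequence with the Viterbi algorithm — can be sketched in a few lines. The label set and the toy scores below are invented for illustration:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Most likely label sequence under linear-chain CRF scores.
    emissions:   (T, K) per-token label scores (e.g. BiLSTM outputs)
    transitions: (K, K) score of moving from label i to label j."""
    T, K = emissions.shape
    score = emissions[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)   # best previous label for each label j
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):       # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy example: 3 tokens, labels O / B-Disease / I-Disease.
labels = ["O", "B-Disease", "I-Disease"]
emissions = np.array([[4.0, 0.0, 0.0],
                      [0.0, 4.0, 0.0],
                      [0.0, 0.0, 4.0]])
transitions = np.zeros((3, 3))
best = viterbi(emissions, transitions)   # → [0, 1, 2]
```

In the full model the transition matrix is learned jointly with the BiLSTM, which is what lets the CRF forbid implausible label sequences such as I-Disease following O.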
Recently, rapid improvements in technology and decreasing sequencing costs have made RNA-Seq a widely used technique for quantifying gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for selecting the most appropriate approach for future experiments.
In this paper, we compared eight non-abundance normalization methods (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq data of 35- and 76-nucleotide reads produced in the MAQC project, as well as simulated reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and the MAQC qRT-PCR values for 996 genes. Based on this work, we showed that, of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For RNA-Seq with 35-nucleotide reads, RPKM showed the highest correlation, but for RNA-Seq with 76-nucleotide reads, it showed the lowest correlation among the methods. ERPKM did not improve on RPKM. Between the two abundance estimation normalization methods, for RNA-Seq with 35-nucleotide reads, Sailfish yielded higher correlation than RSEM, which in turn was better than not using abundance estimation at all. However, for RNA-Seq with 76-nucleotide reads, the results achieved by RSEM were similar to those obtained without abundance estimation and much better than those of Sailfish. Furthermore, we found that adding a poly-A tail increased the number of alignments but did not improve normalization results.
Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We therefore suggest ignoring the poly-A tail during differential gene expression analysis.
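For reference, the RPKM measure and the Spearman comparison used in the evaluation can be written out directly; the counts and values below are invented toy numbers:

```python
import numpy as np

def rpkm(counts, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return counts / (gene_length_bp / 1e3) / (total_mapped_reads / 1e6)

def spearman(x, y):
    """Spearman rank correlation (assumes no ties, unlike library versions)."""
    rx = np.argsort(np.argsort(x)).astype(float)   # ranks of x
    ry = np.argsort(np.argsort(y)).astype(float)   # ranks of y
    rx -= rx.mean(); ry -= ry.mean()
    return float((rx * ry).sum() / np.sqrt((rx ** 2).sum() * (ry ** 2).sum()))

# Toy values: 100 reads on a 2 kb gene out of 10 million mapped reads.
value = rpkm(100, 2000, 1e7)                                     # → 5.0
rho = spearman([1.0, 2.5, 3.0, 9.0], [10.0, 20.0, 30.0, 40.0])   # → 1.0 (monotone)
```

Rank correlation is the natural comparison here because normalized RNA-Seq values and qRT-PCR values live on different scales; only their orderings need to agree.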
Emotion detection and recognition from text is a recent and essential research area in Natural Language Processing (NLP) that can provide valuable input for a variety of purposes. Nowadays, writing takes many forms, such as social media posts, micro-blogs, news articles, and customer reviews, and the content of these short texts can be a useful resource for text mining to uncover various hidden aspects, including emotions. Previously presented models mainly adopted word embedding vectors that represent rich semantic/syntactic information, but such models cannot capture the emotional relationships between words. Recently, some emotional word embeddings have been proposed, but they in turn lack semantic and syntactic information. To address this issue, we propose a novel neural network architecture, called SENN (Semantic-Emotion Neural Network), which can utilize both semantic/syntactic and emotional information by adopting pre-trained word representations. The SENN model has two sub-networks: the first uses a bidirectional Long Short-Term Memory (BiLSTM) network to capture contextual information and focuses on the semantic relationships between words; the second uses a convolutional neural network (CNN) to extract emotional features and focuses on the emotional relationships between words in the text. We conducted a comprehensive performance evaluation of the proposed model using standard real-world datasets, adopting the notion of Ekman's six basic emotions. The experimental results show that the proposed model achieves significantly superior emotion recognition quality compared with various state-of-the-art approaches and can be further improved with other emotional word embeddings.
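The two-branch fusion at the heart of SENN can be illustrated schematically. In this sketch the BiLSTM branch is replaced by mean pooling and the CNN branch by max pooling over made-up semantic and emotional word vectors, so it preserves only the shape of the architecture, not its substance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-trained spaces (SENN uses real semantic and emotional
# word embeddings; these random vectors are placeholders).
vocab = ["the", "dog", "growls", "angrily"]
sem = {w: rng.normal(size=8) for w in vocab}   # semantic/syntactic space
emo = {w: rng.normal(size=4) for w in vocab}   # emotional space

EKMAN = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]
W = rng.normal(size=(12, len(EKMAN)))          # untrained classifier weights

def classify(tokens):
    S = np.stack([sem[t] for t in tokens])
    E = np.stack([emo[t] for t in tokens])
    semantic_feat = S.mean(axis=0)     # stand-in for the BiLSTM branch
    emotion_feat = E.max(axis=0)       # stand-in for the CNN branch
    fused = np.concatenate([semantic_feat, emotion_feat])
    z = fused @ W                      # softmax over the six Ekman emotions
    p = np.exp(z - z.max()); p /= p.sum()
    return EKMAN[int(p.argmax())], p

label, probs = classify(["the", "dog", "growls", "angrily"])
```

The key design point survives the simplification: each branch produces its own feature vector, and only the concatenation is passed to the final emotion classifier.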
•MIQ-Tree structure for mining high utility itemsets is proposed.
•MU-Growth algorithm is suggested to prune candidates effectively in the mining process.
•Experimental results show that MU-Growth outperforms the other algorithms.
High utility itemset mining considers the importance of items, such as profit and item quantities in transactions. Recently, mining high utility itemsets has emerged as one of the most significant research issues due to a huge range of real-world applications such as retail market data analysis and stock market prediction. Although many relevant algorithms have been proposed in recent years, they incur the problem of generating a large number of candidate itemsets, which degrades mining performance. In this paper, we propose an algorithm named MU-Growth (Maximum Utility Growth) with two techniques for pruning candidates effectively in the mining process. Moreover, we suggest a tree structure, named MIQ-Tree (Maximum Item Quantity Tree), which captures database information in a single pass. The proposed data structure is then restructured to reduce overestimated utilities. Performance evaluation shows that MU-Growth not only decreases the number of candidates but also outperforms state-of-the-art tree-based algorithms with overestimation methods in terms of runtime, with similar memory usage.
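The notion of itemset utility that MU-Growth mines can be stated concretely: a transaction contributes the quantity of each item times its per-unit profit, summed over transactions containing the whole itemset. The tiny database and profits below are invented:

```python
# Transaction database: each transaction maps item -> purchased quantity.
db = [{"a": 2, "b": 1},
      {"a": 1, "c": 3},
      {"b": 2, "c": 1}]
profit = {"a": 5, "b": 2, "c": 1}   # per-unit (external) utilities

def utility(itemset, db, profit):
    """Total utility of `itemset` over the transactions that contain it."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)   # internal * external utility
        for t in db
        if set(itemset) <= t.keys()              # itemset fully present
    )

u_a = utility(("a",), db, profit)        # 2*5 + 1*5 = 15
u_ab = utility(("a", "b"), db, profit)   # only the first transaction: 12
```

A high utility itemset is one whose total utility clears a user-given threshold; the challenge the paper targets is that utility, unlike support, is not anti-monotone, which forces candidate overestimation.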
Novelty detection is a classification problem that identifies abnormal patterns; it is therefore an important task for applications such as fraud detection, fault diagnosis, and disease detection. However, when there are no labels indicating normal and abnormal data, obtaining them requires expensive domain and professional knowledge, so an unsupervised novelty detection approach is used instead. Moreover, applying novelty detection to high-dimensional data is a major challenge, and previous research suggests approaches based on principal component analysis (PCA) and autoencoders to reduce dimensionality. In this paper, we propose deep autoencoders with density-based clustering (DAE-DBC); the approach computes compressed data and an error threshold from a deep autoencoder model and passes the results to a density-based clustering step. Points that do not belong to any cluster are not considered a novelty; a cluster is labeled a novelty group depending on the ratio of its points that exceed the error threshold. We conducted experiments substituting individual components to show that the components of the proposed method are more effective together. As a result, the DAE-DBC approach is more efficient; its area under the curve (AUC) is 13.5 percent higher than those of state-of-the-art algorithms and the other variants of the proposed method that we evaluated.
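The final labeling rule — a cluster becomes a novelty group when too large a fraction of its members exceed the reconstruction-error threshold — can be sketched as follows, taking the autoencoder errors and cluster labels as given. The 0.5 ratio is an assumed default for illustration, not the paper's tuned value:

```python
import numpy as np

def novelty_clusters(labels, recon_err, threshold, ratio=0.5):
    """Return the cluster ids flagged as novelty groups.
    labels:    cluster id per point, -1 for points in no cluster
    recon_err: autoencoder reconstruction error per point
    threshold: error threshold from the autoencoder model
    ratio:     fraction of over-threshold members needed to flag a cluster."""
    novel = set()
    for c in set(labels) - {-1}:                 # skip unclustered points
        member_err = recon_err[labels == c]
        if (member_err > threshold).mean() >= ratio:
            novel.add(c)
    return novel

# Toy example: cluster 1 is mostly over the threshold, cluster 0 is not.
labels = np.array([0, 0, 0, 1, 1, -1])
errors = np.array([0.1, 0.2, 0.9, 0.8, 0.9, 0.5])
novel = novelty_clusters(labels, errors, threshold=0.5)   # → {1}
```

This mirrors the abstract's rule that unclustered points alone are not novelties; the decision is made per cluster from the ratio of over-threshold members.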
Multivessel disease (MVD) is an independent risk factor for poor prognosis in acute myocardial infarction patients. Although several global risk scoring systems (RSS) are in use in clinical practice, there is no dedicated RSS for MVD in ST-segment elevation myocardial infarction (STEMI). The primary objective of this study is to develop a novel RSS to estimate the prognosis of patients with MVD in STEMI. We used the Korean Acute Myocardial Infarction Registry (KAMIR) to identify 2,030 STEMI patients with MVD who underwent appropriate percutaneous coronary intervention (PCI). Their data were analyzed to develop a new RSS. The prognostic power of this RSS was validated with 2,556 STEMI patients with MVD in the Korean Working Group on Myocardial Infarction Registry (KORMI). Six prognostic factors related to all-cause death in STEMI patients with MVD were identified: age, serum creatinine, Killip class, lower body weight, decreased left ventricular ejection fraction, and history of cerebrovascular disease. The RSS for all-cause death was constructed using these risk factors and their statistical weights. The RSS performed appropriately (c-index: 0.72) in the KORMI validation cohort. We developed a novel RSS that estimates all-cause death in the year following discharge for patients with MVD in STEMI appropriately treated by PCI. This RSS was transformed into a simple linear risk score to yield a simplified prognosis estimate for MVD among STEMI patients.
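A common way to turn model weights into a simple linear score, consistent with the transformation described above, is to round each coefficient to integer points relative to the smallest coefficient. The coefficients below are hypothetical placeholders, not the published weights:

```python
# Hypothetical regression coefficients for the six prognostic factors
# (illustrative only; the published RSS uses its own statistical weights).
betas = {
    "age_per_10y": 0.55,
    "serum_creatinine": 0.40,
    "killip_class_ge_2": 0.70,
    "low_body_weight": 0.35,
    "reduced_LVEF": 0.80,
    "prior_cerebrovascular_disease": 0.45,
}

base = min(betas.values())                       # smallest effect = 1 point
points = {k: round(v / base) for k, v in betas.items()}

def risk_score(present_factors):
    """Sum the integer points of the factors present in a patient."""
    return sum(points[f] for f in present_factors)

score = risk_score(["reduced_LVEF", "prior_cerebrovascular_disease"])  # → 3
```

The appeal of such a score is bedside usability: clinicians add small integers rather than evaluate a regression equation, at a modest cost in discrimination.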
A number of block-based image-denoising methods have been presented in the literature. Those methods, however, are generally adapted to denoising Gaussian noise and consequently do not perform well for denoising random-valued impulse and salt-and-pepper noise. We propose an efficient block-based image-denoising method devised specifically for fast denoising of impulse noise. The method first constructs a set of array pointers to image blocks containing a specific pixel value at a specific location. With this scheme, blocks similar to a given block can be found by considering only the blocks pointed to by the pointers corresponding to the block's pixel values, without comparing all the blocks in the input image. The experimental results show that the proposed method achieves superior denoising performance in terms of computational time and signal-to-noise ratio.
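The array-pointer scheme can be illustrated with a dictionary keyed by (pixel position, pixel value): a query block retrieves only the blocks that share at least one keyed entry, instead of scanning the whole image. The 2×2 blocks below are toy data:

```python
import numpy as np
from collections import defaultdict

def build_index(blocks):
    """Index each block id under every (pixel position, pixel value) it holds."""
    idx = defaultdict(list)
    for bid, blk in enumerate(blocks):
        for pos, val in enumerate(blk.ravel()):
            idx[(pos, int(val))].append(bid)
    return idx

def candidates(idx, query):
    """Blocks sharing at least one (position, value) entry with the query."""
    out = set()
    for pos, val in enumerate(query.ravel()):
        out.update(idx.get((pos, int(val)), ()))
    return out

blocks = [np.array([[1, 2], [3, 4]]),
          np.array([[1, 9], [9, 9]]),
          np.array([[5, 6], [7, 8]])]
idx = build_index(blocks)
cand = candidates(idx, np.array([[1, 2], [0, 0]]))   # → {0, 1}
```

Similarity need then be computed only on the candidate set, which is what gives the method its speed advantage for impulse noise, where most pixels of a block are uncorrupted and match exactly.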
•We introduce a novel algorithm that mines WMFPs with only one scan in a sliding window-based data stream environment.
•We also provide a strategy that prunes unnecessary operations causing meaningless pattern generation in single paths.
•In the performance evaluation, we show that our approach outperforms previous algorithms.
As data have accumulated more quickly in recent years, the corresponding databases have also grown enormously; thus, general frequent pattern mining methods face limitations in responding appropriately to such massive data. To overcome this problem, data mining researchers have studied methods that can conduct more efficient and immediate mining tasks by scanning databases only once. Thereafter, the sliding window model, which performs mining operations focused on the most recently accumulated parts of data streams, was proposed, and a variety of related mining approaches have been suggested. However, it is hard to mine all of the frequent patterns in a data stream environment, since the number of generated patterns increases remarkably as data streams are continuously extended. Thus, methods for efficiently compressing generated patterns are needed to solve that problem. In addition, since weight constraints expressing items' importance, and not only support conditions, are an important factor in pattern mining, we need to consider them in the mining process. Motivated by these issues, we propose a novel algorithm, weighted maximal frequent pattern mining over data streams based on the sliding window model (WMFP-SW), to obtain weighted maximal frequent patterns reflecting recent information over data streams. Performance experiments report that WMFP-SW outperforms previous algorithms in terms of runtime, memory usage, and scalability.
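The sliding-window bookkeeping underlying this model — item counts maintained over only the most recent transactions — can be sketched minimally. The real algorithm keeps a prefix-tree structure and applies weight constraints, both of which this sketch omits:

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Item supports over the most recent `size` transactions of a stream."""

    def __init__(self, size):
        self.size = size
        self.window = deque()      # transactions currently in the window
        self.counts = Counter()    # item -> support within the window

    def add(self, transaction):
        self.window.append(transaction)
        self.counts.update(transaction)
        if len(self.window) > self.size:
            old = self.window.popleft()    # evict the oldest transaction
            self.counts.subtract(old)      # and retract its contribution

    def support(self, item):
        return self.counts[item]

# Stream of four transactions over a window of size 3: the first is evicted.
sw = SlidingWindowCounter(size=3)
for t in (["a", "b"], ["a", "c"], ["b"], ["a"]):
    sw.add(t)
# window now holds the last 3 transactions
```

Because eviction retracts the oldest transaction's contribution, the counts always reflect only recent data, which is exactly the property the sliding window model is chosen for.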
Hypertension and prehypertension are risk factors for cardiovascular diseases. However, the associations of both prehypertension and hypertension with anthropometry, blood parameters, and spirometry have not been investigated. The purpose of this study was to identify the risk factors for prehypertension and hypertension in middle-aged Korean adults and to study prediction models of prehypertension and hypertension combined with anthropometry, blood parameters, and spirometry. Binary logistic regression analysis was performed to assess the statistical significance of prehypertension and hypertension, and prediction models were developed using logistic regression, naïve Bayes, and decision trees. Among all risk factors for prehypertension, body mass index (BMI) was identified as the best indicator in both men (odds ratio (OR) = 1.429, 95% confidence interval (CI) = 1.304–1.462) and women (OR = 1.428, 95% CI = 1.204–1.453). In contrast, among all risk factors for hypertension, BMI (OR = 1.993, 95% CI = 1.818–2.186) was found to be the best indicator in men, whereas the waist-to-height ratio (OR = 2.071, 95% CI = 1.884–2.276) was the best indicator in women. In the prehypertension prediction model, men exhibited an area under the receiver operating characteristic curve (AUC) of 0.635, and women exhibited an AUC of 0.777. In the hypertension prediction model, men exhibited an AUC of 0.700, and women an AUC of 0.845. This study proposes various risk factors for prehypertension and hypertension, and our findings can be used as a large-scale screening tool for controlling and managing hypertension.
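The reported odds ratios follow directly from the logistic-regression coefficients via exponentiation. A minimal sketch — the BMI coefficient below is back-derived from the reported OR purely for illustration:

```python
import math

def odds_ratio(beta):
    """Per-unit odds ratio implied by a logistic-regression coefficient."""
    return math.exp(beta)

def predict_prob(intercept, betas, x):
    """Predicted probability of the outcome for feature vector x."""
    z = intercept + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

# A BMI coefficient of ln(1.429) reproduces the reported male OR of 1.429:
beta_bmi = math.log(1.429)
assert abs(odds_ratio(beta_bmi) - 1.429) < 1e-12
```

An OR of 1.429 per BMI unit thus means each additional unit of BMI multiplies the odds of prehypertension by about 1.43, holding the other factors fixed.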