The struggle between security analysts and malware developers is a never-ending battle with the complexity of malware changing as quickly as innovation grows. Current state-of-the-art research focuses on the development and application of machine learning techniques for malware detection due to their ability to keep pace with malware evolution. This survey aims to provide a systematic and detailed overview of machine learning techniques for malware detection and, in particular, deep learning techniques. The main contributions of the paper are: (1) it provides a complete description of the methods and features in a traditional machine learning workflow for malware detection and classification, (2) it explores the challenges and limitations of traditional machine learning and (3) it analyzes recent trends and developments in the field with special emphasis on deep learning approaches. Furthermore, (4) it presents the research issues and unsolved challenges of the state-of-the-art techniques and (5) it discusses the new directions of research. The survey helps researchers understand the malware detection field and the new developments and directions of research explored by the scientific community to tackle the problem.
•It presents a systematic review of M.L. approaches for malware detection.
•Traditional approaches are classified into static, dynamic and hybrid approaches.
•It provides a detailed description of the features in a traditional M.L. workflow.
•It introduces new research directions such as deep learning and multimodal approaches.
•It discusses the research issues and challenges faced by security researchers.
Effective wind power prediction will facilitate the world’s long-term goal in sustainable development. However, a drawback of wind as an energy source lies in its high variability, which makes wind power forecasting a challenging problem. To solve this issue, a novel data-driven approach is proposed for wind power forecasting by integrating data pre-processing & re-sampling, anomalies detection & treatment, feature engineering, and hyperparameter tuning based on gated recurrent deep learning models, which is systematically presented for the first time. Besides, a novel deep learning neural network of Gated Recurrent Unit (GRU) is successfully developed and critically compared with the algorithm of Long Short-term Memory (LSTM). Initially, twelve features were engineered into the predictive model, which are wind speeds at four different heights, generator temperature, and gearbox temperature. The simulation results showed that, in terms of wind power forecasting, the proposed approach can capture a high degree of accuracy at lower computational costs. It can also be concluded that GRU outperformed LSTM in predictive accuracy under all observed tests, while providing a faster training process and lower sensitivity to noise in the Supervisory Control and Data Acquisition (SCADA) datasets used.
•The designed models have been validated against SCADA measurements.
•Isolation forest improved the accuracy of deep-learning-based GRU and LSTM.
•Proposed approaches capture a high degree of accuracy at lower computational costs.
•A novel deep learning model of Gated Recurrent Unit (GRU) is effectively developed.
•A novel data-driven approach is proposed for wind power forecasting.
Traditional tourism demand forecasting models may face challenges when massive amounts of search intensity indices are adopted as tourism demand indicators. Using a deep learning approach, this research studied a framework for forecasting monthly Macau tourist arrival volumes. The empirical results demonstrated that the deep learning approach significantly outperforms support vector regression and artificial neural network models. Moreover, the construction and identification of highly relevant features from the proposed deep network architecture provide practitioners with a means of understanding the relationships between various tourist demand forecasting factors and tourist arrival volumes.
This article also launches the Annals of Tourism Research Curated Collection on Tourism Demand Forecasting, a special selection of research in this field.
•A deep learning method is presented to forecast tourist demand.
•The introduced method represents an automated approach to feature engineering.
•The method overcomes the linearity limitations of existing lag order detection.
•The case study on Macau confirms the superior performance of the proposed approach.
•The introduced method can be applied to different tourism destinations.
Real-time, accurate, and stable forecasting plays a vital role in making strategic decisions in the smart grid (SG). This ensures economic savings, effective planning, and reliable and secure power system operation. However, accurate and stable forecasting is challenging due to the uncertain and intermittent electric load behavior. In this context, a rigid forecasting model with assertive stochastic and non-linear behavior capturing abilities is needed. Thus, a support vector regression (SVR) model emerged to cater to non-linear time-series predictions. However, it suffers from computational complexity and from parameters that are hard to tune appropriately. Due to these problems, forecasting results of SVR are not as accurate as required. To solve such problems, a novel hybrid approach is developed by integrating feature engineering (FE) and a modified firefly optimization (mFFO) algorithm with SVR, namely the FE-SVR-mFFO forecasting framework. FE eliminates redundant and irrelevant features to ensure high computational efficiency. The mFFO algorithm obtains and tunes the SVR model’s appropriate parameters to effectively avoid becoming trapped in local optima and to return accurate forecasting results. Besides, most literature studies are focused on forecast accuracy improvement. However, the forecasting model’s effectiveness and productiveness are determined equally by its stability and convergence rate. Considering only one objective (accuracy or stability or convergence rate) is inadequate; thus, the proposed FE-SVR-mFFO forecasting framework achieves these three relatively independent objectives simultaneously. To evaluate the effectiveness and applicability of the proposed framework, real half-hourly load data of five states of Australia (New South Wales (NSW), Queensland (QLD), South Australia (SA), Tasmania (TAS), and Victoria (VIC)) are employed as a case study.
Experimental results show that the proposed framework outperforms benchmark frameworks like EMD-SVR-PSO, FS-TSFE-CBSSO, VMD-FFT-IOSVR, and DCP-SVM-WO in terms of accuracy, stability, and convergence rate.
•A novel FE-SVR-mFFO model is proposed for electric load forecasting.
•Feature engineering method is used to speed up the SVR model training process.
•A modified firefly algorithm is introduced to select and optimize SVR parameters.
•Prediction results confirm the proposed model’s applicability in aspects of objectives compared to conventional models.
•The model is generic and can be applied to a variety of indicators and geographic regions.
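As a rough illustration of how a firefly-style search can tune two regression hyperparameters, the following sketch runs a plain (unmodified) firefly algorithm on a toy objective. It is not the mFFO variant from the abstract, whose modifications are not described here. The function names, the toy error surface, and the interpretation of the two parameters as log10(C) and log10(epsilon) of an SVR are all assumptions made for the example.

```python
import math
import random


def firefly_minimize(objective, bounds, n_fireflies=15, n_iter=40,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Plain firefly search: dimmer fireflies move toward brighter (lower-cost) ones."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(p) for p in swarm]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter than firefly i
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction decays with distance
                    for d, (lo, hi) in enumerate(bounds):
                        step = beta * (swarm[j][d] - swarm[i][d]) + alpha * (rng.random() - 0.5)
                        swarm[i][d] = min(hi, max(lo, swarm[i][d] + step))
                    cost[i] = objective(swarm[i])
        alpha *= 0.97  # shrink the random step over time
    best = min(range(n_fireflies), key=cost.__getitem__)
    return swarm[best], cost[best]


def toy_validation_error(params):
    """Hypothetical stand-in for an SVR validation error over (log10 C, log10 epsilon)."""
    return (params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2


bounds = [(-3.0, 3.0), (-4.0, 0.0)]
best_params, best_error = firefly_minimize(toy_validation_error, bounds)
print(best_params, best_error)
```

In a real setting the toy objective would be replaced by a cross-validated forecasting error, which is far more expensive to evaluate, so the swarm size and iteration count would need to be chosen with that cost in mind.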
In modern industries, machine health monitoring systems (MHMS) have been applied widely with the goal of realizing predictive maintenance including failures tracking, downtime reduction, and assets preservation. In the era of big machinery data, data-driven MHMS have achieved remarkable results in the detection of faults after the occurrence of certain failures (diagnosis) and prediction of the future working conditions and the remaining useful life (prognosis). The numerical representation for raw sensory data is the keystone for various successful MHMS. Conventional methods are labor-intensive as they usually depend on handcrafted features, which require expert knowledge. Inspired by the success of deep learning methods that redefine representation learning from raw data, we propose local feature-based gated recurrent unit (LFGRU) networks. It is a hybrid approach that combines handcrafted feature design with automatic feature learning for machine health monitoring. First, features from windows of input time series are extracted. Then, an enhanced bidirectional GRU network is designed and applied on the generated sequence of local features to learn the representation. A supervised learning layer is finally trained to predict machine condition. Experiments on three machine health monitoring tasks: tool wear prediction, gearbox fault diagnosis, and incipient bearing fault detection verify the effectiveness and generalization of the proposed LFGRU.
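The first step described above, extracting features from windows of the input time series, can be sketched as follows. This is a minimal illustration only: the choice of non-overlapping windows and of the four statistics (mean, standard deviation, RMS, peak) is an assumption, and the function name is hypothetical. The resulting sequence of feature vectors is what a recurrent network such as a GRU would then consume.

```python
import math
from typing import List


def window_features(signal: List[float], window: int) -> List[List[float]]:
    """Split a signal into non-overlapping windows and return
    [mean, std, rms, peak] for each complete window."""
    features = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        mean = sum(chunk) / window
        variance = sum((x - mean) ** 2 for x in chunk) / window
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        peak = max(abs(x) for x in chunk)
        features.append([mean, math.sqrt(variance), rms, peak])
    return features


# Hypothetical raw sensor readings: two windows of four samples each.
signal = [0.0, 1.0, 0.0, -1.0, 2.0, 2.0, 2.0, 2.0]
local_seq = window_features(signal, window=4)
print(local_seq)
```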
The proliferation of data collection technologies often results in large data sets with many observations and many variables. In practice, highly relevant engineered features are often groups of predictors that share a common regression coefficient (i.e., the predictors in the group affect the response only via their collective sum), where the groups are unknown in advance and must be discovered from the data. We propose an algorithm called coefficient tree regression (CTR) to discover the group structure and fit the resulting regression model. In this regard CTR is an automated way of engineering new features, each of which is the collective sum of the predictors within each group. The algorithm can be used when the number of variables is larger than, or smaller than, the number of observations. Creating new features that affect the response in a similar manner improves predictive modeling, especially in domains where the relationships between predictors are not known a priori. CTR borrows computational strategies from both linear regression (fast model updating when adding/modifying a feature in the model) and regression trees (fast partitioning to form and split groups) to achieve outstanding computational and predictive performance. Finding features that represent hidden groups of predictors (i.e., a hidden ontology) that impact the response only via their sum also has major interpretability advantages, which we demonstrate with a real data example of predicting political affiliations with television viewing habits. In numerical comparisons over a variety of examples, we demonstrate that both computational expense and predictive performance are far superior to existing methods that create features as groups of predictors.
Moreover, CTR has overall predictive performance that is comparable to or slightly better than the regular lasso method, which we include as a reference benchmark for comparison even though it is non-group-based, in addition to having substantial computational and interpretive advantages over lasso.
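To make the idea of a group of predictors sharing one coefficient concrete, the sketch below builds a summed feature for a group that is assumed to be known already and fits a single coefficient to it. It does not implement CTR itself, which discovers the groups from data; the group, the data, and the function names are hypothetical.

```python
from typing import List, Sequence


def group_sum_features(X: Sequence[Sequence[float]],
                       groups: Sequence[Sequence[int]]) -> List[List[float]]:
    """Replace each group of predictor columns by the sum of its members."""
    return [[sum(row[j] for j in group) for group in groups] for row in X]


def fit_single_coefficient(z: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope through the origin for y ~ beta * z."""
    return sum(a * b for a, b in zip(z, y)) / sum(a * a for a in z)


# Toy data: predictors 0 and 1 affect the response only via their sum.
X = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 2.0, 1.0], [1.0, 0.0, 1.0]]
groups = [[0, 1]]   # assumed known here; CTR would have to discover this
beta_true = 1.5
y = [beta_true * (row[0] + row[1]) for row in X]

Z = group_sum_features(X, groups)
beta_hat = fit_single_coefficient([row[0] for row in Z], y)
print(Z, beta_hat)
```

Fitting one coefficient on the summed feature instead of one coefficient per predictor is what gives the interpretability benefit mentioned in the abstract: the group behaves as a single engineered variable.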
Artificial intelligence (AI) has the potential to reshape pharmaceutical formulation development through its ability to analyze and continuously monitor large datasets. Fused deposition modeling (FDM) three-dimensional printing (3DP) has made significant advancements in the field of oral drug delivery with personalized drug-loaded formulations being designed, developed and dispensed for the needs of the patient. The FDM 3DP process begins with the production of drug-loaded filaments by hot melt extrusion (HME), followed by the printing of a drug product using an FDM 3D printer. However, the optimization of the fabrication parameters is a time-consuming, empirical trial approach, requiring expert knowledge. Here, M3DISEEN, a web-based pharmaceutical software, was developed to accelerate FDM 3D printing using AI machine learning techniques (MLTs). In total, 614 drug-loaded formulations were designed from a comprehensive list of 145 different pharmaceutical excipients, 3D printed and assessed in-house. To build the predictive tool, a dataset was constructed and models were trained and tested at a ratio of 75:25. Significantly, the AI models predicted key fabrication parameters with accuracies of 76% and 67% for the printability and the filament characteristics, respectively. Furthermore, the AI models predicted the HME and FDM processing temperatures with a mean absolute error of 8.9 °C and 8.3 °C, respectively. Strikingly, the AI models achieved high levels of accuracy by solely inputting the pharmaceutical excipient trade names. Therefore, AI provides an effective holistic modeling technology and software to streamline and advance 3DP as a significant technology within drug development. M3DISEEN is available at (http://m3diseen.com/predictions/).
•The paper proposes a model for Fault Detection and Diagnosis of rotating machinery and validates it on different datasets.
•The paper proposes a multi-domain feature set composed of FFT, CWT and raw sensory signals revealing the fault signatures.
•The combination of the proposed CLSTM and the multi-domain features outperforms the state-of-the-art FDD models.
•A sensitivity analysis is conducted on the burst-length, illustrating its importance on the performance of the FDD models.
Fault Detection and Diagnosis (FDD) of rotating machinery plays a key role in reducing the maintenance costs of manufacturing systems. How to improve the FDD accuracy is an open and challenging issue. To make full use of signals and reveal all the fault features, this paper proposes a new feature engineering model which combines Fast Fourier Transform (FFT), Continuous Wavelet Transform (CWT) and statistical features of raw signals. Then a novel Convolutional Long Short-Term Memory (CLSTM) is developed to understand and classify these multi-channel array inputs. In order to evaluate the effectiveness of the proposed model, three different datasets are used. The paper performs a sensitivity analysis on the input channels to evaluate the efficiency of the proposed multi-domain feature set in different DL architectures, where CLSTM shows its superiority in understanding the feature set. Secondly, a comprehensive review of the state-of-the-art models is conducted, and twelve algorithms are chosen for the comparison to evaluate the performance of the proposed FDD model. The paper also performs an input length sensitivity analysis, showing that the proposed model can achieve 100% accuracy with shorter inputs compared to other models, meaning that it causes less delay in an online condition monitoring system. The results demonstrate the superiority of the proposed model over the state-of-the-art models in terms of accuracy on different datasets.
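A small sketch of what a multi-domain feature vector might look like is given below. It covers only the frequency-domain and statistical parts (a direct discrete Fourier transform plus mean, RMS and peak) and leaves out the CWT channel entirely. The function names, the test signal, and the way the features are concatenated are assumptions for illustration, not the paper's pipeline.

```python
import cmath
import math
from typing import Dict, List


def dft_magnitudes(signal: List[float]) -> List[float]:
    """Magnitudes of the one-sided discrete Fourier transform, scaled by 1/n."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        magnitudes.append(abs(coeff) / n)
    return magnitudes


def stat_features(signal: List[float]) -> Dict[str, float]:
    """Simple time-domain statistics of the raw signal."""
    n = len(signal)
    return {
        "mean": sum(signal) / n,
        "rms": math.sqrt(sum(x * x for x in signal) / n),
        "peak": max(abs(x) for x in signal),
    }


# Hypothetical vibration burst: a sine with 4 cycles over 32 samples.
n = 32
signal = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]

spectrum = dft_magnitudes(signal)
dominant_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
features = spectrum + list(stat_features(signal).values())
print(dominant_bin, len(features))
```

The direct transform costs O(n^2) and is used here only to keep the example dependency-free; a production pipeline would use a proper FFT routine.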
Offshore wind is a rapidly maturing renewable energy that has presented a large growth over the last decade. This increase in offshore wind capacity has led to the need for more effective monitoring strategies, as currently, Operation and Maintenance (O&M) costs make up to 30% of the overall cost of energy. This study presented a novel data-driven approach to condition monitoring systems by utilizing the existing Supervisory Control And Data Acquisition (SCADA) system and integrating a wide range of machine learning and data mining techniques, namely data pre-processing & re-sampling, anomalies detection & treatment, feature engineering, and hyperparameter optimization, to design a Normal Behaviour Model of the generator for fault detection purposes. An ensemble model of the Extreme Gradient Boosting (XGBoost) framework was successfully developed and critically compared with a Long Short-Term Memory (LSTM) deep learning neural network. The results showed that, in terms of temperature prediction, the proposed methodology captures a high level of accuracy at low computational costs. Moreover, it can be concluded that XGBoost outperformed LSTM in predictive accuracy whilst requiring smaller training times and showcasing a smaller sensitivity to noise that existed in the SCADA database.
•A novel data-driven approach is proposed for offshore wind turbine fault detection.
•Introduced XGBoost models achieved better performance than the LSTM method.
•Proposed approaches capture a high degree of accuracy at lower computational costs.
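One common way to turn a Normal Behaviour Model into a fault detector is to compare measured and predicted temperatures and flag residuals that fall outside a band learned from healthy operation. The sketch below shows that idea only. The abstract does not state how alarms are raised, so the three-sigma rule, the function names, and all numbers are assumptions, and the "predicted" values are hypothetical stand-ins for the output of a trained model.

```python
from statistics import mean, stdev
from typing import List, Tuple


def residual_band(healthy_actual: List[float],
                  healthy_predicted: List[float],
                  k: float = 3.0) -> Tuple[float, float]:
    """Return (centre, half-width) of the normal residual band from healthy data."""
    residuals = [a - p for a, p in zip(healthy_actual, healthy_predicted)]
    return mean(residuals), k * stdev(residuals)


def flag_anomalies(actual: List[float], predicted: List[float],
                   centre: float, limit: float) -> List[bool]:
    """Flag samples whose residual leaves the normal band."""
    return [abs((a - p) - centre) > limit for a, p in zip(actual, predicted)]


# Hypothetical generator temperatures (deg C) during healthy operation.
healthy_actual = [60.0, 61.0, 59.5, 60.5, 60.0]
healthy_predicted = [60.2, 60.8, 59.7, 60.3, 60.0]
centre, limit = residual_band(healthy_actual, healthy_predicted)

# New observations: the second one runs well above the model's prediction.
flags = flag_anomalies([60.1, 63.0], [60.0, 60.5], centre, limit)
print(flags)
```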