Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of “unrolling” an RNN is routinely presented without justification throughout the literature. The goal of this tutorial is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in Signal Processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement, which yields the RNN unrolling technique. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the “Vanilla LSTM” network (the nickname “Vanilla LSTM” symbolizes this model’s flexibility and generality; Greff et al., 2015) through a series of logical arguments. We provide all equations pertaining to the LSTM system together with detailed descriptions of its constituent entities. Albeit unconventional, our choice of notation and the method for presenting the LSTM system emphasize ease of understanding. As part of the analysis, we identify new opportunities to enrich the LSTM system and incorporate these extensions into the Vanilla LSTM network, producing the most general LSTM variant to date. The target reader has already been exposed to RNNs and LSTM networks through numerous available resources and is open to an alternative pedagogical approach. A Machine Learning practitioner seeking guidance for implementing our new augmented LSTM model in software for experimentation and research will find the insights and derivations in this treatise valuable as well.
•Recurrent Neural Network (RNN) definition follows from Delay Differential Equations.
•RNN unfolding technique is formally justified as approximating an infinite sequence.
•Long Short-Term Memory Network (LSTM) can be logically rationalized from RNN.
•System diagrams with complete derivation of LSTM training equations are provided.
•New LSTM extensions: external input gate and convolutional input context windows.
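To make the unrolling idea concrete, here is a minimal sketch (not taken from the tutorial itself; all names, dimensions, and values are illustrative) of a canonical RNN unrolled over a finite input sequence in NumPy:

```python
import numpy as np

def rnn_unrolled(x_seq, W_x, W_h, b, h0):
    """Unrolled canonical RNN: h_t = tanh(W_x @ x_t + W_h @ h_{t-1} + b).

    Unrolling replaces the recurrence with an explicit loop over the
    (finite) input sequence, approximating the infinite recursion.
    """
    h = h0
    states = []
    for x_t in x_seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return states

# Toy dimensions: 2-d input, 3-d hidden state, sequence of length 4.
rng = np.random.default_rng(0)
W_x = rng.standard_normal((3, 2)) * 0.1
W_h = rng.standard_normal((3, 3)) * 0.1
b = np.zeros(3)
x_seq = [rng.standard_normal(2) for _ in range(4)]
states = rnn_unrolled(x_seq, W_x, W_h, b, h0=np.zeros(3))
# One hidden state per input step, each bounded by tanh to (-1, 1).
```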
•An hour-ahead forecasting of power output for three different PV systems.
•A hybrid deep learning algorithm (SSA-RNN-LSTM) is proposed.
•The proposed model is better than RNN-LSTM, GA-RNN-LSTM and PSO-RNN-LSTM.
•The model is robust for three different PV systems over four years of data.
The integration of photovoltaic energy into a grid demands accurate power output forecasting. In this research, an hour-ahead prediction of power output is performed on an annual basis over a real data period (2016–2019) for three different PV systems based on polycrystalline, monocrystalline, and thin-film technologies. The solar radiation, ambient temperature, module temperature and wind speed are the considered input parameters, while the power output of each PV system is the output parameter. A hybrid deep learning (DL) method (SSA-RNN-LSTM) is proposed for an hour-ahead prediction of output power for each PV system. The proposed technique is compared with GA-RNN-LSTM, PSO-RNN-LSTM and RNN-LSTM. The considered forecasting accuracy measures are RMSE, MSE, MAE and the coefficient of determination (R2). The findings show that SSA-RNN-LSTM achieves better forecasting accuracy, with the lowest RMSE and MSE, the highest R2, and the highest convergence speed compared to the other methods. The proposed model has shown testing RMSE and MAE that are 19.14% and 21.57%, 15.4% and 10.81%, and 22.9% and 25.2% lower than RNN-LSTM for the polycrystalline, monocrystalline and thin-film PV systems, respectively. Furthermore, the proposed model is found to be more robust in predicting the power output for the three different PV systems over the four-year data period.
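For reference, the four accuracy measures named above can be computed as follows. This is a generic sketch with hypothetical toy values, not the paper's code:

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """RMSE, MSE, MAE and coefficient of determination (R^2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                  # mean squared error
    rmse = np.sqrt(mse)                      # root mean squared error
    mae = np.mean(np.abs(err))               # mean absolute error
    ss_res = np.sum(err ** 2)                # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot               # coefficient of determination
    return {"RMSE": rmse, "MSE": mse, "MAE": mae, "R2": r2}

# Hypothetical measured vs. predicted PV power output (kW).
m = forecast_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```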
•A thorough review of techniques, algorithms, datasets, and tasks for fake news detection.
•An overview of text processing deep learning architectures for handling fake news detection as a text classification task.
•A novel, hybrid CNN-RNN model for the task.
•An extensive evaluation on benchmark datasets with very positive results.
The explosion of social media has allowed individuals to spread information without cost, with little investigation and fewer filters than before. This amplified the old problem of fake news, which has become a major concern due to the negative impact it has on communities. In order to tackle the rise and spread of fake news, automatic detection techniques based on artificial intelligence and machine learning have been researched. The recent achievements of deep learning techniques in complex natural language processing tasks make them a promising solution for fake news detection too. This work proposes a novel hybrid deep learning model that combines convolutional and recurrent neural networks for fake news classification. The model was successfully validated on two fake news datasets (ISO and FA-KES), achieving detection results that are significantly better than other non-hybrid baseline methods. Further experiments on the generalization of the proposed model across different datasets also yielded promising results.
The aim of this paper is to map agricultural crops by classifying satellite image time series. Domain experts in agriculture work with crop type labels that are organised in a hierarchical tree structure, where coarse classes (like orchards) are subdivided into finer ones (like apples, pears, vines, etc.). We develop a crop classification method that exploits this expert knowledge and significantly improves the mapping of rare crop types. The three-level label hierarchy is encoded in a convolutional, recurrent neural network (convRNN), such that for each pixel the model predicts three labels at different levels of granularity. This end-to-end trainable, hierarchical network architecture allows the model to learn joint feature representations of rare classes (e.g., apples, pears) at a coarser level (e.g., orchard), thereby boosting classification performance at the fine-grained level. Additionally, labelling at different granularities also makes it possible to adjust the output according to the classification scores, as coarser labels with high confidence are sometimes more useful for agricultural practice than fine-grained but very uncertain labels. We validate the proposed method on a new, large dataset that we make public. ZueriCrop covers an area of 50 km × 48 km in the Swiss cantons of Zurich and Thurgau, with a total of 116,000 individual fields spanning 48 crop classes and 28,000 (multi-temporal) image patches from Sentinel-2. We compare our proposed hierarchical convRNN model with several baselines, including methods designed for imbalanced class distributions. The hierarchical approach outperforms the baselines by at least 9.9 percentage points in F1-score.
•Vegetation species identification and mapping.
•Crop mapping from image time series.
•Deep learning with multi-scale label hierarchies.
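The confidence-based back-off from fine to coarse labels described in the abstract can be sketched roughly as follows; the class names, threshold, and function are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical fine-to-coarse mapping; real hierarchies have three levels
# and many more classes (e.g., ZueriCrop's 48 crop classes).
FINE_TO_COARSE = {"apples": "orchard", "pears": "orchard", "vines": "vineyard"}

def predict_with_backoff(fine_label, fine_conf, threshold=0.5):
    """Return the fine-grained label when the model is confident enough;
    otherwise back off to the coarser parent class, which may be more
    useful in practice than a very uncertain fine-grained label."""
    if fine_conf >= threshold:
        return fine_label
    return FINE_TO_COARSE[fine_label]
```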
•LSTM and GRU networks are used for short-term runoff predictions.
•These models process rainfall and runoff sequence data better than ANN models.
•No time step optimization is required.
•GRU has a simple structure and performs as well as LSTM.
Runoff forecasting is an important approach for flood mitigation. Many machine learning models have been proposed for runoff forecasting in recent years. To reconstruct the time series of runoff data into a standard machine learning dataset, a sliding window method is usually used to pre-process the data, with the size of the window as a variable parameter commonly referred to as the time step. Conventional machine learning methods, such as artificial neural network (ANN) models, require optimization of the time step because both too small and too large time steps reduce prediction accuracy. In this work, two popular variants of the Recurrent Neural Network (RNN), namely Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, were employed to develop new data-driven flood forecasting models. GRU and LSTM models are in theory able to filter redundant information automatically, and therefore a large time step is not expected to reduce prediction accuracy. The three models (LSTM, GRU, and ANN) were applied to simulate runoff in the Yutan station control catchment, Fujian Province, Southeast China, using hourly discharge measurements of one runoff station and hourly rainfall of four rainfall stations from 2000 to 2014. Results show that the prediction accuracy of LSTM and GRU models increases with increasing time step and eventually stabilizes. This allows selection of a relatively large time step in practical runoff prediction without first evaluating and optimizing the time step, as required by conventional machine learning models. We also show that LSTM and GRU models perform better than ANN models when the time step is optimized. GRU models have fewer parameters and less complicated structures than LSTM models, and our results show that GRU models perform as well as LSTM models. GRU may therefore be the preferred method for short-term runoff prediction, since it requires less time for model training.
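The sliding-window pre-processing described above can be sketched as follows (a generic illustration; the function name and toy series are assumptions, not the study's code):

```python
import numpy as np

def sliding_window(series, time_step):
    """Turn a 1-D runoff/rainfall series into (X, y) supervised pairs:
    each sample holds `time_step` consecutive values, and the target
    is the value that immediately follows the window."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + time_step]
                  for i in range(len(series) - time_step)])
    y = series[time_step:]
    return X, y

# Toy hourly discharge series with a window (time step) of 3 hours.
X, y = sliding_window([1, 2, 3, 4, 5, 6], time_step=3)
# X: [[1,2,3],[2,3,4],[3,4,5]], y: [4,5,6]
```

The time step is exactly the knob the abstract discusses: ANN models need it tuned, whereas LSTM/GRU can tolerate a generously large window.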
Gating Revisited: Deep Multi-Layer RNNs That Can Be Trained
Turkoglu, Mehmet Ozgur; D'Aronco, Stefano; Wegner, Jan Dirk ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 08/2022, Volume 44, Issue 8
Journal Article · Peer-reviewed · Open Access
We propose a new STAckable Recurrent cell (STAR) for recurrent neural networks (RNNs), which has fewer parameters than the widely used LSTM [16] and GRU [10] cells while being more robust against vanishing or exploding gradients. Stacking recurrent units into deep architectures suffers from two major limitations: (i) many recurrent cells (e.g., LSTMs) are costly in terms of parameters and computational resources; and (ii) deep RNNs are prone to vanishing or exploding gradients during training. We investigate the training of multi-layer RNNs and examine the magnitude of the gradients as they propagate through the network in the "vertical" direction. We show that, depending on the structure of the basic recurrent unit, the gradients are systematically attenuated or amplified. Based on our analysis, we design a new type of gated cell that better preserves gradient magnitude. We validate our design on a large number of sequence modelling tasks and demonstrate that the proposed STAR cell allows building and training deeper recurrent architectures, ultimately leading to improved performance while being computationally more efficient.
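The "vertical" gradient attenuation the authors analyse can be illustrated numerically with a toy stack of tanh layers. This is not the STAR cell itself; the dimensions, weight scale, and random seed are arbitrary assumptions chosen only to make the attenuation visible:

```python
import numpy as np

# Forward pass through a deep stack of tanh layers, then propagate a
# gradient back down by multiplying the per-layer Jacobians.
rng = np.random.default_rng(1)
dim, depth = 16, 12
Ws = [rng.standard_normal((dim, dim)) * (0.3 / np.sqrt(dim))
      for _ in range(depth)]

h = rng.standard_normal(dim)
activations = []
for W in Ws:
    h = np.tanh(W @ h)
    activations.append(h)

grad = np.ones(dim)                 # gradient arriving at the top layer
norms = [np.linalg.norm(grad)]
for W, a in zip(reversed(Ws), reversed(activations)):
    grad = W.T @ ((1.0 - a ** 2) * grad)   # Jacobian of tanh(W @ h)
    norms.append(np.linalg.norm(grad))

# With this small weight scale, the gradient norm shrinks layer by layer:
# the systematic attenuation that motivates a gradient-preserving cell.
```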
Financial time series forecasting is undoubtedly the top choice of computational intelligence for finance researchers in both academia and the finance industry, due to its broad implementation areas and substantial impact. Machine Learning (ML) researchers have created various models, and a vast number of studies have been published accordingly. As such, a significant number of surveys exist covering ML studies on financial time series forecasting. Lately, Deep Learning (DL) models have appeared within the field, with results that significantly outperform their traditional ML counterparts. Even though there is growing interest in developing models for financial time series forecasting, there is a lack of review papers that focus solely on DL for finance. Hence, the motivation of this paper is to provide a comprehensive literature review of DL studies on financial time series forecasting implementations. We not only categorized the studies according to their intended forecasting implementation areas, such as index, forex, and commodity forecasting, but also grouped them based on their DL model choices, such as Convolutional Neural Networks (CNNs), Deep Belief Networks (DBNs), and Long Short-Term Memory (LSTM). We also tried to envision the future of the field by highlighting its possible setbacks and opportunities for the benefit of interested researchers.
•We reviewed all searchable articles on deep learning (DL) for financial time series forecasting.
•RNN-based DL models (including LSTM and GRU) are the most common.
•We compared DL models according to their performance on different forecasted asset classes.
•To the best of our knowledge, this is the first comprehensive DL survey for financial time series forecasting.
•We provided the current status of DL in financial time series forecasting and highlighted future opportunities.
COVID-19 was declared a global pandemic by the World Health Organisation (WHO) on 11th March 2020. Many researchers have, in the past, attempted to predict a COVID outbreak and its effect. Some have regarded time-series variables as primary factors which can affect the onset of infectious diseases like influenza and severe acute respiratory syndrome (SARS). In this study, we used public datasets provided by the European Centre for Disease Prevention and Control to develop a prediction model for the spread of the COVID-19 outbreak to and throughout Malaysia, Morocco and Saudi Arabia. We made use of certain effective deep learning (DL) models for this purpose. We assessed some specific major features for predicting the trend of the existing COVID-19 outbreak in these three countries. We also proposed a DL approach that includes recurrent neural network (RNN) and long short-term memory (LSTM) networks for predicting the probable numbers of COVID-19 cases. The LSTM models showed a prediction accuracy of 98.58%, while the RNN models showed 93.45%. This study also compared the number of coronavirus cases and the number of resulting deaths in Malaysia, Morocco and Saudi Arabia. Thereafter, we predicted the number of confirmed COVID-19 cases and deaths for the subsequent seven days, using the data that was available up to December 3rd, 2020.