Reservoir computing has emerged in the last decade as an alternative to gradient descent methods for training recurrent neural networks. Echo State Network (ESN) is one of the key reservoir computing “flavors”. While being practical, conceptually simple, and easy to implement, ESNs require some experience and insight to achieve the hailed good performance in many tasks. Here we present practical techniques and recommendations for successfully applying ESNs, as well as some more advanced application-specific modifications.
Echo State Networks and Liquid State Machines introduced a new paradigm in artificial recurrent neural network (RNN) training, where an RNN (the reservoir) is generated randomly and only a readout is trained. The paradigm, becoming known as reservoir computing, greatly facilitated the practical application of RNNs and outperformed classical fully trained RNNs in many tasks. It has lately become a vivid research field with numerous extensions of the basic idea, including reservoir adaptation, thus broadening the initial paradigm to using different methods for training the reservoir and the readout. This review systematically surveys both current ways of generating/adapting the reservoirs and training different types of readouts. It offers a natural conceptual classification of the techniques, which transcends boundaries of the current “brand-names” of reservoir methods, and thus aims to help in unifying the field and providing the reader with a detailed “map” of it.
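The paradigm described above (a randomly generated, fixed reservoir with only the readout trained) can be illustrated with a minimal NumPy sketch. This is not code from either paper: the function names, the uniform weight initialization, the default spectral radius of 0.9, the ridge coefficient, and the sine-wave task are all assumptions chosen for illustration.

```python
import numpy as np

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Generate random, fixed input and recurrent weights (never trained)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in + 1))  # +1 column for a bias input
    W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest absolute eigenvalue equals the requested spectral radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(W_in, W, inputs):
    """Drive the reservoir with the input sequence and collect its states."""
    x = np.zeros(W.shape[0])
    states = np.empty((len(inputs), W.shape[0]))
    for t, u in enumerate(inputs):
        x = np.tanh(W_in @ np.concatenate(([1.0], np.atleast_1d(u))) + W @ x)
        states[t] = x
    return states

def train_readout(states, targets, ridge=1e-6):
    """Fit the linear readout by ridge regression; this is the only trained part."""
    X = np.hstack([np.ones((len(states), 1)), states])
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ targets)

def predict(W_out, states):
    X = np.hstack([np.ones((len(states), 1)), states])
    return X @ W_out

# Illustrative task (an assumption): one-step-ahead prediction of a sine wave.
signal = np.sin(0.2 * np.arange(501))
W_in, W = make_reservoir(n_in=1, n_res=50)
states = run_reservoir(W_in, W, signal[:-1])
washout = 50  # discard initial transient states before fitting
W_out = train_readout(states[washout:], signal[washout + 1:])
train_mse = np.mean((predict(W_out, states[washout:]) - signal[washout + 1:]) ** 2)
```

Because the reservoir weights stay fixed, training reduces to a single linear solve, which is what makes the approach fast compared with gradient-based RNN training.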
Cross-Validation (CV) is still uncommon in time series modeling. Echo State Networks (ESNs), as a prime example of Reservoir Computing (RC) models, are known for their fast and precise one-shot learning, which often benefits from good hyper-parameter tuning. This makes them ideal to change the status quo. We discuss CV of time series for predicting a concrete time interval of interest, suggest several schemes for cross-validating ESNs, and introduce an efficient algorithm for implementing them. This algorithm is presented as two levels of optimization of \(k\)-fold CV. Training an RC model typically consists of two stages: (i) running the reservoir with the data and (ii) computing the optimal readouts. The first level of our optimization addresses the most computationally expensive part (i) and makes it remain constant irrespective of \(k\). It dramatically reduces reservoir computations in any type of RC system and is enough if \(k\) is small. The second level of optimization also makes part (ii) remain constant irrespective of large \(k\), as long as the dimension of the output is low. We discuss when the proposed validation schemes for ESNs could be beneficial, present three options for producing the final model, and empirically investigate them on six different real-world datasets, as well as run empirical computation time experiments. We provide the code in an online repository. The proposed CV schemes give better and more stable test performance on all six real-world datasets, covering three task types. Empirical run times confirm our complexity analysis. In most situations, \(k\)-fold CV of ESNs and many other RC models can be done for virtually the same time and space complexity as a simple single-split validation. This enables CV to become a standard practice in RC.
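The idea of reusing work across folds can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it assumes the reservoir has already been run once over the whole sequence (so stage (i) does not depend on the number of folds), uses contiguous folds, and accumulates per-fold Gram matrices so that each fold's training matrices are obtained by subtraction rather than recomputation. The function name `kfold_ridge_cv` and its parameters are hypothetical, and details such as washout periods and state dependencies between neighbouring folds are ignored.

```python
import numpy as np

def kfold_ridge_cv(states, targets, k=5, ridge=1e-6):
    """Per-fold validation MSE of a ridge readout over precomputed reservoir states.

    states:  (n_samples, n_features) reservoir states collected in ONE reservoir run
    targets: (n_samples,) or (n_samples, n_outputs)
    """
    n, d = states.shape
    folds = np.array_split(np.arange(n), k)  # contiguous blocks, suitable for time series
    # Each sample contributes to exactly one fold's Gram and cross-product matrices.
    grams = [states[idx].T @ states[idx] for idx in folds]
    cross = [states[idx].T @ targets[idx] for idx in folds]
    gram_all, cross_all = sum(grams), sum(cross)
    errors = []
    for idx, G, C in zip(folds, grams, cross):
        # Training matrices for this split = totals minus the held-out fold.
        W_out = np.linalg.solve(gram_all - G + ridge * np.eye(d), cross_all - C)
        pred = states[idx] @ W_out
        errors.append(float(np.mean((pred - targets[idx]) ** 2)))
    return errors

# Illustrative usage with synthetic "states" (an assumption, not real reservoir data).
rng = np.random.default_rng(0)
demo_states = rng.normal(size=(200, 10))
demo_targets = demo_states @ rng.normal(size=10)
fold_errors = kfold_ridge_cv(demo_states, demo_targets, k=5)
```

In this sketch the data-dependent accumulation is done once, and only the comparatively cheap linear solve is repeated per fold.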
Standard echo state networks (ESNs) are built from simple additive units with a sigmoid activation function. Here we investigate ESNs whose reservoir units are leaky integrator units. Units of this type have individual state dynamics, which can be exploited in various ways to accommodate the network to the temporal characteristics of a learning task. We present stability conditions, introduce and investigate a stochastic gradient descent method for the optimization of the global learning parameters (input and output feedback scalings, leaking rate, spectral radius) and demonstrate the usefulness of leaky-integrator ESNs for (i) learning very slow dynamic systems and replaying the learnt system at different speeds, (ii) classifying relatively slow and noisy time series (the Japanese Vowel dataset; here we obtain a zero test error rate), and (iii) recognizing strongly time-warped dynamic patterns.
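For orientation, one commonly used discrete-time form of a leaky-integrator reservoir update is given here. The notation is an assumption and the paper's exact formulation may differ (it may, for instance, include output feedback or additional gain terms): \(\mathbf{u}(n)\) is the input, \(\mathbf{x}(n)\) the reservoir state, \(\mathbf{W}^{\mathrm{in}}\) and \(\mathbf{W}\) the input and recurrent weight matrices, and \(\alpha \in (0, 1]\) the leaking rate.

\[
\tilde{\mathbf{x}}(n) = \tanh\!\left(\mathbf{W}^{\mathrm{in}}\,[1;\mathbf{u}(n)] + \mathbf{W}\,\mathbf{x}(n-1)\right),
\qquad
\mathbf{x}(n) = (1-\alpha)\,\mathbf{x}(n-1) + \alpha\,\tilde{\mathbf{x}}(n)
\]

With \(\alpha = 1\) this reduces to the standard non-leaky update; smaller \(\alpha\) makes the state change more slowly, which is how the leaking rate adapts the reservoir to the time scale of the task.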
Convolutional Neural Networks (CNN) possess many positive qualities when it comes to spatial raster data. Translation invariance enables CNNs to detect features regardless of their position in the scene. However, in some domains, like geospatial, not all locations are exactly equal. In this work, we propose localized convolutional neural networks that enable convolutional architectures to learn local features in addition to the global ones. We investigate their instantiations in the form of learnable inputs, local weights, and a more general form. They can be added to any convolutional layers, easily end-to-end trained, introduce minimal additional complexity, and let CNNs retain most of their benefits to the extent that they are needed. In this work we address spatio-temporal prediction: test the effectiveness of our methods on a synthetic benchmark dataset and tackle three real-world wind prediction datasets. For one of them, we propose a method to spatially order the unordered data. We compare the recent state-of-the-art spatio-temporal prediction models on the same data. Models that use convolutional layers can be and are extended with our localizations. In all these cases our extensions improve the results, and thus often the state-of-the-art. We share all the code at a public repository.
The spatial QRS-T angle is a promising health indicator for risk stratification of sudden cardiac death (SCD). Thus far, the angle is estimated solely from 12-lead electrocardiogram (ECG) systems uncomfortable for ambulatory monitoring. Methods to estimate QRS-T angles from reduced-lead ECGs registered with consumer healthcare devices would, therefore, facilitate ambulatory monitoring. (1) Objective: Develop a method to estimate spatial QRS-T angles from reduced-lead ECGs. (2) Approach: We designed a deep learning model to locate the QRS and T wave vectors necessary for computing the QRS-T angle. We implemented an original loss function to guide the model in the 3D space to search for each vector’s coordinates. A gradual reduction of ECG leads from the largest publicly available dataset of clinical 12-lead ECG recordings (PTB-XL) is used for training and validation. (3) Results: The spatial QRS-T angle can be estimated from leads {I, II, aVF, V2} with sufficient accuracy (absolute mean and median errors of 11.4° and 7.3°) for detecting abnormal angles without sacrificing patient comfortability. (4) Significance: Our model could enable ambulatory monitoring of spatial QRS-T angles using patch- or textile-based ECG devices. Populations at risk of SCD, like chronic cardiac and kidney disease patients, might benefit from this technology.
Data are often sampled irregularly in time. Dealing with this using Recurrent Neural Networks (RNNs) traditionally involved ignoring the fact, feeding the time differences as additional inputs, or resampling the data. All these methods have their shortcomings. We propose an elegant straightforward alternative approach where instead the RNN is in effect resampled in time to match the time of the data or the task at hand. We use Echo State Network (ESN) and Gated Recurrent Unit (GRU) as the basis for our solution. Such RNNs can be seen as discretizations of continuous-time dynamical systems, which gives a solid theoretical ground to our approach. Our Task-Synchronized ESN (TSESN) and GRU (TSGRU) models allow for a direct model time setting and require no additional training, parameter tuning, or computation (solving differential equations or interpolating data) compared to their regular counterparts, thus retaining their original efficiency. We confirm empirically that our models can effectively compensate for the time-non-uniformity of the data and demonstrate that they compare favorably to data resampling, classical RNN methods, and alternative RNN models proposed to deal with time irregularities on several real-world nonuniform-time datasets. We open-source the code at https://github.com/oshapio/task-synchronized-RNNs .
Recent advancements in Large Language Models (LLMs) and their utilization in code generation tasks have significantly reshaped the field of software development. Despite the remarkable efficacy of code completion solutions in mainstream programming languages, their performance lags when applied to less ubiquitous formats such as OpenAPI definitions. This study evaluates the OpenAPI completion performance of GitHub Copilot, a prevalent commercial code completion tool, and proposes a set of task-specific optimizations leveraging Meta's open-source model Code Llama. A semantics-aware OpenAPI completion benchmark proposed in this research is used to perform a series of experiments through which the impact of various prompt-engineering and fine-tuning techniques on the Code Llama model's performance is analyzed. The fine-tuned Code Llama model reaches a peak correctness improvement of 55.2% over GitHub Copilot despite utilizing 25 times fewer parameters than the commercial solution's underlying Codex model. Additionally, this research proposes an enhancement to a widely used code infilling training technique, addressing the issue of underperformance when the model is prompted with context sizes smaller than those used during training. The dataset, the benchmark, and the model fine-tuning code are made publicly available.
Everyone wants to write beautiful and correct text, yet the lack of language skills, experience, or hasty typing can result in errors. By employing the recent advances in transformer architectures, we construct a grammatical error correction model for Lithuanian, a language rich in archaic features. We compare subword and byte-level approaches and share our best trained model, achieving \(F_{0.5} = 0.92\), and accompanying code, in an online open-source repository.