Recognizing lines of unconstrained handwritten text is a challenging task. The difficulty of segmenting cursive or overlapping characters, combined with the need to exploit surrounding context, has led to low recognition rates for even the best current recognizers. Most recent progress in the field has been made either through improved preprocessing or through advances in language modeling. Relatively little work has been done on the basic recognition algorithms. Indeed, most systems rely on the same hidden Markov models that have been used for decades in speech and handwriting recognition, despite their well-known shortcomings. This paper proposes an alternative approach based on a novel type of recurrent neural network, specifically designed for sequence labeling tasks where the data is hard to segment and contains long-range bidirectional interdependencies. In experiments on two large unconstrained handwriting databases, our approach achieves word recognition accuracies of 79.7 percent on online data and 74.1 percent on offline data, significantly outperforming a state-of-the-art HMM-based system. In addition, we demonstrate the network's robustness to lexicon size, measure the individual influence of its hidden layers, and analyze its use of context. Last, we provide an in-depth discussion of the differences between the network and HMMs, suggesting reasons for the network's superior performance.
In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm. We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database. Our main findings are that bidirectional networks outperform unidirectional ones, and that LSTM is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time-windowed Multilayer Perceptrons (MLPs). Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.
An abbreviated version of some portions of this article appeared in (Graves and Schmidhuber, 2005), as part of the IJCNN 2005 conference proceedings, published under the IEEE copyright.
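The bidirectional architecture evaluated above is easy to sketch with a modern toolkit. Below is a minimal illustration in PyTorch, not the authors' original implementation; the feature size (26 inputs per frame, e.g. MFCCs), the hidden size, and the 61-class TIMIT-style output are assumptions made for the sketch. Each frame's prediction is computed from both a forward and a backward pass over the utterance, so past and future context are available at every step.

```python
import torch
import torch.nn as nn

class FramewiseBLSTM(nn.Module):
    """Minimal bidirectional LSTM for framewise classification.

    Hypothetical sizes: 26 input features per frame (e.g. MFCCs),
    61 output classes (e.g. a TIMIT-style phoneme set). Not the
    authors' original implementation.
    """
    def __init__(self, n_features=26, n_hidden=100, n_classes=61):
        super().__init__()
        # One LSTM per direction; their outputs are concatenated, so the
        # classifier sees past AND future context at each frame.
        self.blstm = nn.LSTM(n_features, n_hidden,
                             batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * n_hidden, n_classes)

    def forward(self, x):          # x: (batch, frames, n_features)
        h, _ = self.blstm(x)       # h: (batch, frames, 2 * n_hidden)
        return self.classify(h)    # per-frame class scores

model = FramewiseBLSTM()
frames = torch.randn(8, 300, 26)   # toy batch: 8 utterances, 300 frames each
logits = model(frames)             # (8, 300, 61)
```

Concatenating the two directions' hidden states is what gives the per-frame classifier access to the whole utterance, the property the abstract credits for BLSTM's advantage over unidirectional networks.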
Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.
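The multi-column idea can be sketched schematically: several convolutional "columns", each of which would be trained on a differently preprocessed copy of the input, with their softmax outputs averaged at test time. The PyTorch sketch below is a stand-in, not the paper's configuration; the column depth, filter sizes, five-column count, and MNIST-shaped inputs are all placeholder assumptions.

```python
import torch
import torch.nn as nn

def make_column(n_classes=10):
    # One "column": a small convolutional net. The paper's columns are
    # deeper and use winner-take-all max-pooling; this is only schematic.
    return nn.Sequential(
        nn.Conv2d(1, 20, 5), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(20, 40, 5), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(40 * 4 * 4, n_classes),
    )

columns = [make_column() for _ in range(5)]   # e.g. five columns

def committee_predict(x):
    # Each column would see its own preprocessed variant of x
    # (different normalizations/distortions); here they all see x.
    probs = [col(x).softmax(dim=1) for col in columns]
    return torch.stack(probs).mean(dim=0)     # average the predictions

digits = torch.randn(8, 1, 28, 28)            # toy MNIST-sized batch
avg = committee_predict(digits)               # (8, 10)
```

Averaging the columns' predictions is the committee step described in the abstract; because each column specializes on a different preprocessing, their errors decorrelate and the average is more accurate than any single column.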
Chronic GVHD (cGVHD) remains the most important cause of late non-relapse mortality post allogeneic hematopoietic SCT (HSCT). Although first-line treatment of cGVHD with steroids is well established, evidence for second-line treatment remains limited. Here, we report a dual center retrospective analysis of the off-label salvage treatment of steroid-refractory cGVHD with everolimus. Out of 80 patients with a median age of 50 (17-70) years, 14 (17%) suffered from mild, 39 (49%) from moderate and 27 (34%) from severe cGVHD. At the final analysis, median follow-up after introduction of everolimus was 724 (14-2205) days. Thirty-four patients (43%) required the addition of further immunosuppression during everolimus-based therapy. Global NIH Severity Score improved in 34 patients (43%), remained stable in 37 patients (46%) and worsened in 9 patients (11%). The total sum of Global NIH Severity Scores in all patients assessable was significantly reduced after treatment with everolimus (P<0.0001). Most frequent grade 3/4 toxicities included infections (n=30) and thrombocytopenia (n=15). There was a single case of relapse. Everolimus-based salvage treatment of refractory cGVHD results in significant improvement of the NIH Severity Score without impairing control of the malignant disease. Finally, these preliminary results warrant further verification in prospective trials.
Objective: We show that state-of-the-art deep neural networks achieve superior results in regression-based multi-class proportional myoelectric hand prosthesis control compared with two common baseline approaches, and we analyze the neural network mapping to explain why this is the case. Methods: Feedforward neural networks and baseline systems are trained on an offline corpus of 11 able-bodied subjects and 4 prosthesis wearers, using the R^2 score as the metric. Analysis is performed using diverse qualitative and quantitative approaches, followed by a rigorous evaluation. Results: Our best neural networks have at least three hidden layers with at least 128 neurons per layer; smaller architectures, as used by many prior studies, perform substantially worse. The key to good performance is both to optimally regress the target movement and to suppress spurious movements. Due to the properties of the underlying data, this is impossible to achieve with linear methods, but can be attained with high exactness using sufficiently large neural networks. Conclusion: Neural networks perform significantly better than common linear approaches in the given task, in particular when sufficiently large architectures are used. This can be explained by salient properties of the underlying data, and by theoretical and experimental analysis of the neural network mapping. Significance: To the best of our knowledge, this work is the first in the field that not only reports that large and deep neural networks are superior to existing architectures, but also explains this result.
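The architecture finding above (at least three hidden layers of at least 128 neurons) translates into very little code. The sketch below assumes PyTorch and hypothetical dimensions (192 EMG features in, 7 proportional control outputs); the R^2 score is written out explicitly since it is the metric the study reports.

```python
import torch
import torch.nn as nn

# Hypothetical shapes: 192 EMG features in, 7 movement outputs.
net = nn.Sequential(
    nn.Linear(192, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),   # >= 3 hidden layers, >= 128 units each
    nn.Linear(128, 7),                # proportional control outputs
)

def r2_score(y_true, y_pred):
    # Coefficient of determination, per output dimension, then averaged.
    ss_res = ((y_true - y_pred) ** 2).sum(dim=0)
    ss_tot = ((y_true - y_true.mean(dim=0)) ** 2).sum(dim=0)
    return (1 - ss_res / ss_tot).mean()

x = torch.randn(1024, 192)            # toy feature batch
y = torch.randn(1024, 7)              # toy movement targets
loss = nn.functional.mse_loss(net(x), y)   # regression objective
score = r2_score(y, net(x))
```

A network this size is still tiny by modern standards, which underlines the abstract's point: the prior studies that used even smaller architectures were leaving accuracy on the table.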
We introduce a general and in a certain sense time-optimal way of solving one problem after another, efficiently searching the space of programs that compute solution candidates, including those programs that organize and manage and adapt and reuse earlier acquired knowledge. The Optimal Ordered Problem Solver (OOPS) draws inspiration from Levin's Universal Search, designed for single problems and universal Turing machines. It spends part of the total search time for a new problem on testing programs that exploit previous solution-computing programs in computable ways. If the new problem can be solved faster by copy-editing/invoking previous code than by solving the new problem from scratch, then OOPS will find this out. If not, then at least the previous solutions will not cause much harm. We introduce an efficient, recursive, backtracking-based way of implementing OOPS on realistic computers with limited storage. Experiments illustrate how OOPS can greatly profit from metalearning or metasearching, that is, searching for faster search procedures.
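OOPS itself, with its backtracking interpreter and reusable program prefixes, is beyond a short snippet, but the Levin-style time allocation it builds on can be illustrated in miniature: in phase i, each candidate program p receives on the order of 2^i * 2^(-len(p)) interpreter steps, so total search time stays within a constant factor of the fastest solver's runtime. The toy instruction set and interpreter below are hypothetical stand-ins, not OOPS.

```python
from itertools import product

def run(program, budget, target):
    """Toy interpreter: a program is a string over '+-' editing a counter.
    Returns True if it reaches `target` within `budget` steps.
    (Hypothetical stand-in for a real instruction set.)"""
    counter, steps = 0, 0
    for op in program:
        if steps >= budget:
            return False
        counter += 1 if op == '+' else -1
        steps += 1
        if counter == target:
            return True
    return False

def levin_search(target, max_phase=20):
    # Phase i: every program p gets roughly 2**i * 2**(-len(p)) steps,
    # so short programs are tried generously before long ones.
    for i in range(1, max_phase):
        for length in range(1, i + 1):
            budget = 2 ** i // 2 ** length   # time share for this length
            if budget == 0:
                continue
            for program in product('+-', repeat=length):
                if run(program, budget, target):
                    return ''.join(program)
    return None

print(levin_search(3))   # finds '+++'
```

What OOPS adds on top of this scheme is the incremental part: solved tasks leave behind frozen code that later programs may invoke or copy-edit, which is exactly the knowledge reuse the abstract emphasizes.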
Long Short-Term Memory. Hochreiter, Sepp; Schmidhuber, Jürgen. Neural Computation, Volume 9, Issue 8, November 1997. Journal article, peer reviewed.
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
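The mechanism the abstract describes, a constant error carousel guarded by multiplicative gates, fits in a few lines. The NumPy sketch below follows the spirit of the original 1997 cell (input and output gates only; the forget gate was a later addition by Gers et al.); the weight shapes and initialization are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W):
    """One step of an LSTM memory block, in the spirit of the original
    1997 formulation. The cell's self-connection is fixed at 1.0 --
    the 'constant error carousel' -- so error flows back unscaled.
    Weight matrices in `W` are hypothetical placeholders."""
    z = np.concatenate([x, h])     # current input + previous block output
    i = sigmoid(W['i'] @ z)        # input gate: opens/closes writing
    o = sigmoid(W['o'] @ z)        # output gate: opens/closes reading
    g = np.tanh(W['g'] @ z)        # candidate cell input
    c = c + i * g                  # carousel update: identity self-loop
    h = o * np.tanh(c)             # gated cell output
    return h, c

n_in, n_cells = 4, 8
W = {k: np.random.randn(n_cells, n_in + n_cells) * 0.1 for k in 'iog'}
h, c = np.zeros(n_cells), np.zeros(n_cells)
for x in np.random.randn(10, n_in):   # a toy 10-step input sequence
    h, c = lstm_step(x, h, c, W)
```

Because the carousel update multiplies the old cell state by exactly 1, the gradient along the cell's self-loop neither vanishes nor explodes, which is what lets LSTM bridge the 1000-step lags cited above.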
In response to Rodriguez's recent article (2001), we compare the performance of simple recurrent nets and long short-term memory recurrent nets on context-free and context-sensitive languages.
The temporal distance between events conveys information essential for numerous sequential tasks such as motor control and rhythm detection. While Hidden Markov Models tend to ignore this information, recurrent neural networks (RNNs) can in principle learn to make use of it. We focus on Long Short-Term Memory (LSTM) because it has been shown to outperform other RNNs on tasks involving long time lags. We find that LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes spaced either 50 or 49 time steps apart without the help of any short training exemplars. Without external resets or teacher forcing, our LSTM variant also learns to generate stable streams of precisely timed spikes and other highly nonlinear periodic patterns. This makes LSTM a promising approach for tasks that require the accurate measurement or generation of time intervals.
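A peephole connection is a small modification of the standard cell update: each gate additionally receives the cell state through a per-cell weight, so gate timing can depend on the carousel's contents. The NumPy sketch below uses hypothetical shapes and includes the forget gate used in the peephole formulation; as in that formulation, the input and forget gates see the old cell state while the output gate sees the updated one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x, h, c, W, p):
    """One LSTM step with peephole connections. `p` holds per-cell
    peephole weights feeding the cell state directly into the gates,
    so the gates can 'see' the carousel and learn precise timing.
    Shapes are hypothetical placeholders."""
    z = np.concatenate([x, h])
    i = sigmoid(W['i'] @ z + p['i'] * c)   # peephole: gate sees old cell
    f = sigmoid(W['f'] @ z + p['f'] * c)   # forget gate, also peeped
    g = np.tanh(W['g'] @ z)
    c = f * c + i * g                      # gated carousel update
    o = sigmoid(W['o'] @ z + p['o'] * c)   # output gate peeks at NEW cell
    h = o * np.tanh(c)
    return h, c

n_in, n_cells = 4, 8
W = {k: np.random.randn(n_cells, n_in + n_cells) * 0.1 for k in 'ifog'}
p = {k: np.random.randn(n_cells) * 0.1 for k in 'ifo'}
h, c = np.zeros(n_cells), np.zeros(n_cells)
h, c = peephole_lstm_step(np.random.randn(n_in), h, c, W, p)
```

Without peepholes, a closed output gate hides the cell state from the whole network, including the gates themselves; the direct cell-to-gate weights are what allow the 49-versus-50-step distinction described in the abstract.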