Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past ...decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics.The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
Autonomous learning has been a promising direction in control and robotics for more than a decade since data-driven learning allows to reduce the amount of engineering knowledge, which is otherwise ...required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the art RL our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
In the Bayesian mixture modeling framework it is possible to infer the necessary number of components to model the data and therefore it is unnecessary to explicitly restrict the number of ...components. Nonparametric mixture models sidestep the problem of finding the "correct" number of mixture components by assuming infinitely many components. In this paper Dirichlet process mixture (DPM) models are cast as infinite mixture models and inference using Markov chain Monte Carlo is described. The specification of the priors on the model parameters is often guided by mathematical and practical convenience. The primary goal of this paper is to compare the choice of conjugate and non-conjugate base distributions on a particular class of DPM models which is widely used in applications, the Dirichlet process Gaussian mixture model (DPGMM). We compare computational efficiency and modeling performance of DPGMM defined using a conjugate and a conditionally conjugate base distribution. We show that better density models can result from using a wider class of priors with no or only a modest increase in computational effort.
We propose a principled algorithm for robust Bayesian filtering and smoothing in nonlinear stochastic dynamic systems when both the transition function and the measurement function are described by ...non-parametric Gaussian process (GP) models. GPs are gaining increasing importance in signal processing, machine learning, robotics, and control for representing unknown system functions by posterior probability distributions. This modern way of system identification is more robust than finding point estimates of a parametric function representation. Our principled filtering/smoothing approach for GP dynamic systems is based on analytic moment matching in the context of the forward-backward algorithm. Our numerical evaluations demonstrate the robustness of the proposed approach in situations where other state-of-the-art Gaussian filters and smoothers can fail.
Techniques known as Nonlinear Set Membership prediction or Lipschitz Interpolation are approaches to supervised machine learning that utilise presupposed Lipschitz properties to perform inference ...over unobserved function values. Provided a bound on the true best Lipschitz constant of the target function is known a priori, they offer convergence guarantees, as well as bounds around the predictions. Considering a more general setting that builds on Lipschitz continuity, we propose an online method for estimating the Lipschitz constant online from function value observations that are possibly corrupted by bounded noise. Utilising this as a data-dependent hyper-parameter gives rise to a nonparametric machine learning method, for which we establish strong universal approximation guarantees. That is, we show that our prediction rule can learn any continuous function on compact support in the limit of increasingly dense data, up to a worst-case error that can be bounded by the level of observational error. We also consider applications of our nonparametric regression method to learning-based control. For a class of discrete-time settings, we establish convergence guarantees on the closed-loop tracking error of our online learning-based controllers. To provide evidence that our method can be beneficial not only in theory but also in practice, we apply it in the context of nonparametric model-reference adaptive control (MRAC). Across a range of simulated aircraft roll-dynamics and performance metrics our approach outperforms recently proposed alternatives that were based on Gaussian processes and RBF-neural networks.
Manifold Gaussian Processes for regression Calandra, Roberto; Peters, Jan; Rasmussen, Carl Edward ...
2016 International Joint Conference on Neural Networks (IJCNN),
2016-July
Conference Proceeding
Open access
Off-the-shelf Gaussian Process (GP) covariance functions encode smoothness assumptions on the structure of the function to be modeled. To model complex and non-differentiable functions, these ...smoothness assumptions are often too restrictive. One way to alleviate this limitation is to find a different representation of the data by introducing a feature space. This feature space is often learned in an unsupervised way, which might lead to data representations that are not useful for the overall regression task. In this paper, we propose Manifold Gaussian Processes, a novel supervised method that jointly learns a transformation of the data into a feature space and a GP regression from the feature space to observed space. The Manifold GP is a full GP and allows to learn data representations, which are useful for the overall regression task. As a proof-of-concept, we evaluate our approach on complex non-smooth functions where standard GPs perform poorly, such as step functions and robotics tasks with contacts.
Gaussian processes (GPs) provide a principled, practical, probabilistic
approach to learning in kernel machines. GPs have received increased attention in
the machine-learning community over the past ...decade, and this book provides a
long-needed systematic and unified treatment of theoretical and practical aspects of
GPs in machine learning. The treatment is comprehensive and self-contained, targeted
at researchers and students in machine learning and applied statistics.The book
deals with the supervised-learning problem for both regression and classification,
and includes detailed algorithms. A wide variety of covariance (kernel) functions
are presented and their properties discussed. Model selection is discussed both from
a Bayesian and a classical perspective. Many connections to other well-known
techniques from machine learning and statistics are discussed, including
support-vector machines, neural networks, splines, regularization networks,
relevance vector machines and others. Theoretical issues including learning curves
and the PAC-Bayesian framework are treated, and several approximation methods for
learning with large datasets are discussed. The book contains illustrative examples
and exercises, and code and datasets are available on the Web. Appendixes provide
mathematical background and a discussion of Gaussian Markov processes.
By interacting with human and learning from reward signals, reinforcement learning is an ideal way to build conversational AI. Concerning the expenses of real-users' responses, improving ...sample-efficiency has been the key issue when applying reinforcement learning in real-world spoken dialogue systems (SDS). Handcrafted action masks are commonly used to rule out impossible actions and accelerate the training process. However, the handcrafted action mask can barely be generalized to unseen domains. In this paper, we propose trainable-action-mask (TAM) which learns from data automatically without handcrafting complicated rules. In our experiments in Cambridge Restaurant domain, TAM requires only 30% of training data, compared with the baseline, to reach the 80% success rate and it also shows robustness to noisy environments.
Sparse variational approximations are popular methods for scaling up inference and learning in Gaussian processes to larger datasets. For \(N\) training points, exact inference has \(O(N^3)\) cost; ...with \(M \ll N\) features, state of the art sparse variational methods have \(O(NM^2)\) cost. Recently, methods have been proposed using more sophisticated features; these promise \(O(M^3)\) cost, with good performance in low dimensional tasks such as spatial modelling, but they only work with a very limited class of kernels, excluding some of the most commonly used. In this work, we propose integrated Fourier features, which extends these performance benefits to a very broad class of stationary covariance functions. We motivate the method and choice of parameters from a convergence analysis and empirical exploration, and show practical speedup in synthetic and real world spatial regression tasks.