There has recently been a trend to study linear system identification with high-order finite impulse response (FIR) models using the regularized least-squares approach. One key step of this approach is to solve the hyper-parameter estimation problem, which is usually nonconvex. Our goal here is to investigate implementations of algorithms for solving the hyper-parameter estimation problem that can deal with both large data sets and possibly ill-conditioned computations. In particular, a QR-factorization-based, matrix-inversion-free algorithm is proposed to evaluate the cost function in an efficient and accurate way. It is also shown that the gradient and Hessian of the cost function can be computed based on the same QR factorization. Finally, the proposed algorithm and ideas are verified by Monte Carlo simulations on a large data bank of test systems and data sets.
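The abstract above does not spell out the computation; the following is a minimal sketch of a QR-based, inversion-free evaluation of the marginal-likelihood cost, assuming the common form Y'Σ⁻¹Y + log det Σ with Σ = Φ P(η) Φ' + σ²I and a Cholesky factor P = LL'. Variable names are illustrative, not the paper's.

```python
import numpy as np

def cost_qr(Phi, Y, L, sigma):
    """Evaluate Y' Sigma^{-1} Y + log det Sigma, Sigma = Phi L L' Phi' + sigma^2 I,
    via one thin QR factorization -- no explicit matrix inverse is formed."""
    N, n = Phi.shape
    # Augmented matrix whose normal equations contain all needed quantities.
    A = np.vstack([np.hstack([Phi @ L, Y.reshape(-1, 1)]),
                   np.hstack([sigma * np.eye(n), np.zeros((n, 1))])])
    R = np.linalg.qr(A, mode='r')            # (n+1) x (n+1) upper triangular
    R1_diag = np.diag(R)[:n]                 # diagonal of the leading n x n block
    r2 = R[n, n]                             # last pivot
    quad = (r2 / sigma) ** 2                 # equals Y' Sigma^{-1} Y
    logdet = 2.0 * np.sum(np.log(np.abs(R1_diag))) + 2.0 * (N - n) * np.log(sigma)
    return quad + logdet

# Sanity check against the direct, inversion-based formula on random data.
rng = np.random.default_rng(0)
N, n, sigma = 200, 30, 0.5
Phi = rng.standard_normal((N, n))
Ltri = np.linalg.cholesky(np.eye(n) * 0.8 + 0.2)    # some P = L L'
Y = Phi @ (Ltri @ rng.standard_normal(n)) + sigma * rng.standard_normal(N)
Sigma = Phi @ Ltri @ Ltri.T @ Phi.T + sigma ** 2 * np.eye(N)
direct = Y @ np.linalg.solve(Sigma, Y) + np.linalg.slogdet(Sigma)[1]
assert np.isclose(cost_qr(Phi, Y, Ltri, sigma), direct)
```

The quadratic term comes out of the last pivot of the triangular factor and the log-determinant from its leading diagonal, so the cost is obtained without forming Σ or its inverse.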
Most of the currently used techniques for linear system identification are based on classical estimation paradigms coming from mathematical statistics. In particular, maximum likelihood and prediction error methods represent the mainstream approaches to identification of linear dynamic systems, with a long history of theoretical and algorithmic contributions. Parallel to this, in the machine learning community alternative techniques have been developed. Until recently, there has been little contact between these two worlds. The first aim of this survey is to make accessible to the control community the key mathematical tools and concepts as well as the computational aspects underpinning these learning techniques. In particular, we focus on kernel-based regularization and its connections with reproducing kernel Hilbert spaces and Bayesian estimation of Gaussian processes. The second aim is to demonstrate that learning techniques tailored to the specific features of dynamic systems may outperform conventional parametric approaches for identification of stable linear systems.
Intrigued by some recent results on impulse response estimation by kernel and nonparametric techniques, we revisit the old problem of transfer function estimation from input–output measurements. We formulate a classical regularization approach, focused on finite impulse response (FIR) models, and find that regularization is necessary to cope with the high variance problem. This basic, regularized least squares approach is then a focal point for interpreting other techniques, like Bayesian inference and Gaussian process regression. The main issue is how to determine a suitable regularization matrix (Bayesian prior or kernel). Several regularization matrices are provided and numerically evaluated on a data bank of test systems and data sets. Our findings based on the data bank are as follows. The classical regularization approach with carefully chosen regularization matrices shows slightly better accuracy and clearly better robustness in estimating the impulse response than the standard approach, the prediction error method/maximum likelihood (PEM/ML) approach. If the goal is to estimate a model of given order as well as possible, a low order model is often better estimated by the PEM/ML approach, and a higher order model is often better estimated by model reduction on a high order regularized FIR model estimated with careful regularization. Moreover, an optimal regularization matrix that minimizes the mean square error matrix is derived and studied. The importance of this result is that it gives the theoretical upper bound on the accuracy that can be achieved with this classical regularization approach.
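As an illustration of the basic scheme described above, here is a minimal sketch of the regularized least-squares FIR estimate with a DC-type regularization matrix, assuming the standard parameterization P(i,j) = c λ^((i+j)/2) ρ^|i−j|; the hyper-parameters are fixed by hand here rather than estimated from data.

```python
import numpy as np

def dc_kernel(n, c=1.0, lam=0.9, rho=0.95):
    """DC regularization matrix P(i,j) = c * lam**((i+j)/2) * rho**|i-j|."""
    i = np.arange(1, n + 1)
    return c * lam ** ((i[:, None] + i[None, :]) / 2) * rho ** np.abs(i[:, None] - i[None, :])

def regularized_fir(u, y, n, P, sigma2):
    """Regularized LS estimate of an n-tap FIR model
    y(t) = sum_{k=0}^{n-1} g_k u(t-k) + e(t)."""
    N = len(y)
    Phi = np.array([[u[t - k] if t - k >= 0 else 0.0 for k in range(n)]
                    for t in range(N)])
    # ghat = (Phi'Phi + sigma2 * P^{-1})^{-1} Phi' y, written in its dual form
    # so that P is never inverted explicitly.
    S = Phi @ P @ Phi.T + sigma2 * np.eye(N)
    return P @ Phi.T @ np.linalg.solve(S, y)

# Toy usage: estimate the impulse response of a first-order system.
rng = np.random.default_rng(1)
N, n, sigma2 = 300, 50, 0.1
u = rng.standard_normal(N)
g_true = 0.8 ** np.arange(n)
y = np.convolve(u, g_true)[:N] + np.sqrt(sigma2) * rng.standard_normal(N)
g_hat = regularized_fir(u, y, n, dc_kernel(n), sigma2)
```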
Model estimation and structure detection with short data records are two issues that receive increasing interest in system identification. In this paper, a multiple-kernel-based regularization method is proposed to handle these issues. Multiple kernels are conic combinations of fixed kernels suitable for impulse response estimation, and they equip the kernel-based regularization method with three features. First, multiple kernels can better capture complicated dynamics than single kernels. Second, the estimation of their weights by maximizing the marginal likelihood favors sparse optimal weights, which enables this method to tackle various structure detection problems, e.g., sparse dynamic network identification and the segmentation of linear systems. Third, the marginal likelihood maximization problem is a difference-of-convex-programming problem. It is thus possible to find a locally optimal solution efficiently by using a majorization minimization algorithm and an interior-point method in which the cost of a single interior-point iteration grows linearly in the number of fixed kernels. Monte Carlo simulations show that the locally optimal solutions lead to good performance for randomly generated starting points.
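To make the multiple-kernel construction concrete, below is a minimal sketch that forms a conic combination of fixed TC kernels and tunes the non-negative weights by direct marginal-likelihood minimization with a general-purpose solver; this only illustrates the idea and is not the majorization-minimization/interior-point algorithm of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def tc_kernel(n, lam):
    """Fixed TC kernel P(i,j) = lam**max(i,j)."""
    i = np.arange(1, n + 1)
    return lam ** np.maximum(i[:, None], i[None, :])

def neg_log_marglik(alpha, Phi, y, kernels, sigma2):
    """-log p(y | alpha): y ~ N(0, Phi * sum_k alpha_k P_k * Phi' + sigma2 I)."""
    P = sum(a * K for a, K in zip(alpha, kernels))
    S = Phi @ P @ Phi.T + sigma2 * np.eye(len(y))
    logdet = np.linalg.slogdet(S)[1]
    return 0.5 * (logdet + y @ np.linalg.solve(S, y))

# Toy usage: weights over a grid of fixed TC kernels with different decay rates.
rng = np.random.default_rng(2)
N, n, sigma2 = 200, 40, 0.1
Phi = rng.standard_normal((N, n))
g_true = 0.7 ** np.arange(1, n + 1)
y = Phi @ g_true + np.sqrt(sigma2) * rng.standard_normal(N)
kernels = [tc_kernel(n, lam) for lam in (0.5, 0.7, 0.9, 0.95)]
res = minimize(neg_log_marglik, x0=np.ones(len(kernels)),
               args=(Phi, y, kernels, sigma2),
               bounds=[(0.0, None)] * len(kernels))    # conic (non-negative) weights
alpha_hat = res.x
```

The non-negativity bounds encode the conic combination; in practice several of the estimated weights tend to be driven to zero, which is the sparsity effect the abstract refers to.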
Inspired by the recent promising developments of Bayesian learning techniques in the context of system identification, this paper proposes a transfer function estimator based on Gaussian process regression. Contrary to existing kernel-based impulse response estimators, a frequency domain approach is adopted. This leads to a formulation and implementation which is seamlessly valid for both continuous- and discrete-time systems, and which conveniently enables the selection of the frequency band of interest. A pragmatic approach is proposed in an output error framework, from sampled input and output data. The transient is dealt with by estimating it simultaneously with the transfer function.
Modelling the transfer function and the transient as Gaussian processes allows for the incorporation of relevant prior knowledge on the system, in the form of suitably designed kernels. The SS (Stable Spline) and DC (Diagonal Correlated) kernels from the literature are translated to the frequency domain, and are proven to enforce stability of the estimated transfer function. Specifically, the estimates are shown to be stable rational functions in the frequency variable. The hyperparameters of the kernel are tuned via marginal likelihood maximisation.
The comparison between the proposed method and three existing methods from the literature, namely the regularised finite impulse response (RFIR) estimator, the Local Polynomial Method (LPM), and the Local Rational Method (LRM) for frequency response function estimation, is illustrated through simulations on multiple case studies.
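The following is a minimal sketch of the output-error, frequency-domain idea under simplifying assumptions: the transient term is ignored, the prior on the impulse response g is a DC kernel, and the induced prior on G(ω_k) = Σ_t g_t e^(-jω_k t) is used through the linear model Y_k = G(ω_k) U_k + noise. It illustrates the mechanics only and is not the estimator of the paper.

```python
import numpy as np

def dc_kernel(n, c=1.0, lam=0.9, rho=0.95):
    i = np.arange(1, n + 1)
    return c * lam ** ((i[:, None] + i[None, :]) / 2) * rho ** np.abs(i[:, None] - i[None, :])

# Frequency grid, input spectrum, and "measured" frequency-domain data.
rng = np.random.default_rng(3)
n, sigma2 = 40, 0.01
omega = np.linspace(0.01, np.pi, 60)                      # frequency band of interest
g_true = 0.8 ** np.arange(1, n + 1)
F = np.exp(-1j * np.outer(omega, np.arange(1, n + 1)))    # G(w_k) = F @ g
U = rng.standard_normal(len(omega)) + 1j * rng.standard_normal(len(omega))
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(len(omega))
                               + 1j * rng.standard_normal(len(omega)))
Y = (F @ g_true) * U + noise

# Linear-Gaussian model Y = A g + e with A = diag(U) F; stack real and imaginary
# parts so that standard real-valued GP regression applies.
A = U[:, None] * F
Ar = np.vstack([A.real, A.imag])
Yr = np.concatenate([Y.real, Y.imag])
P = dc_kernel(n)
S = Ar @ P @ Ar.T + (sigma2 / 2) * np.eye(len(Yr))
g_hat = P @ Ar.T @ np.linalg.solve(S, Yr)                 # posterior mean of g
G_hat = F @ g_hat                                         # estimated frequency response
```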
Input design is an important issue for classical system identification methods but has only recently been investigated for the kernel-based regularization method (KRM). In this paper, we consider the input design problem of KRMs for LTI system identification. Different from the recent result, we adopt a Bayesian perspective and, in particular, make use of scalar measures (e.g., the A-optimality, D-optimality, and E-optimality) of the Bayesian mean square error matrix as the design criteria, subject to a power constraint on the input. Instead of solving the optimization problem directly, we propose a two-step procedure. In the first step, by making suitable assumptions on the unknown input, we construct a quadratic map (transformation) of the input such that the transformed input design problems are convex, and the global minima of the transformed input design problem can thus be found efficiently by applying well-developed convex optimization software packages. In the second step, we derive the characterization of the optimal input based on the global minima found in the first step by solving the inverse image of the quadratic map. In addition, we derive analytic results for some special types of kernels, which provide insight into the input design and its dependence on the kernel structure.
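The design criteria themselves are easy to state; below is a minimal sketch that evaluates A-, D-, and E-optimality measures of the Bayesian MSE (posterior covariance) matrix for power-normalized candidate inputs and compares them by brute force. The two-step convexification of the paper is not reproduced here.

```python
import numpy as np

def tc_kernel(n, lam=0.9):
    i = np.arange(1, n + 1)
    return lam ** np.maximum(i[:, None], i[None, :])

def toeplitz_regressor(u, n):
    """FIR regression matrix Phi[t, k] = u[t - k - 1], zero for negative indices."""
    N = len(u)
    return np.array([[u[t - k] if t - k >= 0 else 0.0 for k in range(1, n + 1)]
                     for t in range(N)])

def bayes_mse_criteria(u, n, P, sigma2):
    """A-, D-, and E-optimality measures of the posterior covariance
    W = (P^{-1} + Phi'Phi / sigma2)^{-1} for the input u."""
    Phi = toeplitz_regressor(u, n)
    W = np.linalg.inv(np.linalg.inv(P) + Phi.T @ Phi / sigma2)   # fine for a sketch
    eig = np.linalg.eigvalsh(W)
    return {"A": np.trace(W), "D": np.sum(np.log(eig)), "E": eig.max()}

# Compare a few power-normalized candidate inputs (white noise vs. low-pass).
rng = np.random.default_rng(4)
N, n, sigma2 = 200, 30, 0.1
P = tc_kernel(n)
candidates = {"white": rng.standard_normal(N),
              "lowpass": np.convolve(rng.standard_normal(N), np.ones(5) / 5, "same")}
for name, u in candidates.items():
    u = u * np.sqrt(N / np.sum(u ** 2))          # unit average power constraint
    print(name, bayes_mse_criteria(u, n, P, sigma2))
```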
There are two key issues for the kernel-based regularization method: one is how to design a suitable kernel so as to embed in the kernel the prior knowledge of the LTI system to be identified, and the other is how to tune the kernel such that the resulting regularized impulse response estimator achieves a good bias–variance tradeoff. In this paper, we focus on the issue of kernel design. Depending on the type of prior knowledge, we propose two methods to design kernels: one from a machine learning perspective and the other from a system theory perspective. We also provide analysis results for both methods, which not only enhance our understanding of the existing kernels but also guide the design of new kernels.
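For reference, here is a minimal sketch of three kernels commonly used in this line of work, TC, DC, and the second-order stable spline (SS), using what I take to be their standard discrete-time parameterizations; which design perspective each embodies is discussed in the paper, not here.

```python
import numpy as np

def tc_kernel(n, c=1.0, lam=0.9):
    """TC kernel: k(i, j) = c * lam**max(i, j)."""
    i = np.arange(1, n + 1)
    return c * lam ** np.maximum(i[:, None], i[None, :])

def dc_kernel(n, c=1.0, lam=0.9, rho=0.95):
    """DC kernel: k(i, j) = c * lam**((i+j)/2) * rho**|i-j|."""
    i = np.arange(1, n + 1)
    return c * lam ** ((i[:, None] + i[None, :]) / 2) * rho ** np.abs(i[:, None] - i[None, :])

def ss_kernel(n, c=1.0, lam=0.9):
    """Second-order stable spline kernel:
    k(i, j) = c * (lam**(i+j+max(i,j)) / 2 - lam**(3*max(i,j)) / 6)."""
    i = np.arange(1, n + 1)
    m = np.maximum(i[:, None], i[None, :])
    return c * (lam ** (i[:, None] + i[None, :] + m) / 2 - lam ** (3 * m) / 6)
```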
Neural networks (NNs) are a family of models for a broad range of emerging machine learning and pattern recognition applications. NN techniques are conventionally executed on general-purpose processors (such as CPUs and GPGPUs), which are usually not energy-efficient since they invest excessive hardware resources to flexibly support various workloads. Consequently, application-specific hardware accelerators for neural networks have been proposed recently to improve energy efficiency. However, such accelerators were designed for a small set of NN techniques sharing similar computational patterns, and they adopt complex and informative instructions (control signals) directly corresponding to high-level functional blocks of an NN (such as layers), or even an NN as a whole. Although straightforward and easy to implement for a limited set of similar NN techniques, the lack of agility in the instruction set prevents such accelerator designs from supporting a variety of different NN techniques with sufficient flexibility and efficiency. In this paper, we propose a novel domain-specific instruction set architecture (ISA) for NN accelerators, called Cambricon, which is a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions, based on a comprehensive analysis of existing NN techniques. Our evaluation over a total of ten representative yet distinct NN techniques demonstrates that Cambricon exhibits strong descriptive capacity over a broad range of NN techniques, and provides higher code density than general-purpose ISAs such as x86, MIPS, and GPGPU. Compared to the latest state-of-the-art NN accelerator design, DaDianNao [5] (which can only accommodate 3 types of NN techniques), our Cambricon-based accelerator prototype implemented in TSMC 65 nm technology incurs only negligible latency/power/area overheads, with a versatile coverage of 10 different NN benchmarks.
Maximum Entropy Kernels for System Identification. Carli, Francesca Paola; Chen, Tianshi; Ljung, Lennart. IEEE Transactions on Automatic Control, vol. 62, no. 3, March 2017.
Bayesian nonparametric approaches have recently been introduced in the system identification scenario, where the impulse response is modeled as the realization of a zero-mean Gaussian process whose covariance (kernel) has to be estimated from data. In this scheme, the quality of the estimates crucially depends on the parametrization of the covariance of the Gaussian process. A family of kernels that has been shown to be particularly effective in the system identification framework is the family of Diagonal/Correlated (DC) kernels. Maximum entropy properties of a related family of kernels, the Tuned/Correlated (TC) kernels, have recently been pointed out in the literature. In this technical note, we show that the maximum entropy properties indeed extend to the whole family of DC kernels. The maximum entropy interpretation can be exploited in conjunction with results on matrix completion problems in the graphical models literature to shed light on the structure of the DC kernel. In particular, we prove that the DC kernel admits a closed-form factorization, inverse, and determinant. These results can be exploited both to improve the numerical stability and to reduce the computational complexity associated with the computation of the DC estimator.
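Here is a minimal numerical sketch of the structure discussed above, assuming the common parameterization k(i,j) = c λ^((i+j)/2) ρ^|i−j|: the DC kernel factors as D T D with D diagonal and T the AR(1) correlation matrix ρ^|i−j|, so its inverse is a diagonally scaled tridiagonal matrix and its determinant has a closed form.

```python
import numpy as np

n, c, lam, rho = 8, 1.0, 0.9, 0.95
i = np.arange(1, n + 1)

# DC kernel and its factorization K = D T D.
K = c * lam ** ((i[:, None] + i[None, :]) / 2) * rho ** np.abs(i[:, None] - i[None, :])
D = np.diag(np.sqrt(c) * lam ** (i / 2))
T = rho ** np.abs(i[:, None] - i[None, :])            # AR(1) correlation matrix
assert np.allclose(K, D @ T @ D)

# T has a closed-form tridiagonal inverse, hence K^{-1} = D^{-1} T^{-1} D^{-1}.
Tinv = (np.diag(np.r_[1.0, np.full(n - 2, 1 + rho ** 2), 1.0])
        - rho * (np.eye(n, k=1) + np.eye(n, k=-1))) / (1 - rho ** 2)
assert np.allclose(Tinv, np.linalg.inv(T))

# Closed-form determinant: det K = c**n * lam**sum(i) * (1 - rho**2)**(n - 1).
assert np.isclose(np.linalg.det(K), c ** n * lam ** i.sum() * (1 - rho ** 2) ** (n - 1))
```

Because T⁻¹ is tridiagonal, the DC estimator can be computed with sparse linear algebra, which is the numerical-stability and complexity benefit the abstract mentions.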
DaDianNao: A Neural Network Supercomputer. Luo, Tao; Liu, Shaoli; Li, Ling; et al. IEEE Transactions on Computers, vol. 66, no. 1, January 2017.
Many companies are deploying services largely based on machine-learning algorithms for sophisticated processing of large amounts of data, either for consumers or industry. The state-of-the-art and most popular such machine-learning algorithms are convolutional and deep neural networks (CNNs and DNNs), which are known to be computationally and memory intensive. A number of neural network accelerators have been proposed recently which can offer a high computational capacity/area ratio, but which remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the CNN and DNN memory footprint, while large, is not beyond the capability of the on-chip storage of a multi-chip system. This property, combined with the CNN/DNN algorithmic characteristics, can lead to high internal bandwidth and low external communication, which can in turn enable high-degree parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines, and evaluate its performance by integrating electrical and optical inter-chip interconnects separately. We show that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 656.63× over a GPU, and reduce the energy by 184.05× on average for a 64-chip system. We implement the node down to place and route at 28 nm, containing a combination of custom storage and computational units, with electrical inter-chip interconnects.