This article reviews the recent development of adaptive dynamic programming (ADP) with applications in control. First, its applications in optimal regulation are introduced, and several efficient algorithms are presented. Next, the use of ADP to solve game problems, mainly nonzero-sum game problems, is elaborated, followed by applications in large-scale systems. Note that although the methods presented in this article are based on continuous-time systems, various applications of ADP in discrete-time systems are also analyzed. Moreover, in each section, not only are some existing techniques discussed, but possible directions for future work are also pointed out. Finally, some overall prospects for the future are given, followed by the conclusions of this article. Through a comprehensive investigation of its applications in many existing fields, this article demonstrates that the ADP intelligent control method is promising in today's artificial intelligence era and plays a significant role in promoting economic and social development.
In this paper, a value iteration adaptive dynamic programming (ADP) algorithm is developed to solve infinite-horizon undiscounted optimal control problems for discrete-time nonlinear systems. The present value iteration ADP algorithm permits an arbitrary positive semi-definite function to initialize the algorithm. A novel convergence analysis is developed to guarantee that the iterative value function converges to the optimal performance index function. It is proven that, depending on the initial function, the iterative value function will be monotonically nonincreasing, monotonically nondecreasing, or nonmonotonic, and will converge to the optimum. In this paper, for the first time, the admissibility properties of the iterative control laws are developed for value iteration algorithms. It is emphasized that new termination criteria are established to guarantee the effectiveness of the iterative control laws. Neural networks are used to approximate the iterative value function and to compute the iterative control law, respectively, facilitating the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the present method.
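As a minimal illustration of the value-iteration backbone described in this abstract (not the paper's algorithm or its benchmarks), the sketch below runs the recursion V_{i+1}(x) = min_u [U(x, u) + V_i(f(x, u))] on a discretized scalar system. The dynamics, utility, and grids are invented for illustration, and the zero initialization is one of the arbitrary positive semi-definite choices the algorithm permits.

```python
import numpy as np

# Toy discrete-time system x_{k+1} = 0.8*x + u with utility U(x, u) = x^2 + u^2
# (hypothetical example, not the paper's benchmark).
xs = np.linspace(-1.0, 1.0, 41)          # state grid
us = np.linspace(-1.0, 1.0, 41)          # control grid

def step(x, u):
    return np.clip(0.8 * x + u, xs[0], xs[-1])

V = np.zeros_like(xs)                     # V_0 == 0 (any PSD init is allowed)
for _ in range(200):
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        # V_{i+1}(x) = min_u [ U(x, u) + V_i(f(x, u)) ], evaluated for all u at once
        costs = x**2 + us**2 + np.interp(step(x, us), xs, V)
        V_new[i] = costs.min()
    if np.max(np.abs(V_new - V)) < 1e-8:  # termination once the iterates settle
        V = V_new
        break
    V = V_new
```

Starting from V_0 identically zero, the iterates are monotonically nondecreasing, which matches one of the three monotonicity cases the paper analyzes.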
This paper investigates the optimal consensus control problem for discrete-time multi-agent systems with completely unknown dynamics by utilizing a data-driven reinforcement learning method. It is known that optimal consensus control for multi-agent systems relies on the solution of the coupled Hamilton-Jacobi-Bellman equation, which is generally impossible to solve analytically. Even worse, most real-world systems are too complicated to model accurately. To overcome these deficiencies, a data-based adaptive dynamic programming method is presented that uses current and past system data rather than an accurate system model, and that avoids the traditional identification scheme, which would introduce approximation residual errors. First, we establish a discounted performance index and formulate the optimal consensus problem via the Bellman optimality principle. Then, we introduce the policy iteration algorithm which motivates this paper. To implement the proposed online action-dependent heuristic dynamic programming method, two neural networks (NNs), 1) a critic NN and 2) an actor NN, are employed to approximate the iterative performance index functions and control policies, respectively, in real time. Finally, two simulation examples are provided to demonstrate the effectiveness of the proposed method.
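The policy-iteration backbone that motivates this paper can be sketched on a scalar discounted LQR problem, where the critic step (policy evaluation) and actor step (policy improvement) have closed forms; this is my own simplification, with invented values for a, b, q, r, and the discount factor, and it stands in for the multi-agent, neural-network method only structurally.

```python
import numpy as np

# Scalar discounted LQR: x_{k+1} = a*x + b*u, cost sum_k gamma^k (q*x^2 + r*u^2).
# All numbers are hypothetical; the critic step is solved analytically here,
# where the paper would fit a critic NN to data.
a, b, q, r, gamma = 1.1, 1.0, 1.0, 1.0, 0.95

K = 1.0                           # initial admissible gain: gamma*(a - b*K)^2 < 1
for _ in range(50):
    # Policy evaluation (critic step): V(x) = P*x^2 solves the Bellman equation
    # P = q + r*K^2 + gamma*(a - b*K)^2 * P for the current law u = -K*x.
    P = (q + r * K**2) / (1.0 - gamma * (a - b * K)**2)
    # Policy improvement (actor step): minimize q*x^2 + r*u^2 + gamma*P*(a*x + b*u)^2.
    K_new = gamma * a * b * P / (r + gamma * b**2 * P)
    if abs(K_new - K) < 1e-12:
        K = K_new
        break
    K = K_new
```

At convergence, P satisfies the discounted algebraic Riccati equation, which is the scalar analogue of the coupled HJB solution the paper targets.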
In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated over the entire state and control spaces, instead of for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, where the convergence criterion on the learning rates of traditional Q-learning algorithms is simplified. During the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties of the deterministic Q-learning algorithm are first developed for the undiscounted case. Then, taking the discount factor into account, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, facilitating the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
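The distinctive feature above, updating the Q function over the whole state-control space per iteration, can be sketched as a tabular Q-iteration on a discretized toy system; the dynamics, utility, and grids are my own invention, and the table stands in for the critic network.

```python
import numpy as np

# Deterministic Q-iteration: every sweep updates Q over the entire discretized
# state-control grid (hypothetical scalar system x_{k+1} = 0.5*x + u).
xs = np.linspace(-1.0, 1.0, 41)
us = np.linspace(-1.0, 1.0, 41)

def f(x, u):
    return np.clip(0.5 * x + u, xs[0], xs[-1])

Q = np.zeros((xs.size, us.size))
for _ in range(100):
    V = Q.min(axis=1)                      # V_i(x) = min_u Q_i(x, u)
    Q_new = np.empty_like(Q)
    for i, x in enumerate(xs):
        # Q_{i+1}(x, u) = U(x, u) + V_i(f(x, u)), for every u at once
        Q_new[i] = x**2 + us**2 + np.interp(f(x, us), xs, V)
    if np.max(np.abs(Q_new - Q)) < 1e-9:
        Q = Q_new
        break
    Q = Q_new
greedy_u = us[Q.argmin(axis=1)]            # iterative control law
```

Because the whole table is refreshed each sweep, no learning-rate schedule is needed, which mirrors the simplification of the learning-rate criterion noted in the abstract.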
In this paper, a discrete-time optimal control scheme is developed via a novel local policy iteration adaptive dynamic programming algorithm. In the discrete-time local policy iteration algorithm, the iterative value function and iterative control law can be updated in a subset of the state space, where the computational burden is relaxed compared with the traditional policy iteration algorithm. Convergence properties of the local policy iteration algorithm are presented to show that the iterative value function is monotonically nonincreasing and converges to the optimum under some mild conditions. The admissibility of the iterative control law is proven, which shows that the control system can be stabilized under any of the iterative control laws, even if the iterative control law is updated in a subset of the state space. Finally, two simulation examples are given to illustrate the performance of the developed method.
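A tabular sketch of the local idea, updating the law and value only on a random subset of grid states each sweep, might look as follows; the system, utility, subset size, and initial admissible law u = -0.5x are all invented for illustration and are not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(-1.0, 1.0, 41)
us = np.linspace(-1.0, 1.0, 41)

pi = -0.5 * xs                         # initial admissible law (deadbeat here)
V = np.zeros_like(xs)
for _ in range(50):                    # evaluate the initial law everywhere once
    V = xs**2 + pi**2 + np.interp(np.clip(0.5 * xs + pi, xs[0], xs[-1]), xs, V)

for sweep in range(200):
    # Local policy improvement: update the law only on a random subset of states.
    subset = rng.choice(xs.size, size=10, replace=False)
    for i in subset:
        costs = xs[i]**2 + us**2 + np.interp(np.clip(0.5 * xs[i] + us, xs[0], xs[-1]), xs, V)
        pi[i] = us[costs.argmin()]
    # Local policy evaluation: refresh the value function on the same subset.
    for _ in range(30):
        nxt = np.clip(0.5 * xs[subset] + pi[subset], xs[0], xs[-1])
        V[subset] = xs[subset]**2 + pi[subset]**2 + np.interp(nxt, xs, V)
```

Each sweep touches only 10 of the 41 grid states, which is the source of the reduced computational burden relative to a full policy-iteration sweep.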
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. First, a new generalized value iteration algorithm of ADP is developed to make the iterative performance index function converge to the solution of the Hamilton-Jacobi-Bellman equation. The generalized value iteration algorithm permits an arbitrary positive semi-definite function to initialize it, which overcomes the disadvantage of traditional value iteration algorithms. When the iterative control law and iterative performance index function in each iteration cannot be obtained accurately, for the first time a new "design method of the convergence criteria" for the finite-approximation-error-based generalized value iteration algorithm is established. A suitable approximation error can be designed adaptively to make the iterative performance index function converge to a finite neighborhood of the optimal performance index function. Neural networks are used to implement the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the developed method.
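The effect of finite approximation errors can be sketched on the same kind of toy grid one would use for plain value iteration: below, each backup of one copy of the value function is perturbed by a bounded error (standing in for network residuals), and the perturbed iterates stay near the exact ones because each backup is nonexpansive in the sup norm. The system, utility, and error bound are my own choices, not the paper's adaptive error design.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(-1.0, 1.0, 41)   # state grid
us = np.linspace(-1.0, 1.0, 41)   # control grid
eps = 1e-3                        # per-iteration approximation error bound

def backup(V, noise=0.0):
    # One value-iteration backup, optionally perturbed by a bounded error.
    V_new = np.empty_like(V)
    for i, x in enumerate(xs):
        nxt = np.clip(0.8 * x + us, xs[0], xs[-1])
        V_new[i] = (x**2 + us**2 + np.interp(nxt, xs, V)).min()
    return V_new + noise

V_exact = np.zeros_like(xs)
V_apprx = np.zeros_like(xs)
for _ in range(200):
    V_exact = backup(V_exact)
    V_apprx = backup(V_apprx, noise=rng.uniform(-eps, eps, xs.size))
# Gap between exact and error-corrupted iterates; bounded by 200*eps because
# each exact backup is sup-norm nonexpansive.
gap = np.max(np.abs(V_apprx - V_exact))
```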
In this paper, a novel mixed iterative adaptive dynamic programming (ADP) algorithm is developed to solve the optimal battery energy management and control problem in smart residential microgrid systems. Based on load and electricity-rate data, two iterations are constructed: the P-iteration and the V-iteration. The V-iteration is implemented based on value iteration, which aims to obtain the iterative control law sequence in each period. The P-iteration is implemented based on policy iteration, which updates the iterative value function according to the iterative control law sequence. Properties of the developed mixed iterative ADP algorithm are analyzed. It is shown that the iterative value function is monotonically nonincreasing and converges to the solution of the Bellman equation. In each iteration, it is proven that the performance index function is finite under the iterative control law sequence. Finally, numerical results and comparisons are given to illustrate the performance of the developed algorithm.
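The Bellman recursion underlying the battery subproblem can be sketched as a finite-horizon dynamic program over a state-of-charge grid; the load profile, rate profile, battery limits, and the no-export assumption below are all made up for illustration and are not the paper's mixed P/V-iteration scheme or its data.

```python
import numpy as np

# Finite-horizon DP for daily battery scheduling: choose charge(+)/discharge(-)
# power to minimize electricity cost. All profiles are hypothetical.
T = 24
rate = 0.1 + 0.1 * (np.arange(T) % 12 >= 6)        # cheap nights, dear days
load = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(T) / T)
soc = np.linspace(0.0, 2.0, 21)                    # battery energy grid (kWh)
p = np.linspace(-0.5, 0.5, 11)                     # power grid (kW per period)

V = np.zeros(soc.size)                             # terminal value
policy = np.zeros((T, soc.size))
for t in range(T - 1, -1, -1):                     # backward Bellman recursion
    V_new = np.empty_like(V)
    for i, e in enumerate(soc):
        e_next = np.clip(e + p, soc[0], soc[-1])   # battery capacity limits
        grid_power = np.maximum(load[t] + p, 0.0)  # no export to the grid here
        cost = rate[t] * grid_power + np.interp(e_next, soc, V)
        j = cost.argmin()
        V_new[i] = cost[j]
        policy[t, i] = p[j]
    V = V_new
```

One sanity property of the resulting value function is that stored energy is never harmful: the cost-to-go is nonincreasing in the state of charge.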
In this paper, a novel adaptive dynamic programming (ADP) algorithm, called the "iterative zero-sum ADP algorithm," is developed to solve infinite-horizon discrete-time two-player zero-sum games of nonlinear systems. The present iterative zero-sum ADP algorithm permits arbitrary positive semidefinite functions to initialize the upper and lower iterations. A novel convergence analysis is developed to guarantee that the upper and lower iterative value functions converge to the upper and lower optimums, respectively. When the saddle-point equilibrium exists, it is emphasized that both the upper and lower iterative value functions are proven to converge to the optimal solution of the zero-sum game, where existence criteria for the saddle-point equilibrium are not required. If the saddle-point equilibrium does not exist, the upper and lower optimal performance index functions are obtained, respectively, where the upper and lower performance index functions are proven not to be equivalent. Finally, simulation results and comparisons are shown to illustrate the performance of the present method.
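The paired upper (min-max) and lower (max-min) iterations can be sketched on a discretized scalar game; the dynamics, the utility with disturbance penalty, and the grids are invented for illustration, not taken from the paper.

```python
import numpy as np

# Two-player zero-sum toy game: x_{k+1} = 0.5*x + u + 0.2*w,
# utility U(x, u, w) = x^2 + u^2 - 4*w^2 (hypothetical example).
xs = np.linspace(-1.0, 1.0, 41)
us = np.linspace(-1.0, 1.0, 21)
ws = np.linspace(-1.0, 1.0, 21)
U, W = np.meshgrid(us, ws, indexing="ij")      # rows index u, columns index w

V_up = np.zeros_like(xs)                        # upper iteration, PSD init
V_lo = np.zeros_like(xs)                        # lower iteration, PSD init
for _ in range(100):
    up_new = np.empty_like(V_up)
    lo_new = np.empty_like(V_lo)
    for i, x in enumerate(xs):
        stage = x**2 + U**2 - 4.0 * W**2
        nxt = np.clip(0.5 * x + U + 0.2 * W, xs[0], xs[-1])
        cost_up = stage + np.interp(nxt.ravel(), xs, V_up).reshape(U.shape)
        cost_lo = stage + np.interp(nxt.ravel(), xs, V_lo).reshape(U.shape)
        up_new[i] = cost_up.max(axis=1).min()   # min over u of max over w
        lo_new[i] = cost_lo.min(axis=0).max()   # max over w of min over u
    V_up, V_lo = up_new, lo_new
```

By the minimax inequality, the lower iterate never exceeds the upper iterate; the two coincide exactly when a saddle-point equilibrium exists on the grid.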
In this paper, we propose a novel computational method for solving non-linear optimal control problems. The method is based on the use of Fourier-Hermite series for approximating the action-value function arising in dynamic programming, instead of the conventional Taylor-series expansion used in differential dynamic programming (DDP). The coefficients of the Fourier-Hermite series can be computed numerically using sigma-point methods, which leads to a novel class of sigma-point-based dynamic programming methods. We also prove the quadratic convergence of the method and experimentally test its performance against other methods.
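The core numerical step, computing Fourier-Hermite coefficients with sigma points, can be illustrated in one dimension: project a function onto the probabilists' Hermite polynomials He_n using Gauss-Hermite quadrature nodes as the sigma points. The target function (cos) and the standard-normal expansion measure below are my own toy choices, not the paper's control setup.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Fourier-Hermite projection: c_n = E[f(X) He_n(X)] / n! for X ~ N(0, 1),
# with the expectation evaluated by Gauss-Hermite quadrature (the sigma points).
nodes, weights = hermegauss(10)                # 10 nodes for weight exp(-x^2/2)
weights = weights / np.sqrt(2.0 * np.pi)       # normalize to the N(0, 1) measure

def fourier_hermite_coeff(f, n):
    he_n = hermeval(nodes, [0.0] * n + [1.0])  # He_n evaluated at the sigma points
    return np.sum(weights * f(nodes) * he_n) / factorial(n)

# Quadratic model of cos around 0: c0*He_0 + c1*He_1(x) + c2*He_2(x),
# the Fourier-Hermite analogue of DDP's second-order Taylor model.
c = [fourier_hermite_coeff(np.cos, n) for n in range(3)]
```

For cos, the exact coefficients are c0 = e^{-1/2}, c1 = 0, and c2 = -e^{-1/2}/2, which the ten-point quadrature recovers to high accuracy.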