This paper is concerned with a new discrete-time policy iteration adaptive dynamic programming (ADP) method for solving the infinite-horizon optimal control problem of nonlinear systems. The idea is to use an iterative ADP technique to obtain the iterative control law that optimizes the iterative performance index function. The main contribution of this paper is to analyze, for the first time, the convergence and stability properties of the policy iteration method for discrete-time nonlinear systems. It is shown that the iterative performance index function is nonincreasingly convergent to the optimal solution of the Hamilton-Jacobi-Bellman equation. It is also proven that any of the iterative control laws can stabilize the nonlinear systems. Neural networks are used to approximate the performance index function and compute the optimal control law, respectively, to facilitate the implementation of the iterative ADP algorithm, and the convergence of the weight matrices is analyzed. Finally, numerical results and analysis are presented to illustrate the performance of the developed method.
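The alternation described above (evaluate the current control law until its performance index function settles, then improve the law) can be sketched on a toy problem. Everything here is an assumption for illustration: a hypothetical scalar system x_{k+1} = 0.9x + u with quadratic utility, and grids plus linear interpolation standing in for the paper's neural-network approximators.

```python
import numpy as np

# Hypothetical scalar system and quadratic stage cost (assumed, not from the paper).
xs = np.linspace(-1.0, 1.0, 41)                    # state grid
us = np.linspace(-1.0, 1.0, 41)                    # control grid
f = lambda x, u: np.clip(0.9 * x + u, -1.0, 1.0)   # dynamics, clipped to the grid
cost = lambda x, u: x**2 + u**2                    # utility function

V = np.abs(xs) * 5.0        # any admissible initialization
policy = -0.5 * xs          # initial stabilizing control law (x_{k+1} = 0.4x)

for i in range(50):
    # Policy evaluation: iterate V(x) = cost(x, pi(x)) + V(f(x, pi(x))) to a fixed point.
    for _ in range(200):
        V = cost(xs, policy) + np.interp(f(xs, policy), xs, V)
    # Policy improvement: pi(x) = argmin_u [cost(x, u) + V(f(x, u))].
    Q = cost(xs[:, None], us[None, :]) + np.interp(f(xs[:, None], us[None, :]), xs, V)
    policy = us[np.argmin(Q, axis=1)]
```

In line with the abstract's convergence result, the evaluated performance index function of each improved law is no worse than that of its predecessor, and the value at the origin stays at zero.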
In this paper, a novel iterative Q-learning method called the "dual iterative Q-learning algorithm" is developed to solve the optimal battery management and control problem in smart residential environments. In the developed algorithm, two iterations are introduced: an internal iteration, which minimizes the total cost of the power loads in each period, and an external iteration, which makes the iterative Q-function converge to the optimum. Based on the dual iterative Q-learning algorithm, the convergence of the iterative Q-learning method for the optimal battery management and control problem is proven for the first time, which guarantees that both the iterative Q-function and the iterative control law reach the optimum. The algorithm is implemented by neural networks, and numerical results and comparisons are given to illustrate its performance.
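The per-period cost minimization over battery actions can be illustrated with a much-simplified backward dynamic program over a discretized battery state. This is a stand-in for the paper's Q-learning formulation, not its algorithm: the price and load profiles, battery capacity, and power grid below are all invented for the example.

```python
import numpy as np

# Assumed 24-hour price ($/kWh) and load (kW) profiles.
hours = np.linspace(0.0, 2.0 * np.pi, 24)
price = 0.10 + 0.05 * np.sin(hours)
load = 1.00 + 0.30 * np.cos(hours)

soc = np.linspace(0.0, 2.0, 21)            # battery energy grid (kWh), cap 2 kWh
u = np.linspace(-0.5, 0.5, 11)             # charge (+) / discharge (-) power (kW)

# Backward recursion: Q_t(s, u) = price_t * (load_t + u) + V_{t+1}(s + u).
V = np.zeros_like(soc)
for t in reversed(range(24)):
    Sn = soc[:, None] + u[None, :]                          # next state of charge
    Q = price[t] * np.maximum(load[t] + u[None, :], 0.0) \
        + np.interp(Sn, soc, V)
    Q = np.where((Sn < 0.0) | (Sn > 2.0), np.inf, Q)        # infeasible SOC barred
    V = Q.min(axis=1)                                       # minimize period cost
```

A full battery can only lower the remaining electricity bill, so the resulting value function is nonincreasing in the stored energy.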
In this paper, a new iterative adaptive dynamic programming (ADP) algorithm is developed to solve optimal control problems for infinite-horizon discrete-time nonlinear systems with finite approximation errors. The idea is to use an iterative ADP algorithm to obtain the iterative control law that makes the iterative performance index function reach the optimum. When the iterative control law and the iterative performance index function cannot be obtained accurately in each iteration, convergence conditions for the iterative ADP algorithm are derived. When these conditions are satisfied, it is shown that the iterative performance index functions converge to a finite neighborhood of the greatest lower bound of all performance index functions under some mild assumptions. Neural networks are used to approximate the performance index function and compute the optimal control policy, respectively, to facilitate the implementation of the iterative ADP algorithm. Finally, two simulation examples are given to illustrate the performance of the presented method.
In this paper, a novel optimal energy storage control scheme is investigated in smart grid environments with solar renewable energy. Based on the idea of adaptive dynamic programming (ADP), a self-learning algorithm is constructed to obtain the iterative control law sequence of the battery. Based on the data of the real-time electricity price (electricity rate in brief), the load demand (load in brief), and the solar renewable energy (solar energy in brief), the optimal performance index function, which minimizes the total electricity cost and simultaneously extends the battery's lifetime, is established. A new analysis method for the iterative ADP algorithm is developed to guarantee that the iterative value function converges to the optimum under the iterative control law sequence for any time index in a period. Numerical results and comparisons are presented to illustrate the effectiveness of the developed algorithm.
The design of a stabilizing controller for uncertain nonlinear systems with control constraints is a challenging problem. The input constraints, coupled with the inability to identify the uncertainties accurately, motivate the design of stabilizing controllers based on reinforcement learning (RL) methods. In this paper, a novel RL-based robust adaptive control algorithm is developed for a class of continuous-time uncertain nonlinear systems subject to input constraints. The robust control problem is converted into a constrained optimal control problem by appropriately selecting value functions for the nominal system. Distinct from the typical actor-critic dual networks employed in RL, only one critic neural network (NN) is constructed to derive the approximate optimal control. Meanwhile, unlike the initial stabilizing control that is often indispensable in RL, no special requirement is imposed on the initial control. By utilizing Lyapunov's direct method, the closed-loop optimal control system and the estimated weights of the critic NN are proved to be uniformly ultimately bounded. In addition, the derived approximate optimal control is verified to guarantee that the uncertain nonlinear system is stable in the sense of uniform ultimate boundedness. Two simulation examples are provided to illustrate the effectiveness and applicability of the presented approach.
This paper establishes an off-policy integral reinforcement learning (IRL) method to solve nonlinear continuous-time (CT) nonzero-sum (NZS) games with unknown system dynamics. The IRL algorithm is presented to obtain the iterative control law, and off-policy learning is used to allow the dynamics to be completely unknown. Off-policy IRL performs both policy evaluation and policy improvement in the policy iteration algorithm. Critic and action networks are used to obtain the performance index and control for each player. A gradient descent algorithm updates the critic and action weights simultaneously, and the convergence analysis of the weights is given. The asymptotic stability of the closed-loop system and the existence of the Nash equilibrium are proved. A simulation study demonstrates the effectiveness of the developed method for nonlinear CT NZS games with unknown system dynamics.
In this paper, a new iterative adaptive dynamic programming (ADP) method is proposed to solve a class of continuous-time nonlinear two-person zero-sum differential games. The idea is to use the ADP technique to iteratively obtain the optimal control pair that makes the performance index function reach the saddle point of the zero-sum differential game. If the saddle point does not exist, a mixed optimal control pair is obtained to make the performance index function reach the mixed optimum. Stability analysis of the nonlinear systems is presented, and the convergence of the performance index function is also proved. Two simulation examples are given to illustrate the performance of the proposed method.
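The saddle-point iteration can be pictured with a discrete-time grid sketch (the paper treats continuous-time games; the dynamics, disturbance bound, and attenuation weight below are all assumed). One player minimizes over the control u while the other maximizes over the disturbance w, and the iteration converges to the upper value min_u max_w.

```python
import numpy as np

# Assumed discrete-time surrogate: x_{k+1} = 0.8x + u + w, clipped to the grid.
xs = np.linspace(-1.0, 1.0, 21)            # state grid
us = np.linspace(-1.0, 1.0, 21)            # minimizing player's controls
ws = np.linspace(-0.5, 0.5, 11)            # maximizing player's disturbances
gamma2 = 4.0                               # assumed disturbance-attenuation weight
f = lambda x, u, w: np.clip(0.8 * x + u + w, -1.0, 1.0)

V = np.zeros_like(xs)                      # iterate from zero upward
for _ in range(100):
    X = xs[:, None, None]
    U = us[None, :, None]
    W = ws[None, None, :]
    # Stage cost x^2 + u^2 - gamma2 * w^2 plus value of successor state.
    Q = X**2 + U**2 - gamma2 * W**2 + np.interp(f(X, U, W), xs, V)
    V = Q.max(axis=2).min(axis=1)          # max over w, then min over u
```

Because the disturbance penalty dominates near the origin, the value at x = 0 stays at zero, i.e. the saddle-point value of the equilibrium state.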
In this paper, a novel distributed iterative adaptive dynamic programming (ADP) method is developed to solve the multibattery optimal coordination control problem for home energy management systems. Through system transformations, the multi-input optimal control problem is transformed into a single-input optimal control problem in which all the batteries operate at their worst performance. Next, based on the worst-performance optimal control law, an effective distributed iterative ADP algorithm is developed in which only a single-input optimization problem is solved in each iteration. Convergence properties of the distributed iterative ADP algorithm are established to show that the iterative performance index function converges to the optimum. Finally, numerical analysis is given to illustrate the performance of the developed algorithm.
In this paper, a novel iterative adaptive dynamic programming (ADP)-based infinite-horizon self-learning optimal control algorithm, called the generalized policy iteration algorithm, is developed for nonaffine discrete-time (DT) nonlinear systems. Generalized policy iteration is a general scheme in which the policy iteration and value iteration algorithms of ADP interact. The developed generalized policy iteration algorithm permits an arbitrary positive semidefinite function to initialize the algorithm, and two iteration indices are used for policy improvement and policy evaluation, respectively. It is the first time that the convergence, admissibility, and optimality properties of the generalized policy iteration algorithm for DT nonlinear systems are analyzed. Neural networks are used to implement the developed algorithm. Finally, numerical examples are presented to illustrate the performance of the developed algorithm.
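The two-index structure can be sketched on a toy grid problem (the system, cost, and grid below are assumed, not the paper's): an outer index i improves the policy, while an inner index j runs only a fixed number of evaluation sweeps rather than solving the Bellman equation exactly. With one inner sweep this reduces to value iteration; letting the inner loop run to convergence recovers policy iteration.

```python
import numpy as np

# Hypothetical scalar system and stage cost, clipped to the state grid.
xs = np.linspace(-1.0, 1.0, 41)
us = np.linspace(-1.0, 1.0, 41)
f = lambda x, u: np.clip(0.9 * x + u, -1.0, 1.0)
cost = lambda x, u: x**2 + u**2

V = np.zeros_like(xs)    # arbitrary positive semidefinite initialization

for i in range(60):                                # index i: policy improvement
    Q = cost(xs[:, None], us[None, :]) \
        + np.interp(f(xs[:, None], us[None, :]), xs, V)
    policy = us[np.argmin(Q, axis=1)]
    for j in range(3):                             # index j: evaluation sweeps
        V = cost(xs, policy) + np.interp(f(xs, policy), xs, V)
```

The choice of how many inner sweeps to run trades per-iteration work against the number of outer improvements needed, which is the flexibility the generalized scheme formalizes.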
An intelligent optimal control scheme for unknown nonaffine nonlinear discrete-time systems with a discount factor in the cost function is developed in this paper. The iterative adaptive dynamic programming algorithm is introduced to solve the optimal control problem, together with a convergence analysis. Then, the implementation of the iterative algorithm via the globalized dual heuristic programming technique is presented using three neural networks, which approximate, at each iteration, the cost function, the control law, and the unknown nonlinear system, respectively. In addition, two simulation examples are provided to verify the effectiveness of the developed optimal control approach.