The stability of fractional singular systems with time delay is discussed. Owing to the singularity of the system, the system is decomposed into two subsystems. By applying the Laplace transform and the inverse Laplace transform to the subsystems, the time-domain expressions of the state variables are obtained. Based on the properties of the Mittag-Leffler function, several inequalities that play a key role in the stability analysis are derived. Finally, a new sufficient condition is established under which the fractional singular system with time delay is asymptotically stable when the fractional order satisfies 1 < α < 2. A sufficient condition for stability under nonlinear disturbance is also obtained. All results are rigorously proved, and numerical examples are provided to show the validity and feasibility of the proposed method.
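For reference, the two-parameter Mittag-Leffler function underlying the derived inequalities is standard, and a bound typical of such stability arguments is sketched below; the constant C and the sector condition on arg(z) are the usual ones from the Mittag-Leffler asymptotics (valid for 0 < α < 2, so in particular for 1 < α < 2), not quantities taken from this abstract.

```latex
% Two-parameter Mittag-Leffler function (standard definition):
E_{\alpha,\beta}(z) = \sum_{k=0}^{\infty} \frac{z^{k}}{\Gamma(\alpha k + \beta)},
\qquad \alpha > 0,\ \beta > 0.

% A typical inequality used in Mittag-Leffler stability arguments:
% for 0 < \alpha < 2 and \alpha\pi/2 < \mu < \min(\pi, \alpha\pi),
% there exists C > 0 such that, whenever \mu \le |\arg(z)| \le \pi,
|E_{\alpha,\beta}(z)| \le \frac{C}{1 + |z|}.
```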
A least squares temporal difference with gradient correction (LS-TDC) algorithm and its kernel-based version, kernel-based LS-TDC (KLS-TDC), are proposed as policy evaluation algorithms for reinforcement learning (RL). LS-TDC is derived from the TDC algorithm; because TDC is obtained by minimizing the mean-square projected Bellman error, LS-TDC inherits its favorable convergence properties. The least squares technique eliminates the step-size tuning required by the original TDC and enhances robustness. In KLS-TDC, the kernel method allows feature vectors to be selected automatically, and approximate linear dependence analysis is performed to achieve kernel sparsification. In addition, a policy iteration strategy based on KLS-TDC is constructed to solve control learning problems. The convergence and parameter sensitivity of both LS-TDC and KLS-TDC are tested on on-policy learning, off-policy learning, and control learning problems. Experimental results, compared with a series of corresponding RL algorithms, demonstrate that both LS-TDC and KLS-TDC achieve better approximation and convergence performance, higher sample efficiency, a smaller parameter-tuning burden, and lower parameter sensitivity.
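As context for how LS-TDC relates to TDC, the standard per-sample TDC update (Sutton et al.) is sketched below in Python; the step sizes alpha and beta are exactly the quantities the least squares formulation removes. Variable names are illustrative, not the paper's notation.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One standard TDC update (minimizes the mean-square projected
    Bellman error). LS-TDC replaces these stochastic steps with a
    least squares solve, removing the need to tune alpha and beta."""
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    # Main weights: TD step plus the gradient-correction term.
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
    # Auxiliary weights: track the projection of the TD error onto features.
    w = w + beta * (delta - w @ phi) * phi
    return theta, w
```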
Research on fractional order systems is becoming increasingly popular, yet most fractional order controller design methods focus on single-input-single-output processes. In this paper, a fractional order internal model controller with inverted decoupling is proposed to handle non-integer order two-input-two-output systems with time delay. The fractional order two-input-two-output (FO-TITO) process is first decoupled by the inverted decoupling method, and fractional order internal model control (IMC) is then used to simplify the tuning process. Because multiple time delays complicate the design, the conditions under which the method applies to FO-TITO processes with time delay are discussed. To ensure robustness, the maximum sensitivity function is used to tune the parameters, and Lyapunov stability theory is applied to verify the stability of the system. The proposed controller provides good performance for both set-point tracking and disturbance rejection and is robust to process gain variations. Numerical results show the performance of the proposed method.
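As a reminder of the IMC structure the abstract builds on, a generic fractional order IMC design is sketched below; the model factorization and filter form are the textbook choices, and the filter constant λ and order α are illustrative symbols rather than the paper's notation.

```latex
% Textbook IMC design: factor the process model into an invertible
% minimum-phase part G_m^-(s) and a nonminimum-phase/delay part G_m^+(s),
G_m(s) = G_m^{+}(s)\, G_m^{-}(s),
% then take the IMC controller as the invertible part's inverse
% cascaded with a (fractional order) low-pass filter:
Q(s) = \bigl[G_m^{-}(s)\bigr]^{-1} f(s),
\qquad f(s) = \frac{1}{(\lambda s^{\alpha} + 1)^{n}}.
% The equivalent classical feedback controller is
C(s) = \frac{Q(s)}{1 - G_m(s)\, Q(s)}.
```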
Sparse learning is an efficient technique for feature selection and avoiding overfitting in machine learning. Considering sparse learning for real-world problems with online learning demands in neural networks, an online sparse supervised learning algorithm for the extreme learning machine (ELM) is proposed based on the alternating direction method of multipliers (ADMM), termed OAL1-ELM. In OAL1-ELM, an ℓ1-regularization penalty is added to the loss function to generate a sparse solution and enhance generalization ability. This convex composite loss function is solved by ADMM in a distributed way. Furthermore, an improved ADMM is used to reduce the computational complexity and to achieve online learning. The proposed algorithm can learn data one-by-one or batch-by-batch. A convergence analysis for the fixed point of the solution is given to show the efficiency and optimality of the proposed method. Experimental results show that the proposed method obtains a sparse solution and has strong generalization performance on a wide range of regression tasks, multiclass classification tasks, and a real-world industrial project.
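To make the ADMM step concrete, below is a minimal sketch of the generic lasso-style ADMM iteration applied to an ℓ1-regularized ELM output-weight problem, min over beta of (1/2)||H beta - T||^2 + lam ||beta||_1, where H is the hidden-layer output matrix and T the target vector. This is the textbook batch iteration, not the paper's improved online variant; all names are illustrative.

```python
import numpy as np

def soft_threshold(x, kappa):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def admm_l1_elm(H, T, lam=0.1, rho=1.0, n_iter=100):
    """Generic ADMM for min_beta 0.5*||H beta - T||^2 + lam*||beta||_1.
    H: hidden-layer outputs (n_samples x n_hidden), T: targets (n_samples,)."""
    n_hidden = H.shape[1]
    z = np.zeros(n_hidden)   # auxiliary variable carrying the l1 term
    u = np.zeros(n_hidden)   # scaled dual variable
    # The quadratic subproblem has a fixed system matrix; form it once.
    A = H.T @ H + rho * np.eye(n_hidden)
    HtT = H.T @ T
    for _ in range(n_iter):
        beta = np.linalg.solve(A, HtT + rho * (z - u))  # least squares step
        z = soft_threshold(beta + u, lam / rho)         # sparsity step
        u = u + beta - z                                # dual update
    return z  # the sparse solution
```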
Reinforcement learning (RL) is an important machine learning paradigm for learning from data obtained through human-computer interfaces and interaction in human-centered smart systems. One of the essential problems in RL is estimating value functions, which are usually approximated by linearly parameterized functions. Prior RL algorithms that generalize in this way spend their learning time tuning the linear weights while leaving the basis functions fixed; in fact, the basis functions also have a significant influence on approximation performance. In this paper, a new adaptive policy evaluation network based on recursive least squares temporal difference (TD) learning with gradient correction (adaptive RC network) is proposed. The basis functions in the proposed algorithm are adaptively optimized, mainly with respect to their widths. The TD error and the value function are estimated by the RC algorithm with value function approximation, and the gradient of the squared TD error is used to update the widths of the basis functions. The RC network can therefore adjust its parameters adaptively, in a self-organizing way, according to the progress of learning. Empirical results on three RL benchmarks show the performance and applicability of the proposed adaptive RC network.
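A minimal sketch of the width-update idea: with Gaussian radial basis features, the gradient of the squared TD error with respect to each width follows from the chain rule. The Gaussian feature form and the names (centers, sigmas, eta) are illustrative assumptions, not the paper's exact network.

```python
import numpy as np

def rbf_features(s, centers, sigmas):
    """Gaussian features phi_i(s) = exp(-||s - c_i||^2 / (2 sigma_i^2))."""
    d2 = np.sum((centers - s) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigmas ** 2))

def width_update(s, s_next, reward, theta, centers, sigmas, gamma, eta):
    """One gradient descent step on the squared TD error w.r.t. the widths."""
    phi = rbf_features(s, centers, sigmas)
    phi_next = rbf_features(s_next, centers, sigmas)
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    # d phi_i / d sigma_i = phi_i * ||s - c_i||^2 / sigma_i^3 (chain rule)
    dphi = phi * np.sum((centers - s) ** 2, axis=1) / sigmas ** 3
    dphi_next = phi_next * np.sum((centers - s_next) ** 2, axis=1) / sigmas ** 3
    # d(delta^2)/d sigma_i = 2 delta * theta_i * (gamma dphi'_i - dphi_i)
    grad = 2.0 * delta * theta * (gamma * dphi_next - dphi)
    return sigmas - eta * grad  # shrink the squared TD error
```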
Controlling fed-batch ethanol fermentation processes to maximize ethanol production is one of the key issues in bioreactor systems. However, ethanol fermentation processes exhibit complex behavior and nonlinear dynamics with respect to cell mass, substrate, feed rate, and other variables. An improved dual heuristic programming algorithm based on the least squares temporal difference with gradient correction (LSTDC) algorithm, named LSTDC-DHP, is proposed to solve the learning control problem of a fed-batch ethanol fermentation process. As a new adaptive critic design, LSTDC-DHP realizes online learning control of dynamical chemical plants, with LSTDC employed to approximate the value functions. Applying the LSTDC-DHP algorithm to the ethanol fermentation process enables efficient online learning control in continuous spaces. Simulation results demonstrate the effectiveness of LSTDC-DHP and show that it obtains a near-optimal feed rate trajectory faster than comparable algorithms.
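For orientation, dual heuristic programming (DHP) trains the critic to approximate the costate λ(x) = ∂J/∂x rather than the value itself; a minimal sketch of the standard DHP critic target is given below, omitting the actor-path terms for brevity. The function names and Jacobian conventions are illustrative assumptions.

```python
import numpy as np

def dhp_critic_target(x, u, critic, model, utility_grad, gamma):
    """Standard DHP target: differentiate the Bellman equation through
    the plant model, so the critic learns lambda(x) = dJ/dx.
    model(x, u) -> (x_next, dxnext_dx); utility_grad(x, u) -> dU/dx.
    Actor-path terms (dU/du, du/dx) are omitted in this simplified sketch."""
    x_next, dxnext_dx = model(x, u)   # one-step prediction plus Jacobian
    lam_next = critic(x_next)         # costate estimate at the next state
    # dJ/dx = dU/dx + gamma * (dx'/dx)^T * dJ/dx'
    return utility_grad(x, u) + gamma * dxnext_dx.T @ lam_next
```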
Practical control problems often involve multiple conflicting objectives that must be addressed simultaneously as a multi-objective optimization (MOO) problem. To tackle these challenges, scholars have extensively studied multi-objective reinforcement learning (MORL) in recent years. However, due to system complexity and the difficulty of determining preferences between objectives, complex continuous control processes involving MOO still require further research. In this study, an innovative goal-oriented MORL algorithm is proposed. The agent is guided toward better optimization through adaptive thresholds and a goal selection strategy, and the reward function is refined based on the chosen objective (a minimal sketch of this goal selection idea follows the highlights below). To validate the approach, a comprehensive environment for the fermentation process is designed. Experimental results show that the proposed algorithm surpasses other benchmark algorithms on most performance metrics, and the Pareto solution set it finds is closer to the true Pareto frontier of the fermentation problem.
• A goal-oriented multi-objective reinforcement learning algorithm is proposed.
• Adaptive threshold setting is implemented based on sparse Pareto solutions.
• A novel goal expansion method is introduced for improved generalization.
• Superior quality of Pareto solutions is achieved in a fermentation process.
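The sketch below illustrates one plausible reading of threshold-driven goal selection and reward refinement: pick the objective with the largest shortfall against its adaptive threshold, then reweight the vector reward toward it. This is entirely illustrative; the paper's actual strategies are not specified in the abstract.

```python
import numpy as np

def select_goal(returns, thresholds):
    """Pick the objective with the largest normalized shortfall relative
    to its adaptive threshold (illustrative goal selection rule only).
    returns, thresholds: per-objective episode returns and targets."""
    shortfall = (thresholds - returns) / (np.abs(thresholds) + 1e-8)
    return int(np.argmax(shortfall))  # index of the objective to pursue next

def refined_reward(rewards, goal, bonus=1.0):
    """Reshape the vector reward toward the chosen objective: keep the
    scalarized sum but emphasize the selected goal (illustrative only)."""
    weights = np.ones_like(rewards)
    weights[goal] += bonus
    return float(weights @ rewards)
```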
The actor-critic (AC) learning control architecture has been regarded as an important framework for reinforcement learning (RL) with continuous states and actions. To improve learning efficiency and convergence properties, previous works have mainly been devoted to solving the regularization and feature learning problems in policy evaluation. In this article, we propose a novel AC learning control method with regularization and feature selection for policy gradient estimation in the actor network. The main contribution is that ℓ1-regularization is applied to the actor network to achieve feature selection. In each iteration, the policy parameters are updated by the regularized dual averaging (RDA) technique, which solves a minimization problem involving two terms: the running average of past policy gradients and the ℓ1-regularization term on the policy parameters. Our algorithm efficiently computes the solution of this minimization problem, and we call the new adaptation of policy gradient RDA-policy gradient (RDA-PG). The proposed RDA-PG can learn both stochastic and deterministic near-optimal policies, and its convergence is established based on the theory of two-timescale stochastic approximation. Simulation and experimental results show that RDA-PG successfully performs feature selection in the actor and learns sparse actor representations in both the stochastic and deterministic cases. RDA-PG performs better than existing AC algorithms on standard RL benchmark problems with irrelevant or redundant features.
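The closed-form ℓ1-regularized dual averaging step (after Xiao, 2010) is sketched below as it would apply to averaged policy gradients; the sqrt(t) proximal schedule and the names are the standard choices from the RDA literature, not necessarily the paper's exact schedule.

```python
import numpy as np

def rda_step(gbar, g_new, t, lam, gamma_rda):
    """One l1-regularized dual averaging (RDA) step, framed as minimizing
    a loss (e.g., the negative expected return).
    gbar: running average of past gradients; g_new: current gradient;
    t: iteration count (>= 1); lam: l1 weight; gamma_rda: proximal constant."""
    gbar = ((t - 1) * gbar + g_new) / t          # update the dual average
    # Closed form of: min_theta  gbar . theta + lam*||theta||_1
    #                           + (gamma_rda / sqrt(t)) * ||theta||^2 / 2
    shrunk = np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
    theta = -(np.sqrt(t) / gamma_rda) * shrunk   # exact zeros where |gbar| <= lam
    return theta, gbar
```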
In policy evaluation for reinforcement learning tasks, temporal difference (TD) learning with value function approximation has been widely studied. However, the feature representation has a decisive influence on both the accuracy of value function approximation and the convergence rate. It is therefore important to develop feature selection theory and methods that can efficiently prevent overfitting and improve estimation accuracy in TD learning algorithms. In this article, we propose an online sparse TD learning algorithm for policy evaluation that uses ℓ1-regularization for feature selection. The per-time-step computational complexity of the proposed algorithm is linear in the feature dimension. The loss function is defined as a nested optimization with an ℓ1-regularization penalty, and the solver minimizes the two sub-optimization problems by alternately running stochastic gradient descent and the regularized dual averaging method. Convergence results for the fixed points are also established. Experiments on benchmarks with high-dimensional features demonstrate the learning and generalization abilities of the proposed algorithm.
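A minimal sketch of the alternating scheme described above, under a common GTD-style nested formulation: an auxiliary weight vector is updated by stochastic gradient descent while the value weights take an ℓ1-regularized dual averaging step. The nested objective and all names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def sparse_td_step(theta, w, gbar, t, phi, phi_next, reward,
                   gamma, beta, lam, gamma_rda):
    """One alternating step of an online sparse TD scheme (illustrative):
    an SGD sub-step on the auxiliary weights w, then an l1-regularized
    dual averaging sub-step on the value weights theta. Each step costs
    O(d) in the feature dimension d."""
    delta = reward + gamma * phi_next @ theta - phi @ theta  # TD error
    # SGD sub-step: w tracks the projected TD error (GTD-style subproblem).
    w = w + beta * (delta - w @ phi) * phi
    # RDA sub-step: average the (negated) gradient-corrected TD direction,
    # then solve the l1-regularized proximal problem in closed form.
    g_new = -(delta * phi - gamma * (w @ phi) * phi_next)
    gbar = ((t - 1) * gbar + g_new) / t
    shrunk = np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
    theta = -(np.sqrt(t) / gamma_rda) * shrunk  # exact zeros give sparsity
    return theta, w, gbar
```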