Summary
One desired aspect of microservice architecture is the ability to self-adapt its own architecture and behavior in response to changes in the operational environment. To achieve the desired high levels of self-adaptability, this research implements a distributed microservice architecture model running on a swarm cluster, as informed by the Monitor, Analyze, Plan, and Execute over a shared Knowledge (MAPE-K) model. The proposed architecture employs multiple adaptation agents supported by a centralized controller, which can observe the environment and execute a suitable adaptation action. The adaptation planning is managed by a deep recurrent Q-learning network (DRQN). It is argued that this integration of DRQN and Markov decision process (MDP) agents in a MAPE-K model offers a distributed microservice architecture with self-adaptability and high levels of availability and scalability. Integrating DRQN into the adaptation process improves the effectiveness of the adaptation and reduces adaptation risks, including resource overprovisioning and thrashing. The performance of DRQN is evaluated against deep Q-learning and policy gradient algorithms, including (1) a deep Q-learning network (DQN), (2) a dueling DQN (DDQN), (3) a policy gradient neural network, and (4) deep deterministic policy gradient. The DRQN implementation in this paper outperforms the aforementioned algorithms in terms of total reward, adaptation time, error rate, convergence speed, and training time. We believe that DRQN is better suited to driving adaptation in distributed service-oriented architectures and offers better performance than other dynamic decision-making algorithms.
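The abstract does not include code, but the core idea of a DRQN is a recurrent layer in front of the Q-value head, so the agent can act on sequences of partially observed metrics. A minimal PyTorch sketch follows; the dimensions, action set, and class name are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Minimal deep recurrent Q-network: an LSTM over observation
    sequences followed by a linear head that scores each discrete
    adaptation action (e.g., scale out, scale in, no-op)."""

    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); the recurrent state lets the
        # agent condition on the history of metrics such as CPU load,
        # which a feedforward DQN cannot do.
        out, hidden = self.lstm(obs_seq, hidden)
        q_values = self.head(out[:, -1])  # Q-value per action at the last step
        return q_values, hidden
```

The recurrent state is what distinguishes this from the DQN and DDQN baselines mentioned above: adaptation decisions can depend on load trends rather than a single snapshot.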
This article develops two novel output feedback (OPFB) Q-learning algorithms, on-policy Q-learning and off-policy Q-learning, to solve the H∞ static OPFB control problem of linear discrete-time (DT) systems. The primary contribution of the proposed algorithms lies in a newly developed OPFB control law for completely unknown systems. Under the premise of satisfying the disturbance attenuation condition, conditions for the existence of the optimal OPFB solution are given. The convergence of the proposed Q-learning methods, as well as the differences and equivalence of the two algorithms, is rigorously proven. Moreover, considering the effect of the probing noise required for persistence of excitation (PE), the proposed off-policy Q-learning method has the advantage of being immune to probing noise and avoiding biased solutions. Simulation results are presented to verify the effectiveness of the proposed approaches.
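For context, H∞ Q-learning of this kind is usually built on a zero-sum game Bellman recursion, with the control as minimizer and the disturbance as maximizer. A standard form is sketched below; the system matrices, weights, and notation are generic assumptions, not taken from the article:

```latex
% Zero-sum game Q-function for the DT system
%   x_{k+1} = A x_k + B u_k + E w_k,  y_k = C x_k,
% with weights Q_1 \succeq 0, R \succ 0 and attenuation level \gamma:
\[
Q(x_k, u_k, w_k) = x_k^{\top} Q_1\, x_k + u_k^{\top} R\, u_k
  - \gamma^{2} w_k^{\top} w_k + \min_{u}\,\max_{w}\, Q(x_{k+1}, u, w),
\]
% where the static OPFB constraint restricts the policy to
%   u_k = -K y_k = -K C x_k .
\]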
Q-learning-based operation strategies have recently been applied to the optimal operation of energy storage systems, where a Q-table is used to store Q-values for all possible state-action pairs. However, Q-learning faces challenges in large state spaces, i.e., continuous state space problems or problems with environment uncertainties. To address these limitations, this paper proposes a distributed operation strategy using the double deep Q-learning method, applied to managing the operation of a community battery energy storage system (CBESS) in a microgrid system. In contrast to Q-learning, the proposed operation strategy is capable of dealing with uncertainties in the system in both grid-connected and islanded modes, owing to the use of a deep neural network as a function approximator to estimate the Q-values. Moreover, the proposed method can mitigate the overestimation that is a major drawback of standard deep Q-learning, and it trains the model faster by decoupling action selection from action evaluation. Finally, the performance of the proposed double deep Q-learning-based operation method is evaluated by comparing its results with those of a centralized operation approach.
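The decoupling of selection and evaluation is the defining trick of double deep Q-learning and fits in a few lines. A minimal sketch, assuming PyTorch and a batch of float tensors (the function and variable names are illustrative):

```python
import torch

def double_dqn_targets(batch, online_net, target_net, gamma=0.99):
    """Double DQN target: the online network *selects* the next action,
    the target network *evaluates* it, which damps the Q-value
    overestimation of standard deep Q-learning."""
    states, actions, rewards, next_states, dones = batch  # dones: 0/1 floats
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # selection
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)  # evaluation
        targets = rewards + gamma * (1.0 - dones) * next_q
    return targets
```

With a single network playing both roles, the max operator systematically picks overestimated values; splitting the roles removes that bias at negligible extra cost.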
This paper presents a complete simulation and reinforcement learning solution for training mobile agents to track a route while avoiding mutual collisions. The aim was to achieve this functionality with limited resources, with respect to both model input and model size. The designed models prove to keep agents safely on the track. The collision-avoidance skills developed in the course of model training are primitive but rational. The small size of the model allows fast training with limited computational resources.
This paper proposes a novel framework for home energy management (HEM) based on reinforcement learning to achieve efficient home-based demand response (DR). The hour-ahead energy consumption scheduling problem is formulated as a finite Markov decision process (FMDP) with discrete time steps. To tackle this problem, a data-driven method based on a neural network (NN) and the Q-learning algorithm is developed, which achieves superior performance on cost-effective schedules for the HEM system. Specifically, real electricity price and solar photovoltaic (PV) generation data are processed for uncertainty prediction by an extreme learning machine (ELM) over rolling time windows. The scheduling decisions for household appliances and electric vehicles (EVs) are then obtained through the newly developed framework, whose dual objective is to minimize both the electricity bill and the DR-induced dissatisfaction. Simulations are performed at the level of a residential house with multiple home appliances, an EV, and several PV panels. The test results demonstrate the effectiveness of the proposed data-driven HEM framework.
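The dual objective naturally maps onto a scalar RL reward. A minimal sketch of one plausible reward shape and the standard one-step Q-learning backup, assuming discretized states and time slots; the weighting factor and penalty form are assumptions, not the paper's exact terms:

```python
import numpy as np

def hem_reward(price, load_kwh, deviation, alpha=0.5):
    """Illustrative dual-objective reward: negative electricity cost plus
    a weighted dissatisfaction penalty for deviating from the preferred
    appliance/EV schedule (alpha trades off the two goals)."""
    return -(price * load_kwh) - alpha * deviation**2

def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.95):
    # One-step Q-learning backup over discretized hourly time slots.
    Q[s, a] += lr * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```

In the paper's framework the ELM forecasts feed the state (predicted price and PV output), so the backup above would run on predicted rather than directly observed quantities.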
• The overall model of the hybrid electric tracked vehicle is built in detail.
• A fast Q-learning algorithm is applied to derive the energy management strategy.
• An efficient online energy management strategy update framework is constructed.
• A hardware-in-the-loop simulation experiment is conducted to validate the performance.
• The strategy improves fuel economy and has potential for real-time applications.
The energy management approach of hybrid electric vehicles has the potential to mitigate the growing energy crisis and environmental pollution by reducing fuel consumption. This paper proposes an online-updating energy management strategy to improve the fuel economy of hybrid electric tracked vehicles. As the basis of the research, the overall model of the hybrid electric tracked vehicle is built in detail and validated through a field experiment. To accelerate the convergence of the control policy calculation, a novel reinforcement learning algorithm called fast Q-learning is applied, which improves the computational speed by 16%. Cloud computing is used to bear the main computational burden and realize the online-updating energy management strategy on a hardware-in-the-loop simulation bench. A Kullback-Leibler divergence rate is designed to trigger the update of the control strategy and is likewise realized on the hardware-in-the-loop bench. The simulation results show that the fuel consumption of the fast Q-learning-based online-updating strategy is 4.6% lower than that of the stationary strategy and close to that of the dynamic programming strategy. Moreover, the computation time of the proposed method is only 1.35 s, much shorter than that of the dynamic programming-based method. The results indicate that the proposed energy management strategy can greatly improve fuel economy and has the potential for real-time application. Its adaptability is further validated on three realistic driving schedules.
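The update trigger can be pictured as a drift test on the driving-cycle statistics. A minimal sketch, assuming the driving demand is modeled by row-stochastic transition matrices estimated from recent versus historical data; the threshold and matrix form are assumptions:

```python
import numpy as np

def kl_rate(P_new, P_old, eps=1e-12):
    """Average per-state KL divergence between two row-stochastic
    transition matrices (rows: current state, columns: next state)."""
    P_new = np.clip(P_new, eps, 1.0)
    P_old = np.clip(P_old, eps, 1.0)
    return np.mean(np.sum(P_new * np.log(P_new / P_old), axis=1))

def should_update(P_new, P_old, threshold=0.05):
    # Re-run the cloud-side fast Q-learning pass only when the observed
    # driving statistics have drifted enough; the threshold is illustrative.
    return kl_rate(P_new, P_old) > threshold
```

Gating the retraining this way keeps the expensive policy recomputation off the critical path while still adapting the strategy to new driving schedules.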
In recent times, sequence-to-sequence (seq2seq) models have gained considerable popularity and provide state-of-the-art performance in a wide variety of tasks, such as machine translation, headline generation, text summarization, speech-to-text conversion, and image caption generation. The underlying framework for all these models is usually a deep neural network comprising an encoder and a decoder. Although simple encoder-decoder models produce competitive results, many researchers have proposed further improvements over these seq2seq models, e.g., attention over the input, pointer-generator models, and self-attention models. However, such seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between training and test measurement. Recently, a completely novel point of view has emerged in addressing these two problems, leveraging methods from reinforcement learning (RL). In this survey, we consider seq2seq problems from the RL point of view and provide a formulation combining the decision-making power of RL methods with seq2seq models that retain long-term memory. We present some of the most recent frameworks that combine concepts from RL and deep neural networks. Our work aims to provide insights into some of the problems that inherently arise with current approaches and how we can address them with better RL models. We also provide the source code for implementing most of the RL models discussed in this paper to support the complex task of abstractive text summarization, along with targeted experiments for these RL models covering both performance and training time.
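One widely used RL formulation in this family is self-critical sequence training, where the model's own greedy decode serves as the baseline and a sequence-level metric (e.g., ROUGE for summarization) is the reward, which attacks both exposure bias and the train/test mismatch at once. A minimal sketch of the loss, assuming PyTorch and illustrative tensor shapes (this is one of the surveyed techniques, not necessarily the survey's reference implementation):

```python
import torch

def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
    """Self-critical policy-gradient loss.

    sample_logprobs: (batch, seq_len) log-probs of the sampled tokens.
    sample_reward / greedy_reward: (batch,) sequence-level metric scores
    for the sampled and the greedy-decoded sequences, respectively.
    """
    # Sequences that beat their own greedy decode get reinforced;
    # those that fall short get pushed down.
    advantage = (sample_reward - greedy_reward).detach()
    return -(advantage * sample_logprobs.sum(dim=1)).mean()
```

Because the model is trained on its own samples and scored with the test-time metric, the training signal matches evaluation far more closely than token-level cross-entropy does.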
Cognitive networks (CNs) are one of the key enablers for the Internet of Things (IoT) and will play an important role in the future Internet in several application scenarios, such as healthcare, agriculture, environment monitoring, and smart metering. However, current IoT packet transmission suffers from low efficiency, as the spectrum is crowded by the rapidly increasing popularity of various wireless applications. Hence, an IoT that exploits the advantages of cognitive technology, namely the cognitive radio-based IoT (CIoT), is a promising solution for IoT applications. A major challenge in CIoT is packet transmission efficiency over CNs. Therefore, a new Q-learning-based transmission scheduling mechanism using deep learning is proposed for the CIoT to find an appropriate strategy for transmitting packets from different buffers through multiple channels so as to maximize the system throughput. A Markov decision process-based model is formulated to describe the state transitions of the system. A relay is used to transmit packets to the sink on behalf of the other nodes. To maximize the system utility across system states, a reinforcement learning method, the Q-learning algorithm, is introduced to help the relay find the optimal strategy. In addition, a stacked auto-encoder deep learning model is used to establish the mapping between states and actions to accelerate the solution of the problem. Finally, the experimental results demonstrate that the new action selection method converges after a certain number of iterations. Compared with other algorithms, the proposed method transmits packets with less power consumption and lower packet loss.
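The relay's decision loop reduces to epsilon-greedy selection over (buffer, channel) pairs plus a Q-backup on the observed utility. A tabular sketch for clarity, whereas the paper approximates the state-action mapping with a stacked auto-encoder; the state encoding, utility shape, and constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng()

def select_action(Q, state, epsilon=0.1):
    """Epsilon-greedy choice over the flattened (buffer, channel) action
    set described in the abstract."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # explore a random pair
    return int(np.argmax(Q[state]))            # exploit the best known pair

def update(Q, s, a, utility, s_next, lr=0.1, gamma=0.9):
    # utility trades off throughput against power consumption and
    # packet loss for the chosen transmission.
    Q[s, a] += lr * (utility + gamma * np.max(Q[s_next]) - Q[s, a])
```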
Various compensation topologies are widely used in wireless power transfer (WPT) systems, so loads with different compensation topologies can hardly obtain the same power from the same power transmitter. In addition, the load power and system efficiency are hard to hold constant when the mutual inductance or load resistance varies, and these two varying parameters are difficult to identify accurately. These issues severely limit the practical adoption of WPT technology. Therefore, this article proposes a new strategy to improve the compatibility and performance of the WPT system based on the Q-learning algorithm and a switch-controlled capacitor (SCC). The Q-learning algorithm is used to train an intelligent offline database by monitoring the voltage and current on the primary side of the system, so as to obtain an optimized primary-side compensation capacitance without knowledge of the mutual inductance and load resistance. A 1-kW prototype is built, and the experimental results demonstrate that the load side receives the desired power when the load changes from one compensation topology to another, or when the load resistance and mutual inductance vary together. Meanwhile, the coil-to-coil efficiency is also effectively maintained.
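One plausible reading of the offline training is tabular Q-learning over quantized primary-side measurements, with discrete SCC steps as actions. A minimal sketch under those assumptions (state/action discretization, reward shape, and constants are all illustrative, not the article's design):

```python
import numpy as np

# States: quantized primary-side voltage/current readings.
# Actions: discrete SCC duty-cycle steps, i.e., candidate equivalent
# compensation capacitance values on the primary side.
N_STATES, N_ACTIONS = 200, 21
Q = np.zeros((N_STATES, N_ACTIONS))

def train_step(s, a, s_next, load_power, target_power, lr=0.1, gamma=0.9):
    # Reward peaks when the delivered load power hits the target; note the
    # update needs no estimate of mutual inductance or load resistance.
    r = -abs(load_power - target_power)
    Q[s, a] += lr * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```

Once trained, the table acts as the "offline database": at run time the controller reads the primary-side state and looks up the SCC setting with the highest Q-value.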