Deep reinforcement learning-based energy management strategies (EMSs) play an essential role in improving fuel economy and extending fuel cell lifetime for fuel cell hybrid electric vehicles. In this work, the traditional Deep Q-Network (DQN) is first compared with the Deep Q-Network with prioritized experience replay (PER-DQN). The PER-DQN is then designed as an energy management strategy to minimize hydrogen consumption and is compared with dynamic programming (DP). Moreover, fuel cell system degradation is incorporated into the objective function, and a balance between fuel economy and fuel cell system degradation is achieved by adjusting the degradation weight and the hydrogen consumption weight. Finally, a combined driving cycle is selected to further verify the effectiveness of the proposed strategy in unfamiliar driving environments and untrained situations. The training results under the Urban Dynamometer Driving Schedule (UDDS) show that the fuel economy of the EMS decreases by 0.53% when fuel cell system degradation is considered, reaching 88.73% of that of the DP-based EMS under UDDS, while fuel cell system degradation is effectively suppressed. At the same time, computational efficiency is improved by more than 70% compared with the DP-based strategy.
•A deep reinforcement learning energy management framework is developed.
•An improved Deep Q-Network algorithm is used for energy management.
•A PER-DQN-based energy management strategy that considers fuel cell degradation is proposed.
•A combined driving cycle is selected to further verify the effectiveness of the proposed strategy.
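The prioritized experience replay idea behind PER-DQN can be illustrated with a minimal proportional-priority buffer. The class name, the list-based storage, and all constants below are illustrative assumptions for a sketch, not the paper's implementation:

```python
import random

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay buffer (illustrative sketch)."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities bias sampling
        self.buffer = []            # stored transitions
        self.priorities = []        # one priority per transition

    def add(self, transition, td_error):
        # Priority grows with the TD error; a small epsilon keeps
        # zero-error transitions sampleable.
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)      # drop the oldest transition
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        # Sample with probability proportional to priority, so
        # high-error experiences are replayed more often.
        return random.choices(self.buffer, weights=self.priorities, k=batch_size)
```

A full PER implementation would also update priorities after each learning step from the fresh TD errors and apply importance-sampling weights to correct the bias that prioritized sampling introduces.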
•A novel framework for the RDBM is proposed based on the predicted operation trajectory of the preceding train.
•A cooperative collision-avoidance control methodology is proposed to ensure safety and enhance operation efficiency.
•The DQN algorithm is introduced to learn the safe and efficient control strategy.
•The effectiveness of the proposed approach is verified by experimental simulations.
To further improve line transport capacity, virtual coupling has become a frontier topic in the field of rail transit. In particular, a safe and efficient following control strategy based on the relative distance braking mode (RDBM) is one of its core technologies. This paper proposes a cooperative collision-avoidance control methodology that enhances operation efficiency while ensuring safety. Firstly, a novel framework for the RDBM, based on the predicted trajectory of the preceding train, is proposed for train collision-avoidance control. To reduce the train following distance, a cooperative control model is further proposed and formulated as a Markov decision process. Then, the Deep Q-Network (DQN) algorithm is introduced to solve the control problem by learning a safe and efficient control strategy for the following train, with the critical elements of the reinforcement learning framework designed accordingly. Finally, experimental simulations are conducted in a simulated environment to illustrate the effectiveness of the proposed approach. Compared with the absolute distance braking mode (ADBM), the minimum following distance between adjacent trains is reduced by 70.23% on average via the proposed approach while safety is guaranteed.
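A Markov decision process formulation like the one above needs a reward that trades safety against following distance. The abstract does not give the paper's reward design, so the thresholds, penalty values, and function shape below are assumptions for a minimal sketch:

```python
def following_reward(gap, safe_gap, target_gap, collision_penalty=-100.0):
    """Hypothetical reward for the following train: heavily penalize
    collisions, penalize gaps inside the safety margin, and otherwise
    reward gaps close to a small target distance (efficiency term)."""
    if gap <= 0:
        return collision_penalty        # collision: large negative reward
    if gap < safe_gap:
        return -10.0                    # inside the safety margin
    # Efficiency: the closer to the target gap, the higher the reward.
    return -abs(gap - target_gap) / target_gap
```

Shaping the reward this way lets the DQN agent shrink the following distance only as far as the safety margin allows, which is the trade-off the methodology targets.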
This paper proposes an A* guiding deep Q-network (AG-DQN) algorithm for solving the pathfinding problem of an automated guided vehicle (AGV) in a robotic mobile fulfillment system (RMFS), that is, a parts-to-picker storage system in which numerous AGVs replace manual labor to improve the efficiency of picking work in warehouses. The pathfinding problem in an RMFS has characteristics such as changing scenes, narrow spaces, and stringent decision-making time requirements. The A* algorithm and its variants have been widely used to address this problem. In this paper, we propose a reinforcement learning algorithm for a single AGV that uses the A* algorithm to guide the DQN algorithm. This makes the training process faster and requires less decision-making time than the A* algorithm. The trained neural network in the AG-DQN algorithm requires only the layout information of the current system to guide the AGV through a series of randomly assigned tasks. We used the AG-DQN algorithm to control AGV pathfinding and task completion in RMFS models of different scales and layouts, including traditional rectangular layouts and certain special layouts (e.g., fishbone layouts). The results show that the AG-DQN can train the AGV to find the correct shortest path to complete all tasks in less training time than the standard DQN algorithm. In addition, the decision-making time of the AG-DQN is less than that of the A* algorithm: it saved 49.92% and 71.51% of the decision-making time for the small- and large-scale RMFS models, respectively. Thus, the AG-DQN algorithm offers valuable insights into AGV control in an RMFS.
•Apply an RL algorithm to the pathfinding problem of an RMFS.
•An improved DQN (AG-DQN) algorithm greatly improves training efficiency.
•Greatly shortens the decision-making time for pathfinding.
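The guiding signal assumed by the AG-DQN idea can be illustrated with a plain grid A*. Everything here (4-connected grid, Manhattan heuristic, the function name) is a generic textbook sketch, not the paper's implementation:

```python
import heapq

def a_star(grid, start, goal):
    """Grid A* (4-connected, 0 = free cell, 1 = obstacle); returns a
    shortest path as a list of (row, col) cells, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    h = lambda c: abs(c[0] - goal[0]) + abs(c[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), start)]
    came_from, g = {}, {start: 0}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:
            path = [cur]                     # reconstruct by walking back
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and grid[nxt[0]][nxt[1]] == 0
                    and g[cur] + 1 < g.get(nxt, float("inf"))):
                g[nxt] = g[cur] + 1
                came_from[nxt] = cur
                heapq.heappush(open_set, (g[nxt] + h(nxt), nxt))
    return None  # no path exists
```

One plausible reading of "A* guiding" is that during early training the agent is steered toward (or rewarded for matching) these A* moves, shortening the random-exploration phase that makes standard DQN training slow.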
As a newly emerging computing paradigm, edge computing shows great capability in supporting and boosting 5G and Internet-of-Things (IoT) oriented applications, e.g., scientific workflows, with low-latency, elastic, and on-demand provisioning of computational resources. However, geographically distributed IoT resources are usually interconnected through unreliable communications and ever-changing contexts, which introduces strong heterogeneity, potential vulnerability, and instability into computing infrastructures at different levels. It thus remains a challenge to enforce high fault tolerance for edge-IoT scientific computing task flows, especially when the supporting infrastructures are deployed in a collaborative, distributed, and dynamic environment that is prone to faults and failures. This work proposes a novel fault-tolerant scheduling approach for edge-IoT collaborative workflows. The approach first conducts a dependency-based task allocation analysis, then leverages a Primary-Backup (PB) strategy to tolerate task failures at edge nodes, and finally designs a deep Q-learning algorithm to identify a near-optimal workflow task scheduling scheme. We conduct extensive simulation-based case studies on multiple randomly generated workflows and real-world edge-IoT server position datasets. The results clearly suggest that the proposed method outperforms state-of-the-art competitors in terms of task completion ratio, server active time, and resource utilization.
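The Primary-Backup placement rule can be sketched as follows; the least-loaded-node heuristic and the function shape are assumptions for illustration, not the paper's scheduler:

```python
def assign_primary_backup(task, nodes_load):
    """Primary-Backup placement sketch (assumed heuristic): put the primary
    copy on the least-loaded node and the backup on a *different* node, so a
    single edge-node failure cannot lose the task."""
    ordered = sorted(range(len(nodes_load)), key=lambda i: nodes_load[i])
    primary, backup = ordered[0], ordered[1]  # two distinct nodes by construction
    return primary, backup
```

The backup only consumes resources if the primary's node fails, which is why a PB strategy can tolerate task failures without doubling the steady-state load.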
The unmanned warehouse dispatching system of the ‘goods to people’ model is built mainly around handling robots, which saves considerable manpower and improves the efficiency of warehouse picking operations. However, it places high demands on the performance of the scheduling algorithm. This study uses a deep Q-network (DQN) algorithm, a deep reinforcement learning method that combines the Q-learning algorithm, an experience replay mechanism, and a deep neural network that generates target Q-values, to solve the multi-robot path-planning problem. The aim is to address two shortcomings of Q-learning on the robot path-planning problem: slow convergence and excessive randomness. Before the algorithm starts, prior knowledge and prior rules are used to improve the DQN algorithm. Simulation results show that the improved DQN algorithm converges faster than the classic deep reinforcement learning algorithm and learns solutions to path-planning problems more quickly, improving the efficiency of multi-robot path planning.
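One common way to inject prior knowledge into a Q-learning-style method, consistent with the description above, is to seed the initial Q-values instead of starting from all zeros. The function below is a hypothetical sketch of that idea, not the paper's mechanism:

```python
def init_q_with_prior(states, actions, prior_rule):
    """Hypothetical prior-knowledge initialization: seed each state's
    preferred action (given by a hand-written rule) with a small positive
    value, so early greedy choices follow the rule instead of pure chance."""
    q = {(s, a): 0.0 for s in states for a in actions}
    for s in states:
        q[(s, prior_rule(s))] = 1.0   # bias toward the rule's action
    return q
```

Since the seeded values are later overwritten by learning updates, the prior only shapes early exploration, which is exactly where slow convergence and excessive randomness hurt most.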
This paper analyzes network attacks in the electric power information environment. The intrusion attack steps are examined, and Bayesian inference is applied to investigate how attack source information propagates through the network. The success probability of a network attack is quantified by its likelihood. NoisyNet, Dueling DQN, Soft Q-learning, a prioritized experience replay mechanism, and the ICM model are integrated to improve the DQN algorithm from different perspectives, and an NDSPI-DQN algorithm based on Bayesian inference is proposed. Comparing the convergence of DQN, PPO, and the proposed algorithm, the experimental results show that both the proposed algorithm and PPO converge to the maximum cumulative reward within 1000 rounds, and the proposed algorithm converges to the optimal value within 350 rounds. In an environment with 120 hosts, the optimal path discovery success rate of the proposed algorithm is 97.23%, while its optimal number of iterations and average running time are 1.12 times and 3.81 seconds, respectively. The proposed method is therefore suitable for large-scale power information networks, offering higher execution efficiency.
The use of deep reinforcement learning algorithms for strategy formulation in supply chain management enables the nodes in the supply chain to improve their management strategies. In this paper, a supply chain model is constructed as a starting point, and deep reinforcement learning algorithms are introduced on this basis. Firstly, the decision problem under uncertainty is handled by a value-function-based reinforcement learning method, and the iteration rule of the DQN (Deep Q-Network) algorithm is divided into two parts. A target network is then established to make the iterative process more stable and improve the convergence of the algorithm; the loss function is evaluated during network training and its influence factors are determined. Next, the neural network is used to improve the iteration rule and the output layer, select the final action, and define the model's expected reward. Finally, the Bellman equation is fitted by a deep neural network to compute the final result. The experimental results, analyzing the cost of international logistics under supply chain management, show a capacity utilization rate of 57% for the ocean freight link, 74% for the unloading link, and a total capacity utilization rate of 76%. This shows that using deep reinforcement learning algorithms in international logistics supply chain management is feasible and valuable for improving supply chain management strategies.
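The two-part iteration with a target network described above reduces, in sketch form, to computing a Bellman (TD) target from a frozen copy of the network and periodically synchronizing that copy. The function names and the dict-based "parameters" below are illustrative simplifications, not the paper's code:

```python
def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Bellman target y = r + gamma * max_a' Q_target(s', a'), computed
    from the frozen target network's Q-values for the next state."""
    return reward if done else reward + gamma * max(next_q_values)

def sync_target(online_params, target_params):
    """Periodic hard update: copy the online network's parameters into the
    target network, so the regression target changes only every N steps."""
    target_params.clear()
    target_params.update(online_params)
```

Keeping the target fixed between syncs is what stabilizes the iteration: the online network regresses toward a stationary target instead of chasing its own constantly shifting estimates.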
Based on the Deep Q-Network (DQN) reinforcement learning algorithm, an active fault-tolerance method with incremental actions is proposed for the control system of a once-through steam generator (OTSG) under sensor faults. In this paper, we first establish the OTSG model as the interaction environment for the reinforcement learning agent. The agent chooses an action according to the system state obtained from the pressure sensor; the incremental action gradually approaches the optimal strategy for the current fault, and the agent then updates its network using the rewards obtained during interaction. In this way, the active fault-tolerant control process of the OTSG is transformed into the agent's decision-making process. Comparison experiments against a traditional reinforcement learning (RL) algorithm with fixed strategies show that the proposed active fault-tolerant controller can control accurately and rapidly under sensor faults, so that the pressure of the OTSG is stabilized near the set-point value and the OTSG runs normally and stably.
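An incremental-action scheme of the kind described above can be sketched in a few lines; the three-action set, step size, and output bounds are assumed values for illustration, not the paper's controller:

```python
def apply_incremental_action(current_output, action_index, step=0.01,
                             low=0.0, high=1.0):
    """Incremental action sketch: the agent picks among
    {decrease, hold, increase}, and the controller output moves by a small
    step, so the policy can gradually approach the optimal setting."""
    delta = (action_index - 1) * step   # 0 -> -step, 1 -> 0, 2 -> +step
    return min(high, max(low, current_output + delta))
```

Because each action only nudges the output, a wrong choice under a sensor fault causes a small, recoverable deviation rather than a large control excursion, which is what makes the incremental formulation attractive for fault tolerance.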
In cloud platform applications, the user’s goal is to obtain high-quality application services, while the service provider’s goal is to earn revenue by executing the tasks submitted by users. The platform built from the service provider’s application resources needs to improve the mapping between service requests and resources to achieve higher value. A review of current resource management in cloud environments shows that many task scheduling and resource allocation algorithms are still hampered by factors such as the diversity, dynamics, and multiple constraints of resources and tasks. This paper focuses on task scheduling and resource configuration for Software as a Service (SaaS) applications in a dynamic and uncertain cloud environment. Automatically and intelligently allocating the user task requests that continually reach SaaS applications to appropriate resources for execution is a challenging online scheduling problem. To this end, a real-time task scheduling method based on deep reinforcement learning is proposed that performs this allocation so that the limited virtual machine resources rented by SaaS providers are used in a balanced and efficient manner. In experiments, comparison with five other task scheduling algorithms shows that the proposed algorithm not only improves the execution efficiency of workflow deployment in the IaaS public cloud, but also uses the resources provisioned by the SaaS provider in a balanced and efficient manner.
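The per-task scheduling decision in such a method can be sketched as an epsilon-greedy choice over candidate virtual machines; the function below is a generic illustration under assumed names, not the paper's scheduler:

```python
import random

def choose_vm(q_values, epsilon=0.1):
    """Epsilon-greedy VM selection for an arriving task (illustrative):
    with probability epsilon explore a random VM; otherwise pick the VM
    the learned Q-function currently scores highest for this task/state."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # exploration
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploitation
```

Here `q_values` would come from a network evaluated on the current task and VM-load state; training such a network to reflect both completion time and load balance is what yields the balanced, efficient resource use reported above.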