This article proposes a reinforcement-learning (RL) approach for optimizing charging scheduling and pricing strategies that maximize the system objective of a public electric vehicle (EV) charging station. The proposed algorithm is "online" in the sense that the charging and pricing decisions made at each time step depend only on observations of past events, and "model-free" in the sense that it does not rely on any assumed stochastic model of uncertain events. To cope with the challenge arising from the time-varying continuous state and action spaces in the RL problem, we first show that it suffices to optimize the total charging rates to fulfill the charging requests before the departure times. We then propose a feature-based linear function approximator for the state-value function to further enhance the efficiency and generalization ability of the algorithm. Through numerical simulations with real-world data, we show that the proposed RL algorithm achieves, on average, 138.5% higher charging-station profit than representative benchmark algorithms.
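The feature-based linear state-value approximator mentioned above can be illustrated with a generic TD(0) update, in which V(s) is represented as a dot product between a weight vector and a feature vector. This is a textbook sketch, not the authors' implementation; the feature map, step size, and discount factor here are arbitrary assumptions:

```python
import numpy as np

def td0_linear_update(w, phi_s, phi_s_next, reward,
                      alpha=0.01, gamma=0.99, done=False):
    """One TD(0) update for a linear state-value approximator V(s) = w . phi(s)."""
    v_s = w @ phi_s
    v_next = 0.0 if done else w @ phi_s_next
    td_error = reward + gamma * v_next - v_s
    # The gradient of V(s) with respect to w is simply phi(s)
    return w + alpha * td_error * phi_s

# Toy usage with a hypothetical 3-dimensional feature vector
w = np.zeros(3)
phi_s = np.array([1.0, 0.5, 0.0])
phi_next = np.array([0.0, 1.0, 0.5])
w = td0_linear_update(w, phi_s, phi_next, reward=1.0)
```

Because the approximator is linear, the update cost is O(d) in the feature dimension, which is what makes such schemes attractive for online decision-making.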
Computer clusters, cloud computing and the exploitation of parallel architectures and algorithms have become the norm when dealing with scientific applications that work with large quantities of data and perform complex, time-consuming calculations. With the rise of social media applications and smart devices, the amount of digital data and the velocity at which it is produced have increased exponentially, driving the development of distributed system frameworks and platforms that increase the productivity, consistency, fault-tolerance and security of parallel applications. The performance of such systems is mainly influenced by the architectural disposition and composition of the physical machines, the resource allocation, and the scheduling of jobs and tasks. This paper proposes a reinforcement learning algorithm to solve the scheduling problem in distributed systems. The machine learning technique takes into consideration the heterogeneity of the nodes and their disposition within the grid, and the arrangement of tasks in a directed acyclic graph of dependencies, ultimately determining a scheduling policy that improves execution time. This paper also proposes a platform, in which the algorithm is implemented, that offers scheduling as a service to distributed systems.
•Reinforcement learning algorithm for the scheduling problem.
•Integration of machine learning methods into systems that use task schedulers.
•Q-learning and state–action–reward–state–action (SARSA) methods.
•DAG scheduling on dynamic clusters.
•Variable tasks and task classification.
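The highlights above name both Q-learning and SARSA. The essential difference between the two tabular update rules can be sketched in a few lines; this is the standard textbook formulation, not the scheduler described in the paper, and the state/action indices are hypothetical:

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action the policy actually took next
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy action in the next state
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# Toy table: 5 states, 2 actions
Q = np.zeros((5, 2))
sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
q_learning_update(Q, s=0, a=0, r=1.0, s_next=2)
```

SARSA's on-policy update makes it more conservative under exploratory policies, which is often why schedulers that pay a real cost for bad actions during learning prefer it over Q-learning.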
In this paper, we propose a deep state-action-reward-state-action (SARSA) λ learning approach for optimising the uplink resource allocation in non-orthogonal multiple access (NOMA) aided ultra-reliable low-latency communication (URLLC). To reduce the mean decoding error probability in time-varying network environments, this work designs a reliable learning algorithm for providing long-term resource allocation, where the reward feedback is based on the instantaneous network performance. With the aid of the proposed algorithm, this paper addresses three main challenges of reliable resource sharing in NOMA-URLLC networks: 1) user clustering; 2) instantaneous feedback; and 3) optimal resource allocation. All of these designs interact with the considered communication environment. Lastly, we compare the performance of the proposed algorithm with conventional Q-learning and SARSA Q-learning algorithms. The simulation outcomes show that: 1) compared with traditional Q-learning algorithms, the proposed solution is able to converge within 200 episodes while providing a long-term mean error as low as 10^{-2}; 2) NOMA-assisted URLLC outperforms traditional OMA systems in terms of decoding error probabilities; and 3) the proposed feedback system is efficient for the long-term learning process.
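The SARSA(λ) family referenced above extends one-step SARSA with eligibility traces, so that a single TD error updates every recently visited state-action pair. A minimal tabular sketch follows (illustrative only; the deep variant in the paper replaces the table with a neural network, and the trace decay λ and step size here are arbitrary):

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.99, lam=0.9):
    """One step of tabular SARSA(lambda) with accumulating eligibility traces."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]
    E[s, a] += 1.0          # accumulate the trace for the visited pair
    Q += alpha * delta * E  # update all pairs in proportion to their traces
    E *= gamma * lam        # decay every trace toward zero
    return Q, E

# Toy table: 4 states, 2 actions
Q = np.zeros((4, 2))
E = np.zeros((4, 2))
Q, E = sarsa_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, a_next=0)
```

Propagating credit along the trace is what lets SARSA(λ) converge in fewer episodes than one-step methods, consistent with the convergence comparison reported in the abstract.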
Spatio-temporal prediction of inhalable particulate matter with a diameter of less than 2.5 μm (PM2.5) is an important tool for environmental governance in urban traffic congestion areas. A new Ensemble Graph Attention Reinforcement Learning Recursive Network is proposed to create a multi-data-driven spatio-temporal prediction method with excellent application value. The modeling process includes three basic steps. In step I, the graph attention network is used to effectively aggregate the spatio-temporal correlation characteristics of the original air pollutant data. In step II, the features extracted by the graph attention network are transferred to a long short-term memory network and a temporal convolutional network, and prediction models are constructed from each. In step III, a reinforcement learning algorithm analyzes the adaptability of the two models to the data sets and realizes their ensemble through continuous optimization of the weights. By comparing the experimental results of the listed cases, the following points can be summarized: (a) the graph attention network can effectively aggregate the spatio-temporal correlation characteristics of the original data and improve the performance of the predictor; (b) the reinforcement learning algorithm effectively realizes the integration of several neural networks and improves the comprehensive adaptability and generalization capability of the model; (c) the proposed model has great application potential and value in spatio-temporal prediction and achieves better performance than the other 25 benchmark models.
•A new multi-index driven spatio-temporal PM2.5 prediction model is proposed.
•GAT is applied to study the spatio-temporal correlation between different sites.
•The SARSA algorithm is used to build the ensemble learning model with the strongest ability to adapt to different data.
To address the significant unpredictability and intermittent nature of renewable energy sources, particularly wind and solar power, this paper introduces a novel optimization model based on online reinforcement learning. Initially, an energy management optimization model is designed to achieve plan adherence and minimize energy storage (ES) operation costs, taking into account the inherent challenges of wind power-photovoltaic energy storage systems (WPESS). An online reinforcement learning framework is employed, which defines the state variables, action variables, and reward functions for the energy management optimization model. The state-action-reward-state-action (SARSA) algorithm is applied to learn the joint scheduling strategy for the microgrid system, utilizing its iterative exploration mechanisms and interaction with the environment. This strategy aims to accomplish the goals of effective power tracking and reduction of storage charging and discharging. The proposed method's effectiveness is validated using a residential community with electric vehicle (EV) charging loads as a test case. Numerical analyses demonstrate that the approach is not reliant on traditional mathematical models and adapts well to the uncertainties and complex constraints of the WPESS, maintaining a low cost while achieving computational efficiency significantly higher than that of the model predictive control (MPC) and deep Q-network (DQN) algorithms.
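The "iterative exploration mechanisms" cited above are typically realized with an ε-greedy policy, which trades off exploiting the current value estimates against sampling random actions. A minimal generic sketch (the exploration schedule and action set are illustrative assumptions, not details from the paper):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

# With epsilon=0 the choice is purely greedy
best = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

In practice ε is often decayed over episodes so that the scheduler explores early and converges to a near-deterministic dispatch policy later.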
Non-orthogonal multiple access (NOMA) exploits the potential of the power domain to enhance connectivity for the Internet of Things (IoT). Due to time-varying communication channels, dynamic user clustering is a promising method for increasing the throughput of NOMA-IoT networks. This article develops an intelligent resource allocation scheme for uplink NOMA-IoT communications. To maximise the average sum rate, this work designs an efficient optimization approach based on two reinforcement learning algorithms, namely deep reinforcement learning (DRL) and SARSA-learning. For light traffic, SARSA-learning is used to explore the safest resource allocation policy at low cost. For heavy traffic, DRL is used to handle the large number of variables the traffic introduces. With the aid of the considered approach, this work addresses two main problems of fair resource allocation in NOMA techniques: 1) allocating users dynamically and 2) balancing resource blocks and network traffic. We analytically demonstrate that the rate of convergence is inversely proportional to network size. Numerical results show that: 1) compared with the optimal benchmark scheme, the proposed DRL and SARSA-learning algorithms have lower complexity with acceptable accuracy and 2) NOMA-enabled IoT networks outperform conventional orthogonal multiple access based IoT networks in terms of system throughput.
Due to their high maneuverability and flexible deployment, unmanned aerial vehicles (UAVs) could be an alternative option for scenarios where Internet of Things (IoT) devices consume high energy to achieve the required data rate when they are far away from the terrestrial base station (BS). Therefore, this article proposes an energy-efficient UAV-assisted IoT network where a low-altitude quad-rotor UAV provides mobile data collection service for static IoT devices. We develop a novel optimization framework that minimizes the total energy consumption of all devices by jointly optimizing the UAV's trajectory, device association, and transmit power allocation at every time slot, while ensuring that every device achieves a given data rate constraint. As this joint optimization problem is nonconvex and combinatorial, we adopt a reinforcement learning (RL)-based solution methodology that effectively decouples it into three individual optimization subproblems. The formulated optimization problem is transformed into a Markov decision process (MDP) in which the UAV learns its trajectory according to its current state and corresponding action, maximizing the reward generated under the current policy. Finally, we conceive a state-action-reward-state-action (SARSA)-based low-complexity iterative algorithm for updating the current policy of the UAV, which achieves an excellent computational complexity-optimality tradeoff. Numerical results validate the analysis and provide various insights on the optimal UAV trajectory. The proposed methodology reduces the total energy consumption of all devices by 6.91%, 8.48%, and 9.94% in 80, 100, and 120 available time slots of the UAV, respectively, compared to the particle swarm optimization (PSO) algorithm.
•A bi-objective milk-run material distribution scheduling problem that considers balancing line-side inventory and green production targets is proposed.
•The influence of jointly optimized kanban quantity and material bin capacity on the material distribution scheduling plan is investigated.
•A multi-objective artificial electric field algorithm with a SARSA selection mechanism is proposed for the problem.
•The proposed algorithm shows superior performance in several metrics.
In automotive mixed-model assembly lines (MMALs), a large number of different parts need to be supplied to the assembly lines on time, which poses significant logistical challenges for manufacturers. Consistently supplying parts for MMALs is a very complex issue due to factors such as diverse component requirements and logistical coordination in the supply chain. In this paper, we propose a bi-objective optimization problem to minimize the line-side inventory and energy consumption in a milk-run material distribution system. Meanwhile, the number of kanbans and the capacity of the material bin, both of which affect the scheduling, are jointly optimized, so that the material distribution scheduling plan is improved. Considering the characteristics of the problem, a multi-objective artificial electric field algorithm with a SARSA mechanism (MOAEFASA) is developed to solve it. The proposed algorithm combines the merits of the artificial electric field algorithm (AEFA) and the framework of the non-dominated sorting genetic algorithm (NSGA-II). In addition, several optimization strategies are used to improve the performance of the algorithm. Finally, the validity of the mathematical model is verified through the epsilon-constraint method, and the superiority of the MOAEFASA is illustrated by numerical experiments against four outstanding meta-heuristics.
Pedestrian simulation is complex because there are different levels of behavior modeling. At the lowest level, local interactions between agents occur; at the middle level, strategic and tactical behaviors such as overtaking or route choice appear; and at the highest level, path planning is necessary. Agent-based pedestrian simulators either focus on a specific level (mainly the lower one) or define strategies, such as layered architectures, to manage the different behavioral levels independently. In our Multi-Agent Reinforcement-Learning-based Pedestrian simulation framework (MARL-Ped), the situation is addressed as a whole. Each embodied agent uses a model-free Reinforcement Learning (RL) algorithm to learn autonomously to navigate in the virtual environment. The main goal of this work is to demonstrate empirically that MARL-Ped generates learned behaviors adapted to the level required by the pedestrian scenario. Three different experiments, described in the pedestrian modeling literature, are presented to test our approach: (i) choice of the shortest path vs. the quickest path; (ii) a crossing between two groups of pedestrians walking in opposite directions inside a narrow corridor; (iii) two agents that move in opposite directions inside a maze. The results show that MARL-Ped solves the different problems, learning individual behaviors with the characteristics of pedestrians (local control that produces adequate fundamental diagrams, route-choice capability, emergence of collective behaviors, and path-planning). Besides, we compared our model with Helbing's social-force model, a well-known pedestrian model, showing similarities between the pedestrian dynamics generated by the two approaches. These results demonstrate empirically that MARL-Ped generates varied plausible behaviors, producing human-like macroscopic pedestrian flow.
This work is concerned with the design of state-feedback and static output-feedback controllers for uncertain discrete-time systems. The reinforcement learning (RL) method is employed, and the controller to be designed is considered an agent changing the behavior of the plant, which is the environment. A State-Action-Reward-State-Action (SARSA) algorithm is developed to achieve this goal. This is an open problem, as offline design through RL is an approach not well explored in the literature. The gain matrices are used directly as design variables in the SARSA algorithm, and a time-varying incremental step is employed. The method uses a grid over the uncertain parameters to place the poles of the closed-loop system in a disk on the complex plane. In addition, a stability test based on Lyapunov theory is performed to provide a hard stability certificate for the closed-loop system. Numerical experiments from the literature, through benchmark examples and exhaustive testing, illustrate the efficacy of the method.