•A deep reinforcement learning based real-time scheduling approach for Automated Guided Vehicles is proposed.
•A useful policy can be achieved through a continuous training process.
•Adaptive and efficient decisions can be made based on the proposed approach.
Driven by the recent advances in Industry 4.0 and industrial artificial intelligence, Automated Guided Vehicles (AGVs) have been widely used in flexible shop floors for material handling. However, great challenges arising from the high dynamics, complexity, and uncertainty of the shop floor environment still exist in AGV real-time scheduling. To address these challenges, an adaptive deep reinforcement learning (DRL) based AGV real-time scheduling approach with a mixed rule is proposed for the flexible shop floor to minimize the makespan and delay ratio. Firstly, the problem of AGV real-time scheduling is formulated as a Markov Decision Process (MDP) in which the state representation, action representation, reward function, and optimal mixed rule policy are described in detail. Then a novel deep Q-network (DQN) method is further developed to learn the optimal mixed rule policy, with which suitable dispatching rules and AGVs can be selected for scheduling in various states. Finally, a case study based on a real-world flexible shop floor is illustrated, and the results validate the feasibility and effectiveness of the proposed approach.
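The mixed rule idea above, selecting a dispatching rule and an AGV jointly from one action space, can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: the state dimension, action sizes, and the linear Q-function (standing in for the deep network) are all assumptions.

```python
import numpy as np

# Hypothetical sizes; all names below are illustrative, not from the paper.
STATE_DIM = 6    # e.g. queue lengths, AGV utilization, task urgency features
N_RULES = 4      # candidate dispatching rules (FIFO, EDD, ...)
N_AGVS = 3       # available AGVs
N_ACTIONS = N_RULES * N_AGVS   # a mixed action is a (rule, AGV) pair

rng = np.random.default_rng(0)

class LinearDQN:
    """Minimal linear stand-in for a DQN with a target network."""
    def __init__(self, lr=0.01, gamma=0.95):
        self.w = np.zeros((N_ACTIONS, STATE_DIM))
        self.w_target = self.w.copy()
        self.lr, self.gamma = lr, gamma

    def q(self, s, target=False):
        # Q-values for all actions in state s.
        return (self.w_target if target else self.w) @ s

    def act(self, s, eps=0.1):
        # Epsilon-greedy action selection over (rule, AGV) pairs.
        if rng.random() < eps:
            return int(rng.integers(N_ACTIONS))
        return int(np.argmax(self.q(s)))

    def update(self, s, a, r, s_next, done):
        # One TD step toward the target-network bootstrap value.
        target = r if done else r + self.gamma * np.max(self.q(s_next, target=True))
        td_error = target - self.q(s)[a]
        self.w[a] += self.lr * td_error * s
        return td_error

    def sync_target(self):
        self.w_target = self.w.copy()

def decode(action):
    """Map a flat action index back to a (rule, AGV) pair."""
    return divmod(action, N_AGVS)
```

A scheduler loop would call `act` on the current shop-floor state, decode the chosen index into a rule and an AGV, and feed the observed reward back through `update`.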
An unmanned aerial vehicle (UAV) can be utilized as a relay to connect nodes over long distances, achieving significant throughput gains owing to its mobility and line-of-sight (LoS) channels with ground nodes. However, such LoS channels also make UAV transmissions easy to eavesdrop on. In this paper, we propose a novel scheme to guarantee the security of UAV-relayed wireless networks with caching by jointly optimizing the UAV trajectory and time scheduling. For every two users that have cached the file required by the other, the UAV broadcasts the files together to these two users, and eavesdropping can thereby be disrupted. For the users without caching, we maximize their minimum average secrecy rate by jointly optimizing the trajectory and scheduling, while keeping the secrecy rate of the caching users satisfied. The corresponding optimization problem is difficult to solve due to its non-convexity, so we propose an iterative algorithm via successive convex optimization to solve it approximately. Furthermore, we also consider a benchmark scheme in which we maximize the minimum average secrecy rate among all users by jointly optimizing the UAV trajectory and time scheduling when no user has caching ability. Simulation results are provided to show the effectiveness and efficiency of the proposed scheme.
This paper investigates the time scheduling for a backscatter-aided radio-frequency-powered cognitive radio network, where multiple secondary transmitters transmit data to the same secondary gateway in the backscatter mode and the harvest-then-transmit mode. With many secondary transmitters connected to the network, the total transmission demand of the secondary transmitters may frequently exceed the transmission capacity of the secondary network. As such, the secondary gateway is more likely to assign the time resource, i.e., the backscattering time in the backscatter mode and the transmission time in the harvest-then-transmit mode, to the secondary transmitters with higher transmission valuations. Therefore, according to a variety of demand requirements from secondary transmitters, we design two auction-based time scheduling mechanisms for the time resource assignment. In the auctions, the secondary gateway acts as the seller as well as the auctioneer, and the secondary transmitters act as the buyers to bid for the time resource. We design the winner determination, the time scheduling, and the pricing schemes for both the proposed auction-based mechanisms. Furthermore, the economic properties, such as individual rationality and truthfulness, and the computational efficiency of our proposed mechanisms are analytically evaluated. The simulation results demonstrate the effectiveness of our proposed mechanisms.
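The general shape of such an auction-based time assignment can be illustrated with a simplified sketch. The bid format, the per-unit-valuation ranking, and the critical-value payment rule below are assumptions for illustration only; the paper's actual winner determination and pricing schemes are more elaborate.

```python
def schedule_auction(bids, capacity):
    """Greedy time-resource auction sketch (illustrative, not the paper's mechanism).

    bids: list of (bidder_id, time_demand, valuation).
    Winners are picked in decreasing order of per-unit valuation until the
    gateway's time capacity is exhausted. Each winner pays the per-unit price
    of the first excluded bidder, a critical-value payment of the kind often
    used to obtain truthfulness in greedy auctions.
    """
    ranked = sorted(bids, key=lambda b: b[2] / b[1], reverse=True)
    winners, used = [], 0.0
    critical = 0.0
    for bidder, demand, value in ranked:
        if used + demand <= capacity:
            winners.append((bidder, demand))
            used += demand
        else:
            critical = value / demand  # first excluded bidder sets the unit price
            break
    payments = {bidder: critical * demand for bidder, demand in winners}
    return winners, payments
```

For example, with bids `[("a", 2, 10), ("b", 3, 9), ("c", 4, 4)]` and a capacity of 5, bidders a and b win their requested time and pay at c's per-unit valuation.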
Real-time decision-making in power system scheduling is imperative in response to the increasing integration of renewable energy. This paper proposes GridZero-Imitation (GZ-I), a novel framework leveraging Reinforcement Learning from Demonstration (RLfD) to address complex unit commitment (UC) and optimal power flow (OPF) challenges. Unlike traditional RL approaches that require complex reward function designs and offer limited performance guarantees, our method employs intuitive rewards and expert demonstrations to regularize RL training. The demonstrations can be collected from asynchronous reanalysis by an expert solver, enabling RL to synergize with expert knowledge. Specifically, we adopt a decoupled training approach, employing two separate policy networks, RL and expert. During the Monte Carlo Tree Search (MCTS) process, action candidates from the expert policy foster a guided search mechanism, which is especially helpful in the early training stage. This framework alleviates the speed bottleneck typical of physics-based solvers in online decision-making, and it also significantly enhances the control performance and convergence speed of RL scheduling agents, as validated by substantial improvements in a 126-node real provincial test case.
•Developed GridZero-Imitation, enhancing RL with expert demonstrations for power scheduling.
•Demonstrates substantial improvements in control performance and speed in a real-case study.
•Employs Monte Carlo Tree Search for guided action selection, improving early-stage training efficiency.
•Significantly accelerates decision-making over traditional methods by over 100x, with high reliability.
•Adapts to topology changes, ensuring robustness against operational uncertainties in power grids.
•We propose an RL-based MDRs selection mechanism for the RTS problem.
•A two-level SOM is used to determine the system state class.
•A Q-learning algorithm is used as the reinforcement learning agent.
•Our approach performs better than previously proposed MDRs and SDR approaches.
Previous studies of the real-time scheduling (RTS) problem domain indicate that using a multiple dispatching rules (MDRs) strategy for the various zones in the system can enhance production performance to a greater extent than using a single dispatching rule (SDR) over a given scheduling interval for all the machines in the shop floor control system. This approach is feasible, but the drawback of the previously proposed MDRs method is its inability to respond to changes in the shop floor environment. The RTS knowledge base (KB) is not static, so it would be useful to establish a procedure that maintains the KB incrementally when important changes occur in the manufacturing system. To address this issue, we propose reinforcement learning (RL)-based RTS using the MDRs mechanism, incorporating two main modules: (1) an off-line learning module and (2) a Q-learning-based RL module. According to various performance criteria over a long period, the proposed approach performs better than the previously proposed MDRs method, the machine learning-based RTS using the SDR approach, and heuristic individual dispatching rules.
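The Q-learning component can be sketched as a tabular update over system state classes and candidate rule sets. The table sizes and the flat tabular form below are illustrative assumptions; in the paper the state classes come from a two-level SOM, which is not reproduced here.

```python
import numpy as np

# Illustrative sizes; the actual state classes would come from the
# (hypothetical here) two-level SOM clustering of shop-floor states.
N_STATE_CLASSES = 5   # SOM-derived system state classes
N_RULE_SETS = 4       # candidate MDR combinations

Q = np.zeros((N_STATE_CLASSES, N_RULE_SETS))
ALPHA, GAMMA = 0.1, 0.9

def q_update(s, a, reward, s_next):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)].
    s is the current state class, a the chosen rule set."""
    Q[s, a] += ALPHA * (reward + GAMMA * Q[s_next].max() - Q[s, a])
    return Q[s, a]
```

At each scheduling interval the agent would observe the state class, pick the rule set with the highest Q-value (with some exploration), apply it, and feed the resulting performance measure back through `q_update`.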
This paper proposes a hybrid-relaying scheme empowered by a self-sustainable intelligent reflecting surface (IRS) in a wireless powered communication network (WPCN), to simultaneously improve the performance of downlink energy transfer (ET) from a hybrid access point (HAP) to multiple users and uplink information transmission (IT) from users to the HAP. We propose time-switching (TS) and power-splitting (PS) schemes for the IRS, where the IRS can harvest energy from the HAP's signals by switching between energy harvesting and signal reflection in the TS scheme or adjusting its reflection amplitude in the PS scheme. For both the TS and PS schemes, we formulate the sum-rate maximization problems by jointly optimizing the IRS's phase shifts for both ET and IT and the network resource allocation. To address each problem's non-convexity, we propose a two-step algorithm to obtain a near-optimal solution with high accuracy. To show the structure of the resource allocation, we also investigate the optimal solutions for the schemes with random phase shifts. Through numerical results, we show that our proposed schemes can achieve significant system sum-rate gains compared to the baseline scheme without an IRS.
The problem of relaxed real-time scheduling stabilization of nonlinear systems in the Takagi-Sugeno fuzzy model form is studied by proposing a new alterable-weights-based ranking switching mechanism. Thanks to this mechanism, a new fuzzy switching controller is developed with a set of activated modes that are adjusted by the real-time joint distribution of the normalized fuzzy weighting functions. Notably, existing real-time scheduling stabilization results can be improved without introducing additional offline computational burden when solving for the control gain matrices. More importantly, less conservative stabilization conditions lead to a smaller degree of the fuzzy homogeneous polynomially parameter-dependent switching controller, and thus less online computational burden is required in actual applications. The effectiveness and superiority of the proposed method are verified by two simulation examples in the numerical section.
We study a set of scheduling problems in a distributed flow-shop scheduling system consisting of several flow-shop production systems (factories) working in parallel. Our objective is to assign the jobs to the factories, and to devise a job schedule for each of the factories, such that the weighted number of jobs completed in just-in-time mode is maximized. We classify the computational complexity of the problems, including the special cases of unit weights and job- or machine-independent processing times.
Dynamic events and transportation constraints can significantly affect the full utilization of resources and the reduction of production costs in distributed job shops. Therefore, in this paper, a deep reinforcement learning (DRL)-based real-time scheduling method is developed to minimize the mean tardiness of the dynamic distributed job shop scheduling problem with transfers (DDJSPT) considering random job arrivals. Firstly, the DDJSPT is modeled as a Markov decision process (MDP). Then, ten problem-oriented state features covering four aspects of factories, machines, jobs, and operations are elaborately extracted from the dynamic distributed job shop. After that, eleven composite rules tailored to the uniqueness of the DDJSPT are constructed as a pool of actions to intelligently prioritize unfinished jobs and allocate the selected job to an appropriate factory. Moreover, a justified reward function adapted from the objective is designed for better convergence of the DRL agents. Subsequently, five DRL algorithms are employed to address the DDJSPT: deep Q-network (DQN), double DQN (DDQN), dueling DQN (DlDQN), trust region policy optimization (TRPO), and proximal policy optimization (PPO). Finally, grounded in numerical comparison experiments under 243 production configurations of the DDJSPT, the effectiveness and generalization of the DRL-based scheduling methods are credibly verified.
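One plausible shape for a reward adapted from a mean-tardiness objective is the per-step improvement in mean tardiness, whose sum over an episode telescopes to the overall change in the objective. The job representation below is a hypothetical assumption, and this is only a sketch of the general idea, not the paper's exact reward design.

```python
def mean_tardiness(jobs):
    """Mean tardiness over completed jobs: max(0, completion - due date).
    jobs: list of dicts with hypothetical keys "completed" (None if
    unfinished) and "due"."""
    tard = [max(0.0, j["completed"] - j["due"])
            for j in jobs if j["completed"] is not None]
    return sum(tard) / len(tard) if tard else 0.0

def step_reward(prev_mean_tardiness, curr_mean_tardiness):
    """Dense reward: positive when the latest scheduling decision reduced
    mean tardiness, negative when it increased it. Summed over an episode,
    the terms telescope, aligning cumulative reward with the objective."""
    return prev_mean_tardiness - curr_mean_tardiness
```

A DRL agent choosing among composite rules would then receive, after each decision point, the change in mean tardiness induced by that decision.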
•The use of real-time information can significantly improve scheduling decisions.
•Baseline schedule quality is essential for the quality of the realized schedules.
•Both event-driven and continuous rescheduling policies show similar performance.
•The event-driven rescheduling policy is more computationally efficient.
•The complete-rescheduling policy performs better than the left/right shifting policy.
•The predictive-reactive and proactive-reactive models show similar performance.
The utilization of real-time information in production scheduling decisions has become possible with the help of new developments in Information Technology and Industrial Informatics, such as Industry 4.0. Despite the belief that the availability of such information will enhance scheduling decisions, several questions and concerns have been raised. One such question is: to what extent can the availability of real-time information enhance scheduling decisions? Another concern is how such information can be utilized to advance scheduling decisions, and when it should be used. Moreover, there is a general assumption that continuous rescheduling using real-time system updates is beneficial to some extent. However, this assumption has not been extensively investigated in complex manufacturing systems, such as flexible job shops. Therefore, in this paper, our objective is to study the above-mentioned research questions by developing real-time scheduling (RTS) models for the flexible job-shop scheduling problem (FJSP) with unexpected new job arrivals and random machine breakdowns. We investigate how real-time updates on unexpected arrivals, the availability of machines (downtimes and recovery times), and the completion times of operations can be utilized to generate new schedules (i.e., rescheduling). The performance of the developed RTS models is also investigated under different settings for shop-floor events, different rescheduling strategies, rescheduling policies, and scheduling methods. Lastly, results, conclusions, and several promising research avenues are provided.
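The contrast between event-driven and continuous (periodic) rescheduling policies can be sketched as two trigger loops. The event types, function names, and timing model below are illustrative assumptions, not the paper's models.

```python
def run_event_driven(events, reschedule):
    """Event-driven policy sketch: trigger a reschedule only when a
    disruptive event occurs (new job arrival, machine breakdown or
    recovery), rather than at fixed intervals.

    events: list of (time, kind) pairs; reschedule: callback taking the
    trigger time. Returns the number of reschedules performed."""
    count = 0
    for t, kind in sorted(events):
        if kind in {"job_arrival", "breakdown", "recovery"}:
            reschedule(t)
            count += 1
    return count

def run_periodic(horizon, period, reschedule):
    """Continuous (periodic) policy sketch: reschedule every `period` time
    units over the horizon, regardless of whether anything changed."""
    count, t = 0, 0.0
    while t <= horizon:
        reschedule(t)
        count += 1
        t += period
    return count
```

The computational-efficiency finding in the highlights follows directly from this structure: the event-driven loop invokes the (expensive) rescheduling step only as often as disruptions occur, while the periodic loop pays for it on every tick.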