Reinforcement Learning has proven capable of solving complex tasks such as playing video games, robotic control, and speech or image recognition and processing. Transferring Reinforcement Learning into engineering design helps to overcome two current issues of data-driven Design Automation in engineering design. First, dealing with sparse training data resulting from differing design samples. Second, overcoming the limited number of samples in the training data as a consequence of a short or insufficient product history. To introduce an alternative approach for Design Automation, this contribution studies the feasibility, training effort and transferability of Reinforcement Learning in engineering design. The presented method maps engineering requirements and parametric models into learning environments and provides a novel approach for design automation. In addition, the contribution summarises the hyperparameters that design engineers have to set prior to training and introduces a novel transfer learning concept for Reinforcement Learning in related design tasks. The support is probed by design tasks of performance-oriented bike parts. Case-independent indicators are presented to estimate the case-specific training effort, the effects of hyperparameter variation and the effects of transferring a pretrained agent to related design tasks. Finally, the findings are used to compare Reinforcement Learning with other data-independent Design Automation approaches to assess potential fields of application for Reinforcement Learning in engineering design.
•Method for Reinforcement Learning as an alternative Design Automation approach.
•Parametric CAD models as learning environments and requirements as reward functions.
•Quantitative indicators for training effort estimation to enable comparability.
•Studies on feasibility, training effort and effects of pre-training.
•Identification of potential user groups for Reinforcement Learning.
Task scheduling plays a vital role in cloud computing and is a critical factor in its performance. From the booming economy of information processing to the increasing need for quality of service (QoS) in networking businesses, the dynamic task-scheduling problem has attracted worldwide attention. Due to its complexity, task scheduling has been defined and classified as an NP-hard problem. Additionally, most dynamic online task scheduling manages tasks in a complex environment, which makes it even more challenging to balance and satisfy the benefits of each aspect of cloud computing. In this paper, we propose a novel artificial intelligence algorithm, called deep Q-learning task scheduling (DQTS), that combines the advantages of the Q-learning algorithm and a deep neural network. This new approach is aimed at solving the problem of handling directed acyclic graph (DAG) tasks in a cloud computing environment. The essential idea of our approach is to apply the popular deep Q-learning (DQL) method to task scheduling, where fundamental model learning is primarily inspired by DQL. Based on developments in WorkflowSim, experiments are conducted that comparatively consider the variance of makespan and load balance in task scheduling. Both simulation and real-life experiments are conducted to verify the efficiency of optimization and learning abilities in DQTS. The results show that, compared with several standard algorithms precoded in WorkflowSim, DQTS has advantages regarding learning ability, containment, and scalability. In this paper, we have successfully developed a new method for task scheduling in cloud computing.
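The Q-learning update that DQTS builds on can be illustrated with a tabular toy scheduler. The sketch below is not the paper's algorithm (DQTS replaces the table with a deep network and handles DAG workflows in WorkflowSim); it assigns independent toy tasks to VMs while penalising makespan growth, and every name and number in it is an illustrative assumption.

```python
import random

# Toy setting: state = index of the next task, action = which VM to place it on.
# durations[t][v] = runtime of task t on VM v; reward = -added makespan.
n_tasks, n_vms = 5, 3
durations = [[4, 2, 3], [1, 5, 2], [3, 3, 1], [2, 4, 4], [5, 1, 2]]

alpha, gamma, eps = 0.5, 0.9, 0.1          # learning rate, discount, exploration
Q = [[0.0] * n_vms for _ in range(n_tasks)]  # the table a DQN would replace

random.seed(0)
for episode in range(500):
    loads = [0.0] * n_vms                   # current finish time of each VM
    for s in range(n_tasks):
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.randrange(n_vms)
        else:
            a = max(range(n_vms), key=lambda v: Q[s][v])
        before = max(loads)
        loads[a] += durations[s][a]
        r = -(max(loads) - before)          # penalise growth of the makespan
        nxt = 0.0 if s == n_tasks - 1 else max(Q[s + 1])
        Q[s][a] += alpha * (r + gamma * nxt - Q[s][a])  # Q-learning update

greedy = [max(range(n_vms), key=lambda v: Q[s][v]) for s in range(n_tasks)]
print(greedy)   # learned VM assignment per task
```

In DQTS the same temporal-difference target drives gradient updates of a neural network instead of table entries, which is what lets the method scale past small discrete state spaces.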
•Gated Recurrent Unit is proposed to extract informative features from raw financial data.
•Reward function is designed with a risk-adjusted ratio for trading strategies to yield stable returns in volatile conditions.
•Two adaptive stock trading strategies are proposed for quantitative stock trading.
•The system outperforms the Turtle trading strategy and achieves more stable returns.
The increasing complexity and dynamics of stock markets are key challenges for the financial industry, in which inflexible trading strategies designed by experienced financial practitioners fail to achieve satisfactory performance in all market conditions. To meet this challenge, adaptive stock trading strategies with deep reinforcement learning methods are proposed. Given the time-series nature of stock market data, the Gated Recurrent Unit (GRU) is applied to extract informative financial features, which can represent the intrinsic characteristics of the stock market for adaptive trading decisions. Furthermore, with the tailored design of state and action spaces, two trading strategies with reinforcement learning methods are proposed: GDQN (Gated Deep Q-learning trading strategy) and GDPG (Gated Deterministic Policy Gradient trading strategy). To verify the robustness and effectiveness of GDQN and GDPG, they are tested both in trending and in volatile stock markets from different countries. Experimental results show that the proposed GDQN and GDPG not only outperform the Turtle trading strategy but also achieve more stable returns than a state-of-the-art direct reinforcement learning method, the DRL trading strategy, in the volatile stock market. When the GDQN and GDPG are compared, experimental results demonstrate that the GDPG, with an actor-critic framework, is more stable than the GDQN, with a critic-only framework, in the ever-evolving stock market.
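The risk-adjusted reward design highlighted above can be sketched as a Sharpe-style ratio over a window of per-step returns: mean return divided by return volatility, so that an agent chasing the same average gain with wilder swings is rewarded less. The exact formula used by GDQN/GDPG is not reproduced here, so treat this as an illustrative stand-in.

```python
import math

def risk_adjusted_reward(returns, eps=1e-8):
    """Sharpe-style reward over a window of per-step returns:
    mean return divided by return volatility (plus eps for stability).
    Illustrative stand-in, not the paper's exact reward function."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    return mean / (math.sqrt(var) + eps)

steady = [0.01, 0.012, 0.009, 0.011]      # small, stable gains
volatile = [0.08, -0.06, 0.07, -0.05]     # similar mean, high variance
print(risk_adjusted_reward(steady) > risk_adjusted_reward(volatile))  # True
```

Rewarding the ratio rather than the raw return is what pushes the learned policy toward the "more stable returns" behaviour the experiments report.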
Nowadays, investors seek more sophisticated decision-making tools that maximize their profit from investing in the financial markets by suitably determining the optimal position, trading time, price, and volume. This paper proposes a novel intraday algorithmic trading system for volatile commodity futures markets based on a Deep Q-network (DQN) algorithm and its robust double version (DDQN). The higher volatility, leverage property, and greater liquidity of futures contracts give investors more opportunity to take advantage of speculative behaviors with a relatively small amount of capital; however, the volatility brings more difficulties in the learning phase. As an essential prerequisite to training and evaluating any trading algorithm in the futures market, we develop a simulator to replicate a real futures exchange market environment that executes recommended trading signals by handling the clearing and margin management and the pre-order checking mechanisms. Moreover, this study provides a new definition of the continuous state and action spaces that match the futures market's characteristics. To address the curse of dimensionality, we utilize a multi-agent architecture equipped with Gated Recurrent Unit (GRU) networks to approximate the Q-value functions. The experimental results demonstrate that applying the proposed trading algorithms (especially the DDQN) to actual intraday data of gold coin futures contracts significantly outperforms the benchmarks in terms of return, risk, and risk-adjusted return.
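The pre-order checking the simulator performs before executing a trading signal can be sketched as a simple margin-and-fee feasibility test: the required initial margin plus transaction fees must not exceed available cash. The function name and all rates are illustrative assumptions, not the paper's actual simulator interface.

```python
def can_open(cash, price, qty, margin_rate, fee_rate):
    """Pre-order check: required initial margin plus fees must fit in cash.
    Illustrative sketch of a futures-simulator check; rates are toy values."""
    required = price * qty * margin_rate + price * qty * fee_rate
    return cash >= required

# With 10% initial margin, 5 contracts at 1800 need 900 margin + 9 fees:
print(can_open(cash=10_000, price=1_800, qty=5, margin_rate=0.10, fee_rate=0.001))  # True
print(can_open(cash=500, price=1_800, qty=5, margin_rate=0.10, fee_rate=0.001))     # False
```

Rejecting infeasible orders at this stage is what keeps the learning agent's recommended signals consistent with the leverage constraints of a real exchange.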
Hyperparameters are numerical pre-sets whose values are assigned prior to the commencement of a learning process. Selecting appropriate hyperparameters is often critical for achieving satisfactory performance in many vision problems, such as deep learning-based visual object tracking. However, it is often difficult to determine their optimal values, especially if they are specific to each video input. Most hyperparameter optimization algorithms tend to search a generic range and are imposed blindly on all sequences. In this paper, we propose a novel dynamical hyperparameter optimization method that adaptively optimizes hyperparameters for a given sequence using an action-prediction network leveraged on continuous deep Q-learning. Since the observation space for object tracking is significantly more complex than those in traditional control problems, existing continuous deep Q-learning algorithms cannot be directly applied. To overcome this challenge, we introduce an efficient heuristic strategy to handle the high-dimensional state space, while also accelerating the convergence behavior. The proposed algorithm is applied to improve two representative trackers, a Siamese-based one and a correlation-filter-based one, to evaluate its generalizability. Their superior performance on several popular benchmarks is clearly demonstrated. Our source code is available at https://github.com/shenjianbing/dqltracking.
This paper presents a comprehensive literature review on applications of deep reinforcement learning (DRL) in communications and networking. Modern networks, e.g., Internet of Things (IoT) and unmanned aerial vehicle (UAV) networks, are becoming more decentralized and autonomous. In such networks, network entities need to make decisions locally to maximize the network performance under uncertainty in the network environment. Reinforcement learning has been efficiently used to enable network entities to obtain the optimal policy, including, e.g., decisions or actions, given their states when the state and action spaces are small. However, in complex and large-scale networks, the state and action spaces are usually large, and reinforcement learning may not be able to find the optimal policy in reasonable time. Therefore, DRL, a combination of reinforcement learning with deep learning, has been developed to overcome these shortcomings. In this survey, we first give a tutorial on DRL, from fundamental concepts to advanced models. Then, we review DRL approaches proposed to address emerging issues in communications and networking. The issues include dynamic network access, data rate control, wireless caching, data offloading, network security, and connectivity preservation, which are all important to next-generation networks such as 5G and beyond. Furthermore, we present applications of DRL for traffic routing, resource sharing, and data collection. Finally, we highlight important challenges, open issues, and future research directions of applying DRL.
Summary
Complex and large-scale scientific workflow applications are effectively executed on the cloud. The performance of cloud computing highly depends on task scheduling. Optimal workflow scheduling is still a challenge that needs to be addressed due to conflicting objectives and the increasing demand for quality of service. Task scheduling is an NP-hard problem due to its complexity. Newly introduced methods for resolving the task scheduling problem face challenges in taking advantage of all aspects of cloud computing. In this article, we study the joint optimization of cost and makespan of scheduling workflows in infrastructure-as-a-service clouds and propose a new workflow scheduling scheme using deep learning. In this scheme, a deep-Q-learning-based heterogeneous earliest-finish-time (DQ-HEFT) algorithm is developed, which closely integrates the deep learning mechanism with the task scheduling heuristic HEFT. The WorkflowSim simulator is used for experiments with real-world and synthetic workflows. The experimental results demonstrate the efficiency of our proposed approach compared with existing algorithms. This technique can achieve significantly better makespan and speed metrics with a remarkably higher volume of data and can run faster than the existing workflow scheduling algorithms in a cloud computing environment.
•A continuous reinforcement learning based energy management of HEB is proposed.
•The discrete action-value matrix of Q learning is replaced by a continuous neural network.
•Simulation results show that the fuel economy of the DQL algorithm is 5.6% better than Q learning.
Reinforcement learning is a new research hotspot in the artificial intelligence community. Q learning, a famous reinforcement learning algorithm, can achieve satisfactory control performance without the need to model the complex internal factors of the controlled object. However, state discretization is necessary, which limits the application of Q learning to energy management for the hybrid electric bus (HEB). In this paper, deep Q learning (DQL) is adopted for the energy management issue, and the strategy is proposed and verified. Firstly, the system modeling of the bus configuration is described. Then, the energy management strategy based on deep Q learning is put forward. A deep neural network is employed and well trained to approximate the action value function (Q function). Furthermore, a Q learning strategy based on the same model is applied for comparison with deep Q learning. Finally, a part of the trained decision network is analyzed separately to verify the effectiveness and rationality of the DQL-based strategy. The training results indicate that the DQL-based strategy performs better than Q learning in training time and convergence rate. Results also demonstrate that the fuel economy of the proposed strategy under unknown driving conditions achieves 89% of that of the dynamic programming-based method. In addition, the technique can learn to reach the target state of charge under different initial conditions. The main contribution of this study is to explore a novel reinforcement learning methodology for energy management in HEBs that solves the curse of state-variable dimensionality; the techniques can be adopted to solve similar problems.
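The core move described above, replacing the discrete action-value matrix of Q learning with a trained function approximator over a continuous state, can be sketched with a minimal linear approximator. Here a 1-D state in [0, 1] stands in for a quantity like state of charge; the features, reward, and dynamics are toy assumptions, not the paper's HEB model, and the paper uses a deep network rather than a linear one.

```python
import random

# Q(s, a) ~ w[a][0] * s + w[a][1]: one linear unit per discrete action,
# standing in for the deep network that approximates the Q function.
n_actions = 3
w = [[0.0, 0.0] for _ in range(n_actions)]

def q(s, a):
    return w[a][0] * s + w[a][1]

alpha, gamma = 0.05, 0.9
random.seed(1)
for step in range(5000):
    s = random.random()                      # continuous state, e.g. SOC in [0, 1]
    a = random.randrange(n_actions)
    # Toy reward: action 1 is best near mid-SOC, mimicking a target SOC of 0.5
    r = -abs(s - 0.5) if a == 1 else -0.5
    s2 = min(1.0, max(0.0, s + random.uniform(-0.05, 0.05)))
    target = r + gamma * max(q(s2, b) for b in range(n_actions))
    td = target - q(s, a)                    # temporal-difference error
    w[a][0] += alpha * td * s                # semi-gradient TD update
    w[a][1] += alpha * td

best = max(range(n_actions), key=lambda a: q(0.5, a))
print(best)
```

Because the state never has to be discretized, the same update works for any continuous state the controller observes, which is exactly the limitation of tabular Q learning that the paper removes.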
This work demonstrates the potential of deep reinforcement learning techniques for transmit power control in wireless networks. Existing techniques typically find near-optimal power allocations by solving a challenging optimization problem. Most of these algorithms are not scalable to large networks in real-world scenarios because of their computational complexity and instantaneous cross-cell channel state information (CSI) requirement. In this paper, a distributively executed dynamic power allocation scheme is developed based on model-free deep reinforcement learning. Each transmitter collects CSI and quality of service (QoS) information from several neighbors and adapts its own transmit power accordingly. The objective is to maximize a weighted sum-rate utility function, which can be particularized to achieve maximum sum-rate or proportionally fair scheduling. Both random variations and delays in the CSI are inherently addressed using deep Q-learning. For a typical network architecture, the proposed algorithm is shown to achieve near-optimal power allocation in real time based on delayed CSI measurements available to the agents. The proposed scheme is especially suitable for practical scenarios where the system model is inaccurate and CSI delay is non-negligible.
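The weighted sum-rate utility that the agents maximize has the standard form U = Σ_i w_i log2(1 + SINR_i), where SINR is the signal-to-interference-plus-noise ratio of each link. A minimal sketch with toy channel gains (the paper's channel and interference model is more detailed) follows:

```python
import math

def weighted_sum_rate(p, g, noise, w):
    """Weighted sum-rate utility: sum_i w[i] * log2(1 + SINR_i), where
    SINR_i = g[i][i]*p[i] / (noise + cross-link interference).
    p = transmit powers, g[tx][rx] = channel gains, w = link weights."""
    total = 0.0
    for i in range(len(p)):
        interference = sum(g[j][i] * p[j] for j in range(len(p)) if j != i)
        sinr = g[i][i] * p[i] / (noise + interference)
        total += w[i] * math.log2(1 + sinr)
    return total

g = [[1.0, 0.1],      # toy two-link network: strong direct gains,
     [0.2, 1.0]]      # weak cross-link interference
print(weighted_sum_rate([1.0, 1.0], g, 0.1, [1.0, 1.0]))
```

In the distributed scheme, each transmitter's Q-learning agent picks its own power level p[i] using only neighbor-reported CSI and QoS terms of this utility, rather than solving the full coupled optimization centrally.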
Computation offloading is a prominent solution for resource-constrained mobile devices to accomplish processes that demand high computation capability. The mobile cloud is the well-known existing offloading platform, usually a far-end network solution, to leverage computation for resource-constrained mobile devices. Because of the far-end network solution, user devices experience higher latency or network delay, which negatively affects real-time mobile Internet of things (IoT) applications. Therefore, this paper proposes a near-end network solution for computation offloading in the mobile edge/fog. The mobility, heterogeneity and geographical distribution of mobile devices pose several challenges for computation offloading in the mobile edge/fog. To handle the computation resource demand from massive numbers of mobile devices, a deep Q-learning based autonomic management framework is proposed. The distributed edge/fog network controller (FNC) scavenges the available edge/fog resources, i.e. processing, memory and network, to enable the edge/fog computation service. The randomness in the availability of resources and the numerous options for allocating those resources to offloaded computation make the problem appropriate for modeling through a Markov decision process (MDP) and solution through reinforcement learning. The proposed model is simulated in MATLAB considering oscillating resource demands and the mobility of end user devices. The proposed autonomic deep Q-learning based method significantly improves the performance of computation offloading by minimizing the latency of service computing. The total power consumption under different offloading decisions is also studied for comparison, which shows the proposed approach to be energy efficient with respect to state-of-the-art computation offloading solutions.
•An autonomic computation offloading model for mobile edge/fog is proposed.
•A deep reinforcement Q-learning model is used for computation offloading.
•Our method significantly improves the performance of the computation offloading.