Peer reviewed
  • Learning to Be Proactive: S...
    Zhang, Ran; Wang, Miao; Cai, Lin X.; Shen, Xuemin

    IEEE Transactions on Wireless Communications, July 2021, Volume 20, Issue 7
    Journal Article

    Multi-unmanned aerial vehicle (UAV) control is one of the major research interests in UAV-based networks. Yet few existing works address how the network should optimally react when the UAV lineup and user distribution change. In this work, proactive self-regulation (PSR) of UAV-based networks is investigated for the case in which one or more UAVs are about to quit or join the network, taking dynamic user distribution into account. We target an optimal UAV trajectory control policy that proactively relocates the UAVs whenever the UAV lineup is about to change, rather than passively dispatching them after the change. Specifically, a deep reinforcement learning (DRL)-based self-regulation approach is developed to maximize the accumulated user satisfaction (US) score over a given period during which at least one UAV will quit or join the network. To handle the change in the dimension of the state-action space before and after the lineup change, the state transition is specifically designed. To accommodate continuous state and action spaces, an actor-critic DRL algorithm, deep deterministic policy gradient (DDPG), is applied for its better convergence stability. To effectively promote exploration around the time of the lineup change, an asynchronous parallel computing (APC) learning structure is proposed. The resulting approach, referred to as PSR-APC, is then extended to the case of dynamic user distribution by incorporating time as one of the agent's states. Finally, numerical results demonstrate the convergence of PSR-APC, its superiority over a passive reaction method, and its capability to jointly handle the dynamics of both the UAV lineup and the user distribution.
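    The abstract does not say how the state is encoded when the number of UAVs changes. Purely as an illustrative sketch, and not the authors' design, one common way to keep the state dimension fixed across a lineup change is to reserve slots for an assumed maximum number of UAVs, zero-pad the slots of absent UAVs, mark each slot with an activity flag, and append normalized time as an extra state component (mirroring the idea of incorporating time as an agent state). The names `MAX_UAVS`, `encode_state`, and `soft_update`, and the slot layout, are hypothetical. `soft_update` shows the standard DDPG target-network update, target <- tau * source + (1 - tau) * target, on plain parameter lists.

```python
from typing import List, Tuple

MAX_UAVS = 4  # hypothetical upper bound on the number of simultaneous UAVs


def encode_state(uav_positions: List[Tuple[float, float]], t: int, horizon: int) -> List[float]:
    """Build a fixed-length state vector (hypothetical layout, not from the paper).

    Each of the MAX_UAVS slots holds (x, y, active_flag). Slots for absent UAVs
    are zero-padded with active_flag = 0, so the vector length does not change
    when a UAV quits or joins. Normalized time t / horizon is appended last.
    """
    if len(uav_positions) > MAX_UAVS:
        raise ValueError("more UAVs than reserved slots")
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    state: List[float] = []
    for i in range(MAX_UAVS):
        if i < len(uav_positions):
            x, y = uav_positions[i]
            state.extend([float(x), float(y), 1.0])  # active UAV slot
        else:
            state.extend([0.0, 0.0, 0.0])  # padded slot for an absent UAV
    state.append(t / horizon)  # time as an extra state component
    return state


def soft_update(target: List[float], source: List[float], tau: float) -> List[float]:
    """DDPG-style soft target update: target <- tau * source + (1 - tau) * target."""
    if len(target) != len(source):
        raise ValueError("parameter vectors must have equal length")
    return [tau * src + (1.0 - tau) * tgt for tgt, src in zip(target, source)]


if __name__ == "__main__":
    # Three UAVs before a lineup change, two after: the state length stays the same.
    before = encode_state([(10.0, 20.0), (30.0, 40.0), (50.0, 60.0)], t=5, horizon=10)
    after = encode_state([(10.0, 20.0), (30.0, 40.0)], t=6, horizon=10)
    print(len(before), len(after))  # 13 13
    print(soft_update([0.0, 1.0], [1.0, 0.0], tau=0.5))  # [0.5, 0.5]
```

    Padding with an activity flag lets a single fixed-size actor and critic network be used before and after a UAV quits or joins; how PSR-APC actually designs its state transition is described in the full paper, not here.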