  • Entropy regularized actor-critic...
    Hao, Dong; Zhang, Dongcheng; Shi, Qi; Li, Kai

    Information Sciences, December 2022, Volume 617
    Journal Article

    Multi-agent reinforcement learning (MARL) is an abstract framework for modeling a dynamic environment that involves multiple learning and decision-making agents, each of which tries to maximize her cumulative reward. In MARL, each agent discovers a strategy alongside the others and adapts her policy in response to their behavioural changes. A fundamental difficulty in MARL is that every agent is dynamically learning and changing to improve her reward, which makes the whole system unstable and the agents' policies difficult to converge. In this paper, we introduce an entropy regularizer into the Bellman equation and use the Lagrange approach to optimize it. We then propose a MARL algorithm based on the maximum entropy principle and the actor-critic method. The algorithm follows the policy gradient approach and uses a policy network and a value network; we call it Multi-Agent Deep Soft Policy Gradient (MADSPG). Then, using the Lagrange approach and dynamic minimax optimization, we propose AUTO-MADSPG, a variant with an automatically adjusted entropy regularizer. These algorithms make multi-agent learning more stable while guaranteeing sufficient exploration. Finally, we also incorporate MADSPG and a recently proposed opponent-modeling component into an integrated framework, which outperforms many state-of-the-art MARL algorithms in conventional cooperative and competitive game settings.
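
    The abstract does not spell out the regularized Bellman equation, so the following is only a point of reference: the standard soft (entropy-regularized) Bellman backup from the maximum-entropy RL literature, written per agent with the other agents folded into the environment. The per-agent indexing and the temperature symbol α_i are our assumptions, not notation taken from the paper.

    ```latex
    % Soft Bellman backup for agent i (assumed notation, after the
    % maximum-entropy RL literature); \alpha_i is agent i's entropy
    % temperature and \pi_i her policy. Other agents are treated as part
    % of the transition dynamics P in this single-agent-style sketch.
    Q_i(s, a_i) = r_i(s, a_i)
      + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a_i)}\big[ V_i(s') \big],
    \qquad
    V_i(s) = \mathbb{E}_{a_i \sim \pi_i}\big[ Q_i(s, a_i)
      - \alpha_i \log \pi_i(a_i \mid s) \big].
    ```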
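
    Likewise, the automatic adjustment of the entropy regularizer in AUTO-MADSPG is described only as a Lagrange approach with dynamic minimax optimization. Below is a minimal sketch of how such dual-gradient temperature tuning is commonly implemented (in the style of soft actor-critic's automatic temperature), assuming PyTorch and a discrete action space; all names here (target_entropy, log_alpha, update_temperature) are ours, not the paper's.

    ```python
    # Hedged sketch of dual-gradient entropy-temperature tuning; not the
    # paper's exact AUTO-MADSPG objective, which the abstract does not give.
    import torch

    torch.manual_seed(0)

    n_actions = 4
    # A common heuristic target: a fixed fraction of the maximum entropy log|A|.
    target_entropy = 0.98 * torch.log(torch.tensor(float(n_actions)))

    # Optimize log(alpha) so the temperature stays strictly positive.
    log_alpha = torch.zeros(1, requires_grad=True)
    alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

    def update_temperature(action_logits: torch.Tensor) -> float:
        """One dual-gradient step on the temperature: raise alpha when the
        policy's entropy is below the target, lower it when the policy is
        already exploratory enough."""
        probs = torch.softmax(action_logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
        # Lagrangian dual loss: minimized over alpha while the actor maximizes
        # the entropy-regularized return -- the "dynamic minimax" pattern.
        alpha_loss = (log_alpha.exp() * (entropy - target_entropy).detach()).mean()
        alpha_opt.zero_grad()
        alpha_loss.backward()
        alpha_opt.step()
        return log_alpha.exp().item()

    # Toy usage: a nearly deterministic policy drives the temperature upward,
    # pushing the actor (not modeled here) toward more exploration.
    logits = torch.tensor([[5.0, 0.0, 0.0, 0.0]])
    for _ in range(3):
        alpha = update_temperature(logits)
    print(f"temperature after 3 updates: {alpha:.4f}")
    ```

    The minimax structure shows up in the sign of alpha_loss: the temperature falls when the policy's entropy already exceeds the target and rises otherwise, which is what keeps exploration from collapsing as the other agents' policies shift.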