E-resources
Peer reviewed
Chen, Hao; Liu, Quan; Fu, Ke; Huang, Jian; Wang, Chang; Gong, Jianxing
Knowledge-Based Systems, 04/2022, Volume: 242
Journal Article
In Markov games, how an agent can respond quickly and optimally to opponents whose policies change remains an open problem. Most state-of-the-art algorithms assume that players change their policies only at the end of an episode, and that the agent can obtain the same optimal episodic reward against every opponent policy once it has detected that policy accurately. However, the opponent may change its policy within an episode, or switch to an unknown policy. Moreover, the optimal return the agent can achieve is likely to differ from one opponent policy to another, which makes policy detection more challenging. To overcome these challenges, this paper proposes an algorithm for accurate opponent policy detection and efficient knowledge reuse. Within an episode, an inter-episode belief and an intra-episode belief are used jointly to continuously infer the opponent’s identity, taking into account both the episodic rewards and the opponent models. The agent can then directly reuse the best response policy. After each episode, we also use performance models to detect whether the opponent has adopted an unknown policy. For a detected unknown opponent type, we model the previously learned policies as corresponding options for indirect knowledge reuse. In addition, an option-based knowledge reuse (OKR) network is introduced to guide the learning of a new response policy by adaptively reusing useful knowledge from the existing learned policies. We demonstrate the advantages of the proposed algorithm over several state-of-the-art algorithms in three competitive scenarios.

• An intra-episode belief continuously guides policy selection.
• Episodic rewards and opponent models are used to infer the opponent policy.
• Our approach can track an opponent that switches its policy within an episode.
• The frequency of opponent policy switches does not degrade the agent’s performance.
• Previously learned knowledge is used against an unknown opponent type.
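The abstract does not give the update rules, so the following is only a minimal sketch of how an inter-episode belief (driven by episodic rewards) and an intra-episode belief (driven by opponent models) might be maintained and combined in a Bayesian fashion. All function names, the Gaussian form of the performance models, the tabular opponent models, and the product rule used to combine the two beliefs are assumptions made for illustration; they are not taken from the paper.

```python
import math


def normalize(weights):
    """Scale non-negative weights to sum to 1 (uniform if all are zero)."""
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution; used here as an assumed performance model."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_inter_episode_belief(belief, episodic_reward, performance_models):
    """Bayesian update after an episode.

    performance_models[i] = (mean, std) of the episodic reward expected
    against opponent type i under the response policy that was just used
    (hypothetical representation).
    """
    posterior = [
        b * gaussian_pdf(episodic_reward, mean, std)
        for b, (mean, std) in zip(belief, performance_models)
    ]
    return normalize(posterior)


def update_intra_episode_belief(belief, state, opponent_action, opponent_models):
    """Bayesian update after each observed opponent action within an episode.

    opponent_models[i][state][action] = probability that opponent type i
    takes `action` in `state` (hypothetical tabular opponent model).
    """
    floor = 1e-6  # keeps unseen actions from zeroing out a hypothesis
    posterior = [
        b * model.get(state, {}).get(opponent_action, floor)
        for b, model in zip(belief, opponent_models)
    ]
    return normalize(posterior)


def combine_beliefs(inter_belief, intra_belief):
    """Assumed combination rule: element-wise product, then renormalize."""
    return normalize([a * b for a, b in zip(inter_belief, intra_belief)])


def most_likely_opponent(belief):
    """Index of the opponent type with the highest belief."""
    return max(range(len(belief)), key=lambda i: belief[i])
```

In this sketch the agent would call `update_intra_episode_belief` at every step, combine the result with the inter-episode belief, and reuse the stored best response for `most_likely_opponent(...)`. A persistently poor fit of every performance model after an episode could then serve as the signal that the opponent is using an unknown policy.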