Accurate policy detection and efficient knowledge reuse against multi-strategic opponents

E-resources

Peer-reviewed

Chen, Hao; Liu, Quan; Fu, Ke; Huang, Jian; Wang, Chang; Gong, Jianxing

Knowledge-Based Systems, 04/2022, Volume: 242

Journal Article

In Markov games, how an agent can respond quickly and optimally to opponents that follow changing policies is an open problem. Most state-of-the-art algorithms assume that players change their policies only at the end of an episode, and that the agent can obtain the same optimal episodic reward once it accurately detects the opponent's policy. However, the opponent may change its policy within an episode, or switch to an unknown policy. Moreover, the agent's optimal return is likely to differ across opponent policies, which makes policy detection more challenging. To overcome these challenges, this paper proposes an algorithm that achieves accurate opponent policy detection and efficient knowledge reuse. Within an episode, an inter-episode belief and an intra-episode belief are used jointly to continuously infer the opponent's identity, taking into account both episodic rewards and opponent models. The agent can then reuse the corresponding best response policy directly. After each episode, we also use performance models to detect whether the opponent has adopted an unknown policy. For a detected unknown opponent type, we model the previously learned policies as options for indirect knowledge reuse. Moreover, an option-based knowledge reuse (OKR) network is introduced to guide the learning of a new response policy by adaptively reusing useful knowledge from the existing learned policies. We demonstrate the advantages of the proposed algorithm over several state-of-the-art algorithms in three competitive scenarios.

Highlights:
• An intra-episode belief continuously guides policy selection.
• Episodic rewards and opponent models are used to infer the opponent policy.
• Our approach can track an opponent that switches its policy within an episode.
• The frequency of opponent policy switches does not degrade the agent's performance.
• Previously learned knowledge is reused against an unknown opponent type.
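The abstract does not give the update rules, so the following is only a minimal sketch of how an inter-episode belief, an intra-episode belief, and performance-model-based unknown-type detection could fit together. It assumes, in the style of Bayesian policy reuse, a Gaussian performance model (mean, std of episodic reward) for every (response policy, opponent type) pair, and tabular opponent models giving action probabilities per state. The sliding window of the K most recent opponent actions, the multiplicative combination with the inter-episode belief, the `z_max` threshold, and all function names (`update_inter_episode`, `intra_episode_belief`, `select_response`, `is_unknown_type`) are hypothetical choices, not the authors' method. The OKR network is not modeled.

```python
import math
from collections import deque

# Hypothetical data layout (an assumption, not taken from the paper):
#   opponent_models[j][state] -> {action: probability} for known opponent type j
#   perf_models[(i, j)]       -> (mean, std) of episodic reward for response i vs. type j
EPS = 1e-6  # probability floor so a single unlikely observation cannot zero out a type


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def update_inter_episode(belief, policy_used, episodic_reward, perf_models):
    """Bayes update over opponent types from the episodic reward (performance models)."""
    likelihoods = [gaussian_pdf(episodic_reward, *perf_models[(policy_used, j)])
                   for j in range(len(belief))]
    return normalize([max(l, EPS) * b for l, b in zip(likelihoods, belief)])


def intra_episode_belief(inter_belief, recent, opponent_models):
    """Combine the inter-episode prior with the likelihood of the recent opponent
    actions; using only a window of recent steps lets it follow mid-episode switches."""
    scores = []
    for j, prior in enumerate(inter_belief):
        log_score = math.log(max(prior, EPS))
        for state, action in recent:
            p = opponent_models[j].get(state, {}).get(action, 0.0)
            log_score += math.log(max(p, EPS))
        scores.append(log_score)
    top = max(scores)  # subtract the max before exponentiating for numerical stability
    return normalize([math.exp(s - top) for s in scores])


def select_response(belief, best_response):
    """Directly reuse the stored best response of the most probable known type."""
    j = max(range(len(belief)), key=belief.__getitem__)
    return best_response[j]


def is_unknown_type(policy_used, episodic_reward, perf_models, n_types, z_max=3.0):
    """Flag an unknown opponent if the reward lies more than z_max std away from
    the expected reward of every known type under the policy that was used."""
    return all(abs(episodic_reward - m) / s > z_max
               for m, s in (perf_models[(policy_used, j)] for j in range(n_types)))


# Toy example: two known opponent types in a game with a single state "s".
opponent_models = [{"s": {"L": 0.9, "R": 0.1}},   # type 0 mostly plays L
                   {"s": {"L": 0.1, "R": 0.9}}]   # type 1 mostly plays R
perf_models = {(0, 0): (10.0, 1.0), (0, 1): (2.0, 1.0),
               (1, 0): (3.0, 1.0), (1, 1): (8.0, 1.0)}
best_response = {0: 0, 1: 1}  # index of the stored response policy per opponent type

inter = [0.5, 0.5]
window = deque(maxlen=3)  # K = 3 most recent (state, action) pairs
for action in ["L", "L", "R", "R", "R"]:  # the opponent switches mid-episode
    window.append(("s", action))
    belief = intra_episode_belief(inter, window, opponent_models)
    print(action, select_response(belief, best_response), [round(b, 3) for b in belief])

inter = update_inter_episode(inter, policy_used=1, episodic_reward=7.5, perf_models=perf_models)
print(inter, is_unknown_type(1, 7.5, perf_models, 2), is_unknown_type(1, 20.0, perf_models, 2))
```

In the toy run, the selected response moves from policy 0 to policy 1 two steps after the opponent starts playing R, because the window drops older L actions. An unbounded product over the whole episode would react much more slowly, which is the motivation for a recency window in this sketch.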