Solving nonstationary Markov decision processes via contextual decomposition: A military air battle management application

Peer reviewed

Liles, Joseph M.; Robbins, Matthew J.; Lunday, Brian J.

Expert Systems with Applications, December 2023, Volume 233

Journal Article

Reinforcement learning for nonstationary problems is a subject of widespread research given that most realistic problems do not exist within static environments. Approaching these problems can require significant effort in feature engineering to provide a learning algorithm with enough useful information about the state space to uncover complex system dynamics. As an alternative for problems with sufficient data describing the nonstationary environment, we propose the contextual decomposition Markov decision process (CDMDP) as a collection of stationary sub-problems intended to approximate nonstationary problem dynamics using a linear combination of value functions. We demonstrate the effectiveness of the CDMDP approach with an application in military air battle management. We use a designed computational experiment and analysis of variance to show that a complex, nonstationary learning problem can be effectively approximated with a small set of stationary sub-problems, and that the CDMDP solution significantly improves solution quality over a baseline approach without the need for additional feature engineering. If a researcher suspects that a complex and continuously varying environment can be approximated by a small number of stationary contexts, the CDMDP framework may save significant computational resources and yield decision policies that are much easier to visualize and implement.

• We model a military air battle scenario with a nonstationary Markov decision process.
• We decompose a complex problem into a number of smaller sub-problems.
• We use a computational experiment to find an effective neural network architecture.
• We use a learning algorithm to develop decision policies for autonomous aircraft.
• Our approach significantly improves solution quality over a baseline.
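
The abstract describes approximating nonstationary dynamics with a linear combination of value functions from stationary sub-problems. The sketch below illustrates only that general idea; it is not the authors' implementation. It assumes tabular action values per context and fixed, nonnegative context weights, and every name in it (`combined_q`, `greedy_action`, `q_low`, `q_high`) is hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical illustration only: the names, the tabular Q-values, and the
# fixed weights are assumptions, not the method used in the article.
QTable = Dict[str, List[float]]  # maps a state to the value of each action


def combined_q(weights: Sequence[float], q_tables: Sequence[QTable],
               state: str) -> List[float]:
    """Blend per-context action values: Q(s, a) = sum_k w_k * Q_k(s, a)."""
    if len(weights) != len(q_tables):
        raise ValueError("need exactly one weight per stationary context")
    n_actions = len(q_tables[0][state])
    return [
        sum(w * q[state][a] for w, q in zip(weights, q_tables))
        for a in range(n_actions)
    ]


def greedy_action(weights: Sequence[float], q_tables: Sequence[QTable],
                  state: str) -> int:
    """Return the index of the action with the highest blended value."""
    values = combined_q(weights, q_tables, state)
    return max(range(len(values)), key=values.__getitem__)


# Two made-up stationary contexts, each with one state and two actions.
q_low = {"s0": [1.0, 0.0]}
q_high = {"s0": [0.0, 2.0]}
```

With these made-up tables, shifting the weights from one context toward the other changes which action the blended value function prefers, which is the behavior a small set of stationary contexts is meant to capture.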
