Advances in machine learning (ML) and artificial intelligence (AI) present an opportunity to build better tools and solutions to help address some of the world's most pressing challenges, and deliver positive social impact in accordance with the priorities outlined in the United Nations' 17 Sustainable Development Goals (SDGs). The AI for Social Good (AI4SG) movement aims to establish interdisciplinary partnerships centred around AI applications towards SDGs. We provide a set of guidelines for establishing successful long-term collaborations between AI researchers and application-domain experts, relate them to existing AI4SG projects and identify key opportunities for future AI applications targeted towards social good.
This paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploration unifies reinforcement learning and black-box optimization, and has several advantages over action perturbation. We review two recent parameter-exploring algorithms: Natural Evolution Strategies and Policy Gradients with Parameter-Based Exploration. Both outperform state-of-the-art algorithms in several complex high-dimensional tasks commonly found in robot control. Furthermore, we describe how a novel exploration method, State-Dependent Exploration, can modify existing algorithms to mimic exploration in parameter space.
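The contrast between the two exploration styles can be illustrated with a toy linear policy. This is only a sketch under stated assumptions: the policy form, the Gaussian noise, and all names (`linear_policy`, `act_with_action_noise`, `perturb_parameters`) are hypothetical and not taken from the paper.

```python
import random


def linear_policy(weights, state):
    """Deterministic linear policy: action = weights . state."""
    return sum(w * s for w, s in zip(weights, state))


def act_with_action_noise(weights, state, sigma, rng):
    """Action perturbation: fresh Gaussian noise is added to every action."""
    return linear_policy(weights, state) + rng.gauss(0.0, sigma)


def perturb_parameters(weights, sigma, rng):
    """Parameter perturbation: noise is added to the weights once per episode."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


rng = random.Random(0)
weights = [0.5, -0.2]
state = [1.0, 2.0]

# Action noise: the same state yields a different action on every call.
noisy_actions = [act_with_action_noise(weights, state, 0.1, rng) for _ in range(3)]

# Parameter noise: within one episode the perturbed policy is deterministic,
# so the same state always maps to the same action.
episode_weights = perturb_parameters(weights, 0.1, rng)
episode_actions = [linear_policy(episode_weights, state) for _ in range(3)]
```

In this sketch the parameter-perturbed policy behaves consistently within an episode, which is one way to read the claim that perturbing parameters makes exploration resemble black-box optimization over policies.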
Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions, the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.
We propose a powerful new tool for conducting research on computational intelligence and games. "PyVGDL" is a simple, high-level description language for 2D video games, and the accompanying software library permits parsing and instantly playing those games. The streamlined design of the language is based on defining locations and dynamics for simple building blocks, and the interaction effects when such objects collide, all of which are provided in a rich ontology. It can be used to quickly design games, without needing to deal with control structures, and the concise language is also accessible to generative approaches. We show how the dynamics of many classical games can be generated from a few lines of PyVGDL. The main objective of these generated games is to serve as diverse benchmark problems for learning and planning algorithms, so we provide a collection of interfaces for different types of learning agents, with visual or abstract observations, from a global or first-person viewpoint. To demonstrate the library's usefulness in a broad range of learning scenarios, we show how to learn competent behaviors when a model of the game dynamics is available or when it is not, when full state information is given to the agent or just subjective observations, when learning is interactive or in batch mode, and for a number of different learning algorithms, including reinforcement learning and evolutionary search.
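The idea of describing a game as building blocks plus collision effects can be sketched in plain Python. This is not the PyVGDL API or its description syntax: the `Sprite` class, the `kill_sprite` effect, and the `resolve_collisions` helper are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sprite:
    kind: str          # sprite type, e.g. "avatar" or "coin"
    x: int             # grid column
    y: int             # grid row
    alive: bool = True


def kill_sprite(sprite, other):
    """Collision effect: remove `sprite` when it touches `other`."""
    sprite.alive = False


def resolve_collisions(sprites, interactions):
    """Apply the effect registered for each ordered pair of colliding kinds."""
    for a in sprites:
        for b in sprites:
            if a is b or not (a.alive and b.alive):
                continue
            if (a.x, a.y) == (b.x, b.y):
                effect = interactions.get((a.kind, b.kind))
                if effect is not None:
                    effect(a, b)
    return [s for s in sprites if s.alive]


# The "game rules" are a table, not control flow: a coin disappears
# whenever it shares a cell with the avatar.
interactions = {("coin", "avatar"): kill_sprite}
sprites = [Sprite("avatar", 1, 1), Sprite("coin", 1, 1), Sprite("wall", 0, 0)]
remaining = resolve_collisions(sprites, interactions)
remaining_kinds = [s.kind for s in remaining]
```

Keeping the rules in a lookup table rather than in control structures is what makes such a description short and, plausibly, easy to generate automatically.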
Natural Evolution Strategies Wierstra, D.; Schaul, T.; Peters, J. ...
2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence),
2008-June
Conference Proceeding
Peer reviewed
Open access
This paper presents natural evolution strategies (NES), a novel algorithm for performing real-valued 'black box' function optimization: optimizing an unknown objective function where algorithm-selected function measurements constitute the only information accessible to the method. Natural evolution strategies search the fitness landscape using a multivariate normal distribution with a self-adapting mutation matrix to generate correlated mutations in promising regions. NES shares this property with covariance matrix adaptation (CMA), an evolution strategy (ES) which has been shown to perform well on a variety of high-precision optimization tasks. The natural evolution strategies algorithm, however, is simpler, less ad-hoc and more principled. Self-adaptation of the mutation matrix is derived using a Monte Carlo estimate of the natural gradient towards better expected fitness. By following the natural gradient instead of the 'vanilla' gradient, we can ensure efficient update steps while preventing early convergence due to overly greedy updates, resulting in reduced sensitivity to local suboptima. We show NES has competitive performance with CMA on unimodal tasks, while outperforming it on several multimodal tasks that are rich in deceptive local optima.
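A Monte Carlo natural-gradient update can be sketched on a deliberately reduced case. The sketch below assumes a one-dimensional Gaussian search distribution N(mu, sigma^2) rather than the multivariate distribution with a full mutation matrix described in the abstract, and it standardises fitness values within each batch for step-size stability; both simplifications, and the function name `nes_1d`, are assumptions rather than the paper's algorithm.

```python
import random
import statistics


def nes_1d(fitness, mu=0.0, sigma=1.0, pop=50, lr=0.1, iters=300, seed=0):
    """Toy natural-gradient search over N(mu, sigma^2); maximises `fitness`."""
    rng = random.Random(seed)
    for _ in range(iters):
        # Sample a population x = mu + sigma * z with z ~ N(0, 1).
        zs = [rng.gauss(0.0, 1.0) for _ in range(pop)]
        fs = [fitness(mu + sigma * z) for z in zs]

        # Assumption: standardise fitness so update sizes stay bounded.
        mean_f = sum(fs) / pop
        std_f = statistics.pstdev(fs) or 1.0
        us = [(f - mean_f) / std_f for f in fs]

        # For this Gaussian the Fisher matrix is diag(1/sigma^2, 2/sigma^2),
        # so the natural gradient (inverse Fisher times vanilla gradient) is:
        #   mu:    sigma * E[u * z]
        #   sigma: sigma * E[u * (z^2 - 1)] / 2
        nat_grad_mu = sigma * sum(u * z for u, z in zip(us, zs)) / pop
        nat_grad_sigma = sigma * sum(u * (z * z - 1.0) for u, z in zip(us, zs)) / (2.0 * pop)

        mu += lr * nat_grad_mu
        sigma += lr * nat_grad_sigma
    return mu, sigma


# Usage: maximise a simple unimodal fitness with its optimum at x = 3.
best_mu, final_sigma = nes_1d(lambda x: -(x - 3.0) ** 2)
```

Because both update terms scale with sigma, the step size shrinks together with the search distribution, which is one intuition for why natural-gradient steps are less greedy than vanilla-gradient steps.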
In this short paper, we propose a powerful new tool for conducting research on computational intelligence and games. "PyVGDL" is a simple, high-level, extensible description language for 2-D video games. It is based on defining locations and dynamics for simple building blocks (objects), together with local interaction effects. A rich ontology defines various controllers, object behaviors, passive effects (physics), and collision effects. It can be used to quickly design games, without having to deal with control structures. We show how the dynamics of many classical games can be generated from a few lines of PyVGDL. Furthermore, the accompanying software library permits parsing and instantly playing those games, visualized from a bird's-eye or first-person viewpoint, and using them as benchmarks for learning algorithms.
We agree with Lake and colleagues on their list of "key ingredients" for building human-like intelligence, including the idea that model-based reasoning is essential. However, we favor an approach that centers on one additional ingredient: autonomy. In particular, we aim toward agents that can both build and exploit their own internal models, with minimal human hand engineering. We believe an approach centered on autonomous learning has the greatest chance of success as we scale toward real-world complexity, tackling domains for which ready-made formal models are not available. Here, we survey several important examples of the progress that has been made toward building autonomous agents with human-like abilities, and highlight some outstanding challenges.
Stochastic search using the natural gradient Yi, Sun; Wierstra, Daan; Schaul, Tom ...
Proceedings of the 26th Annual International Conference on Machine Learning,
06/2009
Conference Proceeding
Open access
To optimize unknown 'fitness' functions, we present Natural Evolution Strategies, a novel algorithm that constitutes a principled alternative to standard stochastic search methods. It maintains a multinormal distribution on the set of solution candidates. The Natural Gradient is used to update the distribution's parameters in the direction of higher expected fitness, by efficiently calculating the inverse of the exact Fisher information matrix, whereas previous methods had to use approximations. Other novel aspects of our method include optimal fitness baselines and importance mixing, a procedure adjusting batches with minimal numbers of fitness evaluations. The algorithm yields competitive results on a number of benchmarks.
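The update described above can be written in the standard textbook form of a natural-gradient step. The notation is an assumption and not taken from the paper: \theta denotes the distribution parameters, \pi(x \mid \theta) the search density, f the fitness function, F the Fisher information matrix, and \eta a learning rate.

```latex
\begin{align}
J(\theta) &= \mathbb{E}_{x \sim \pi(\cdot \mid \theta)}\left[ f(x) \right], \\
\nabla_\theta J(\theta) &= \mathbb{E}_{x \sim \pi(\cdot \mid \theta)}\left[ f(x)\, \nabla_\theta \log \pi(x \mid \theta) \right], \\
F(\theta) &= \mathbb{E}_{x \sim \pi(\cdot \mid \theta)}\left[ \nabla_\theta \log \pi(x \mid \theta)\, \nabla_\theta \log \pi(x \mid \theta)^{\top} \right], \\
\theta &\leftarrow \theta + \eta\, F(\theta)^{-1}\, \nabla_\theta J(\theta).
\end{align}
```

In practice the expectation in the gradient is typically estimated from a batch of samples, while the abstract's claim is that F can be computed and inverted exactly for the multinormal case instead of being approximated.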
The 2014 General Video Game Playing Competition Perez-Liebana, Diego; Samothrakis, Spyridon; Togelius, Julian ...
IEEE Transactions on Computational Intelligence and AI in Games,
09/2016, Volume 8, Issue 3
Journal Article
Open access
This paper presents the framework, rules, games, controllers, and results of the first General Video Game Playing Competition, held at the IEEE Conference on Computational Intelligence and Games in 2014. The competition proposes the challenge of creating controllers for general video game play, where a single agent must be able to play many different games, some of them unknown to the participants at the time of submitting their entries. This test can be seen as an approximation of general artificial intelligence, as the amount of game-dependent heuristics needs to be severely limited. The games employed are stochastic real-time scenarios (where the time budget to provide the next action is measured in milliseconds) with different winning conditions, scoring mechanisms, sprite types, and available actions for the player. It is a responsibility of the agents to discover the mechanics of each game, the requirements to obtain a high score and the requisites to finally achieve victory. This paper describes all controllers submitted to the competition, with an in-depth description of four of them by their authors, including the winner and the runner-up entries of the contest. The paper also analyzes the performance of the different approaches submitted, and finally proposes future tracks for the competition.