Learning, Reward, and Decision Making O'Doherty, John P; Cockburn, Jeffrey; Pauli, Wolfgang M
Annual Review of Psychology, 01/2017, Volume 68, Issue 1
Journal Article
Peer reviewed
Open access
In this review, we summarize findings supporting the existence of multiple behavioral strategies for controlling reward-related behavior, including a dichotomy between the goal-directed or model-based system and the habitual or model-free system in the domain of instrumental conditioning and a similar dichotomy in the realm of Pavlovian conditioning. We evaluate evidence from neuroscience supporting the existence of at least partly distinct neuronal substrates contributing to the key computations necessary for the function of these different control systems. We consider the nature of the interactions between these systems and show how these interactions can lead to either adaptive or maladaptive behavioral outcomes. We then review evidence that an additional system guides inference concerning the hidden states of other agents, such as their beliefs, preferences, and intentions, in a social context. We also describe emerging evidence for an arbitration mechanism between model-based and model-free reinforcement learning, placing such a mechanism within the broader context of the hierarchical control of behavior.
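The instrumental dichotomy summarized above can be illustrated with a minimal sketch (the function names and dictionary representations below are our own illustration, not the review's formalism): model-free control caches values learned directly from reward, while model-based control computes values on the fly from a learned world model.

```python
def mf_update(q, s, a, r, alpha=0.1):
    """Model-free (habit-like) update: cache reward directly into Q(s, a)."""
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r - old)

def mb_value(s, a, transitions, rewards):
    """Model-based (goal-directed) evaluation: compute the value of (s, a)
    on the fly from a learned transition model and outcome rewards."""
    return sum(p * rewards.get(s2, 0.0)
               for s2, p in transitions[(s, a)].items())
```

Because the model-based value is recomputed from the outcome model, devaluing an outcome (setting its reward to zero) changes the computed value immediately, whereas the cached model-free value persists until it is relearned through experience: the classic behavioral signature used to dissociate goal-directed from habitual control.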
Both novelty and uncertainty are potent features guiding exploration; however, they are often experimentally conflated, and an understanding of how they interact to regulate the balance between exploration and exploitation has proved elusive. Using a task designed to decouple the influence of novelty and uncertainty, we identify separable mechanisms through which exploration is directed. We show that uncertainty-directed exploration is sensitive to the prospective benefit offered by new information, whereas novelty-directed exploration is maintained regardless of its potential advantage. Using a computational framework in conjunction with fMRI, we show that uncertainty-directed choice is rooted in an adaptive bias indexing the prospective utility of exploration. In contrast, novelty persistently promotes exploration by optimistically inflating reward expectations while simultaneously dampening uncertainty signals. Our results identify separable neural substrates charged with balancing the explore/exploit trade-off to foster a manageable decomposition of an otherwise intractable problem.
•Uncertainty-directed exploration considers the prospective benefit of new information.
•Novelty-directed exploration is myopic and motivated by inflated reward expectation.
•Option features are integrated by vmPFC to balance the explore/exploit trade-off.
•Integrating a mixture of strategies offers a tractable approximation of optimal control.
Cockburn et al. show that novelty and uncertainty are used by the human brain to guide distinct exploration strategies. Uncertainty-directed exploration considers the prospective benefit of new information, whereas novelty motivates exploration by inflating the brain’s expectation of reward, offering a feasible decomposition of an otherwise intractable explore/exploit dilemma.
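A toy decomposition of the two exploration strategies described above might look like the following (the linear scoring rule, parameter names, and default values are illustrative assumptions, not the fitted model from the paper):

```python
def choose(values, counts, trials_left, novelty_bonus=1.0, uncert_weight=0.5):
    """Score each option by combining its estimated value, a novelty bonus,
    and an uncertainty bonus scaled by the remaining decision horizon.
    Returns the index of the highest-scoring option."""
    scores = []
    for value, n in zip(values, counts):
        novelty = novelty_bonus if n == 0 else 0.0           # myopic: blind to horizon
        uncertainty = uncert_weight * trials_left / (n + 1)  # prospective: fades with horizon
        scores.append(value + novelty + uncertainty)
    return scores.index(max(scores))
```

With a long horizon, the uncertainty term draws choice toward poorly sampled options; when no trials remain, it vanishes while the novelty bonus still applies, mirroring the distinction between prospective uncertainty-directed exploration and myopic novelty-directed exploration.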
To navigate our complex social world, it is crucial to deploy multiple learning strategies, such as learning from directly experiencing action outcomes or from observing other people's behavior. Despite the prevalence of experiential and observational learning in humans and other social animals, it remains unclear how people favor one strategy over the other depending on the environment, and how individuals vary in their strategy use. Here, we describe an arbitration mechanism in which the prediction errors associated with each learning strategy influence their weight over behavior. We designed an online behavioral task to test our computational model, and found that while a substantial proportion of participants relied on the proposed arbitration mechanism, there was some meaningful heterogeneity in how people solved this task. Four other groups were identified: those who used a fixed mixture between the two strategies, those who relied on a single strategy, and non-learners with irrelevant strategies. Furthermore, groups were found to differ on key behavioral signatures, and on transdiagnostic symptom dimensions, in particular autism traits and anxiety. Together, these results demonstrate how large heterogeneous datasets and computational methods can be leveraged to better characterize individual differences.
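The prediction-error-based arbitration idea described above can be sketched in a few lines (the class name, decay rule, and softmax weighting below are our own toy construction, not the paper's fitted model):

```python
import math

class Arbitrator:
    """Weight two learning strategies by the recent magnitude of their
    prediction errors: the strategy with smaller errors gains control."""

    def __init__(self, decay=0.9):
        self.unreliability = [0.5, 0.5]  # running mean |PE| per strategy
        self.decay = decay

    def update(self, pe_experiential, pe_observational):
        for i, pe in enumerate((pe_experiential, pe_observational)):
            self.unreliability[i] = (self.decay * self.unreliability[i]
                                     + (1 - self.decay) * abs(pe))

    def weights(self, temperature=1.0):
        # Softmax over negative unreliability: lower error -> higher weight.
        logits = [-u / temperature for u in self.unreliability]
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
```

In this sketch, a participant who uses a fixed mixture of strategies corresponds to never updating the unreliabilities, and a single-strategy participant to a weight pinned near one.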
The value and uncertainty associated with choice alternatives constitute critical features relevant for decisions. However, the manner in which reward and risk representations are temporally organized in the brain remains elusive. Here we leverage the spatiotemporal precision of intracranial electroencephalography, along with a simple card game designed to elicit the unfolding computation of a set of reward and risk variables, to uncover this temporal organization. Reward outcome representations across wide-spread regions follow a sequential order along the anteroposterior axis of the brain. In contrast, expected value can be decoded from multiple regions at the same time, and error signals in both reward and risk domains reflect a mixture of sequential and parallel encoding. We further highlight the role of the anterior insula in generalizing between reward prediction error and risk prediction error codes. Together our results emphasize the importance of neural dynamics for understanding value-based decisions under uncertainty.
Humans exhibit a preference for options they have freely chosen over equally valued options they have not; however, the neural mechanism that drives this bias and its functional significance have yet to be identified. Here, we propose a model in which choice biases arise due to amplified positive reward prediction errors associated with free choice. Using a novel variant of a probabilistic learning task, we show that choice biases are selective to options that are predominantly associated with positive outcomes. A polymorphism in DARPP-32, a gene linked to dopaminergic striatal plasticity and individual differences in reinforcement learning, was found to predict the effect of choice as a function of value. We propose that these choice biases are the behavioral byproduct of a credit assignment mechanism responsible for ensuring the effective delivery of dopaminergic reinforcement learning signals broadcast to the striatum.
•Participants exhibit a biased preference for freely chosen rewarding options
•DARPP-32 genotype predicts choice bias as a function of expected value
•Bias is mirrored by a model that amplifies positive free-choice learning signals
•Choice bias is the byproduct of a mechanism that refines learning signal fidelity
Cockburn et al. show behavioral, computational, and genetic evidence suggesting that human preference for free choice emerges as a byproduct of a striatal reinforcement learning mechanism that amplifies the impact of reward prediction errors following endogenously selected actions.
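The proposed amplification of positive prediction errors for freely chosen options can be sketched as a one-line modification of a standard Rescorla-Wagner update (the boost factor and parameter values below are illustrative assumptions, not the paper's fitted quantities):

```python
def q_update(q, reward, alpha=0.3, free_choice=True, boost=1.5):
    """One value-update step in which positive prediction errors from
    freely chosen options are amplified; negative errors are untouched."""
    pe = reward - q
    if free_choice and pe > 0:
        pe *= boost  # free choice amplifies only positive learning signals
    return q + alpha * pe
```

Because only positive errors are boosted, the resulting bias is selective to predominantly rewarding options, consistent with the behavioral pattern the abstract reports.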
Across the lifespan, individuals frequently choose between exploiting known rewarding options or exploring unknown alternatives. A large body of work has suggested that children may explore more than adults. However, because novelty and reward uncertainty are often correlated, it is unclear how they differentially influence decision-making across development. Here, children, adolescents, and adults (ages 8–27 years, N = 122) completed an adapted version of a recently developed value-guided decision-making task that decouples novelty and uncertainty. In line with prior studies, we found that exploration decreased with increasing age. Critically, participants of all ages demonstrated a similar bias to select choice options with greater novelty, whereas aversion to reward uncertainty increased into adulthood. Computational modeling of participant choices revealed that whereas adolescents and adults demonstrated attenuated uncertainty aversion for more novel choice options, children’s choices were not influenced by reward uncertainty.
A number of hypotheses have suggested that the principal neurological dysfunction responsible for the behavioural symptoms associated with Attention-Deficit/Hyperactivity Disorder (ADHD) is likely rooted in abnormal phasic signals coded by the firing rate of midbrain dopamine neurons. We present a formal investigation of the impact atypical phasic dopamine signals have on behaviour by applying a TD(λ) reinforcement learning model to simulations of operant conditioning tasks that have been argued to quantify the hyperactive, inattentive and impulsive behaviour associated with ADHD. The results presented here suggest that asymmetrically effective dopamine signals encoded by a punctate increase or decrease in dopamine levels provide the best account for the behaviour of children with ADHD as well as an animal model of ADHD, the spontaneously hypertensive rat (SHR). The biological sources of this asymmetry are considered, as are other computational models of ADHD.
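The asymmetry idea above can be made concrete with a single TD(λ) update step in which negative prediction errors are less effective than positive ones (the learning rates and trace parameters below are illustrative assumptions, not the simulation's fitted values):

```python
def td_lambda_step(values, traces, state, next_state, reward,
                   alpha_pos=0.1, alpha_neg=0.02, gamma=0.95, lam=0.9):
    """One TD(lambda) update with asymmetric learning rates: a blunted
    response to negative prediction errors mimics the proposed
    dopamine-signal asymmetry. Mutates `values` and `traces` in place."""
    delta = reward + gamma * values[next_state] - values[state]
    alpha = alpha_pos if delta >= 0 else alpha_neg
    traces[state] += 1.0  # accumulating eligibility trace
    for s in range(len(values)):
        values[s] += alpha * delta * traces[s]
        traces[s] *= gamma * lam  # decay traces toward earlier states
    return delta
```

Under this scheme, value estimates inflate easily but deflate slowly, so behaviour extinguishes more gradually after reward is withdrawn: the kind of qualitative signature such simulations compare against ADHD operant-conditioning data.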
Feedback information and the reward positivity Cockburn, Jeffrey; Holroyd, Clay B.
International Journal of Psychophysiology, October 2018, Volume 132, Issue Pt B
Journal Article
Peer reviewed
The reward positivity is a component of the event-related brain potential (ERP) sensitive to neural mechanisms of reward processing. Multiple studies have demonstrated that reward positivity amplitude indexes a reward prediction error signal that is fundamental to theories of reinforcement learning. However, whether this ERP component is also sensitive to richer forms of performance information important for supervised learning is less clear. To investigate this question, we recorded the electroencephalogram from participants engaged in a time estimation task in which the type of error information conveyed by feedback stimuli was systematically varied across conditions. Consistent with our predictions, we found that reward positivity amplitude decreased in relation to increasing information content of the feedback, and that reward positivity amplitude was unrelated to trial-to-trial behavioral adjustments in task performance. By contrast, a series of exploratory analyses revealed frontal-central and posterior ERP components immediately following the reward positivity that related to these processes. Taken in the context of the wider literature, these results suggest that the reward positivity is produced by a neural mechanism that motivates task performance, whereas the later ERP components apply the feedback information according to principles of supervised learning.
•The reward positivity (RP) is a component associated with win/loss outcome processing.
•We probe the influence of corrective information on behavior and the RP.
•The RP is reduced as corrective information increases but is not predictive of behavior.
•Later components are found to be associated with behavioral adjustment and updating.
•Our findings suggest component specific indices of various learning algorithms.
Reinforcement learning (RL) is a framework of particular importance to psychology, neuroscience and machine learning. Interactions between these fields, as promoted through the common hub of RL, have facilitated paradigm shifts that relate multiple levels of analysis in a singular framework (for example, relating dopamine function to a computationally defined RL signal). Recently, more sophisticated RL algorithms have been proposed to better account for human learning, and in particular its oft-documented reliance on two separable systems: a model-based (MB) system and a model-free (MF) system. However, along with many benefits, this dichotomous lens can distort questions, and may contribute to an unnecessarily narrow perspective on learning and decision-making. Here, we outline some of the consequences that come from overconfidently mapping algorithms, such as MB versus MF RL, onto putative cognitive processes. We argue that the field is well positioned to move beyond simplistic dichotomies, and we propose a means of refocusing research questions towards the rich and complex components that comprise learning and decision-making.