Habitual control of goal selection in humans Cushman, Fiery; Morris, Adam
Proceedings of the National Academy of Sciences (PNAS), 11/2015, Volume 112, Issue 45
Journal Article
Peer reviewed
Open access
Humans choose actions based on both habit and planning. Habitual control is computationally frugal but adapts slowly to novel circumstances, whereas planning is computationally expensive but can adapt swiftly. Current research emphasizes the competition between habits and plans for behavioral control, yet many complex tasks instead favor their integration. We consider a hierarchical architecture that exploits the computational efficiency of habitual control to select goals while preserving the flexibility of planning to achieve those goals. We formalize this mechanism in a reinforcement learning setting, illustrate its costs and benefits, and experimentally demonstrate its spontaneous application in a sequential decision-making task.
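The hierarchical architecture described in this abstract can be sketched in a few lines: cached (habitual) values pick the goal cheaply, and explicit planning then finds a path to it. This is a minimal illustrative sketch, not the authors' implementation; the toy one-dimensional world, the function names, and the cached values are all assumptions.

```python
from collections import deque

def select_goal(cached_goal_values):
    """Habitual control: pick the goal with the highest cached value (no planning)."""
    return max(cached_goal_values, key=cached_goal_values.get)

def plan_to_goal(start, goal, neighbors):
    """Model-based control: breadth-first search for an action sequence to the goal."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for action, nxt in neighbors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # goal unreachable

# Toy 1-D world: states 0..4, actions move left/right by one step.
def neighbors(s):
    steps = []
    if s > 0:
        steps.append(("left", s - 1))
    if s < 4:
        steps.append(("right", s + 1))
    return steps

cached = {2: 0.3, 4: 0.9}                # habitual value estimates for candidate goals
goal = select_goal(cached)               # frugal goal selection
path = plan_to_goal(0, goal, neighbors)  # flexible planning toward that goal
```

The division of labor mirrors the abstract's point: goal selection costs a single table lookup, while the expensive search is confined to achieving the chosen goal.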
When Does Model-Based Control Pay Off? Kool, Wouter; Cushman, Fiery A; Gershman, Samuel J
PLOS Computational Biology, 08/2016, Volume 12, Issue 8
Journal Article
Peer reviewed
Open access
Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to "model-free" and "model-based" strategies in reinforcement learning. Model-free strategies are computationally cheap, because action values are simply read from a look-up table constructed through trial and error, but they are sometimes inaccurate. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding. It is assumed that this trade-off between accuracy and computational demand plays an important role in the arbitration between the two strategies, but we show that the hallmark task for dissociating model-free and model-based strategies, as well as several related variants, does not embody such a trade-off. We describe five factors that reduce the effectiveness of the model-based strategy on these tasks by reducing its accuracy in estimating reward outcomes and decreasing the importance of its choices. Based on these observations, we describe a version of the task that formally and empirically obtains an accuracy-demand trade-off between model-free and model-based strategies. Moreover, we show that human participants spontaneously increase their reliance on model-based control on this task, compared to the original paradigm. Our novel task and our computational analyses may prove important in subsequent empirical investigations of how humans balance accuracy and demand.
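The contrast this abstract draws can be made concrete in a short sketch: a model-free value is one table lookup, updated slowly by trial and error, while a model-based value is computed on demand from a causal model. This is an illustrative sketch under assumed names and a toy environment, not the paper's task.

```python
def model_free_value(q_table, state, action):
    """Model-free: cheap - just read a cached value from the look-up table."""
    return q_table.get((state, action), 0.0)

def model_free_update(q_table, state, action, reward, alpha=0.1):
    """Adjust the cached value by trial and error (simple delta rule)."""
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward - old)

def model_based_value(transitions, rewards, state, action):
    """Model-based: costlier - compute the expected reward from a causal model."""
    return sum(p * rewards[nxt]
               for nxt, p in transitions[(state, action)].items())

# Toy causal model: action "a" in state "s0" leads to s1 (70%) or s2 (30%).
transitions = {("s0", "a"): {"s1": 0.7, "s2": 0.3}}
rewards = {"s1": 1.0, "s2": 0.0}

mb = model_based_value(transitions, rewards, "s0", "a")  # accurate immediately
q = {}
model_free_update(q, "s0", "a", reward=1.0)              # one trial of experience
mf = model_free_value(q, "s0", "a")                      # still far from 0.7
```

The sketch shows why the trade-off matters: the model-based estimate is correct from the first query but requires a model and computation, while the model-free estimate is a single dictionary read that only converges to the true value across many trials.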
Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits. To investigate this proposal, we conducted two experiments showing that people increase model-based control when it achieves greater accuracy than model-free control, and especially when the rewards of accurate performance are amplified. In contrast, they are insensitive to reward amplification when model-based and model-free control yield equivalent accuracy. This suggests that humans adaptively balance habitual and planned action through on-line cost-benefit analysis.
Humans have a remarkable capacity for flexible decision-making, deliberating among actions by modeling their likely outcomes. This capacity allows us to adapt to the specific features of diverse circumstances. In real-world decision-making, however, people face an important challenge: There are often an enormous number of possibilities to choose among, far too many for exhaustive consideration. There is a crucial, understudied prechoice step in which, among myriad possibilities, a few good candidates come quickly to mind. How do people accomplish this? We show across nine experiments (N = 3,972 U.S. residents) that people use computationally frugal cached value estimates to propose a few candidate actions on the basis of their success in past contexts (even when irrelevant for the current context). Deliberative planning is then deployed just within this set, allowing people to compute more accurate values on the basis of context-specific criteria. This hybrid architecture illuminates how typically valuable thoughts come quickly to mind during decision-making.
Causal relationships, unlike mere co-occurrence, allow humans to obtain rewards and avoid punishments by intervening on their environment. Accordingly, explicit (controlled) evaluations of stimuli encountered in the environment are known to be sensitive to causal relationships above and beyond mere co-occurrence. In this project, we conduct stringent tests of whether implicit (automatic) evaluation also reflects causal relationships and begin to probe the representational mechanisms underlying such sensitivity. Participants (N = 4836) observed causal events during which two stimuli were equally contingent with positive or negative outcomes but only one of them was causally responsible for these outcomes. Across 6 studies, varying in design and amount of verbal scaffolding provided, differences in causal status consistently guided not only explicit measures of evaluation (Likert and slider scales; Bayes Factor meta-analysis: Cohen's d = 0.28, BF10 > 10^46) but also their implicit counterparts (Implicit Association Tests; Bayes Factor meta-analysis: Cohen's d = 0.22, BF10 > 10^29). However, unlike their explicit counterparts, implicit evaluations were not sensitive to causal relationships that had to be flexibly derived by combining disparate past experiences. Taken together, these studies suggest that implicit evaluations are sensitive to causal information. Such sensitivity seems to be mediated via precompiled, causally informed value representations rather than online computations over a causal model.
Ordinary people often make moral judgments that are consistent with philosophical principles and legal distinctions. For example, they judge killing as worse than letting die, and harm caused as a necessary means to a greater good as worse than harm caused as a side-effect (Cushman, Young, & Hauser, 2006). Are these patterns of judgment produced by mechanisms specific to the moral domain, or do they derive from other psychological domains? We show that the action/omission and means/side-effect distinctions affect nonmoral representations and provide evidence that their role in moral judgment is mediated by these nonmoral psychological representations. Specifically, the action/omission distinction affects moral judgment primarily via causal attribution, while the means/side-effect distinction affects moral judgment via intentional attribution. We suggest that many of the specific patterns evident in our moral judgments in fact derive from nonmoral psychological mechanisms, and especially from the processes of causal and intentional attribution.
Humans use punishment to influence each other's behavior. Many current theories presume that this operates as a simple form of incentive. In contrast, we show that people infer the communicative intent behind punishment, which can sometimes diverge sharply from its immediate incentive value. In other words, people respond to punishment not as a reward to be maximized, but as a communicative signal to be interpreted. Specifically, we show that people expect harmless, yet communicative, punishments to be as effective as harmful punishments (Experiment 1). In some situations, people display a systematic preference for harmless punishments over more canonical, harmful punishments (Experiment 2). People readily seek out and infer the communicative message inherent in a punishment (Experiment 3). And people expect that learning from punishment depends on the ease with which its communicative intent can be inferred (Experiment 4). Taken together, these findings demonstrate that people expect punishment to be constructed and interpreted as a communicative act.
Decision-making algorithms face a basic tradeoff between accuracy and effort (i.e., computational demands). It is widely agreed that humans can choose between multiple decision-making processes that embody different solutions to this tradeoff: Some are computationally cheap but inaccurate, whereas others are computationally expensive but accurate. Recent progress in understanding this tradeoff has been catalyzed by formalizing it in terms of model-free (i.e., habitual) versus model-based (i.e., planning) approaches to reinforcement learning. Intuitively, if two tasks offer the same rewards for accuracy but one of them is much more demanding, we might expect people to rely on habit more in the difficult task: Devoting significant computation to achieve slight marginal accuracy gains would not be "worth it." We test and verify this prediction in a sequential reinforcement learning task. Because our paradigm is amenable to formal analysis, it contributes to the development of a computational model of how people balance the costs and benefits of different decision-making processes in a task-specific manner; in other words, how we decide when hard thinking is worth it.
A central tenet of contemporary moral psychology is that people typically reject active forms of utilitarian sacrifice. Yet, evidence for secularization and declining empathic concern in recent decades suggests the possibility of systematic change in this attitude. In the present study, we employ hypothetical dilemmas to investigate whether judgments of utilitarian sacrifice are becoming more permissive over time. In a cross-sectional design, age negatively predicted utilitarian moral judgment (Study 1). To examine whether this pattern reflected processes of maturation, we asked a panel to re-evaluate several moral dilemmas after an eight-year interval but observed no overall change (Study 2). In contrast, a more recent age-matched sample revealed greater endorsement of utilitarian sacrifice in a time-lag design (Study 3). Taken together, these results suggest that today's younger cohorts increasingly endorse a utilitarian resolution of sacrificial moral dilemmas.
Action, Outcome, and Value Cushman, Fiery
Personality and Social Psychology Review, 08/2013, Volume 17, Issue 3
Journal Article
Dual-system approaches to psychology explain the fundamental properties of human judgment, decision making, and behavior across diverse domains. Yet, the appropriate characterization of each system is a source of debate. For instance, a large body of research on moral psychology makes use of the contrast between "emotional" and "rational/cognitive" processes, yet even the chief proponents of this division recognize its shortcomings. Largely independently, research in the computational neurosciences has identified a broad division between two algorithms for learning and choice derived from formal models of reinforcement learning. One assigns value to actions intrinsically based on past experience, while another derives representations of value from an internally represented causal model of the world. This division between action- and outcome-based value representation provides an ideal framework for a dual-system theory in the moral domain.