Habits Without Values
Miller, Kevin J; Shenhav, Amitai; Ludvig, Elliot A
Psychological review, 03/2019, Volume 126, Issue 2
Journal Article
Peer reviewed
Open access
Habits form a crucial component of behavior. In recent years, key computational models have conceptualized habits as arising from model-free reinforcement learning mechanisms, which typically select between available actions based on the future value expected to result from each. Traditionally, however, habits have been understood as behaviors that can be triggered directly by a stimulus, without requiring the animal to evaluate expected outcomes. Here, we develop a computational model instantiating this traditional view, in which habits develop through the direct strengthening of recently taken actions rather than through the encoding of outcomes. We demonstrate that this model accounts for key behavioral manifestations of habits, including insensitivity to outcome devaluation and contingency degradation, as well as the effects of reinforcement schedule on the rate of habit formation. The model also explains the prevalent observation of perseveration in repeated-choice tasks as an additional behavioral manifestation of the habit system. We suggest that mapping habitual behaviors onto value-free mechanisms provides a parsimonious account of existing behavioral and neural data. This mapping may provide a new foundation for building robust and comprehensive models of the interaction of habits with other, more goal-directed types of behaviors and help to better guide research into the neural mechanisms underlying control of instrumental behavior more generally.
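As a rough illustration of what "direct strengthening of recently taken actions" could look like, the sketch below updates a vector of habit strengths using only the identity of the chosen action. It is a minimal sketch under stated assumptions, not the authors' published model: the function name `update_habits`, the learning rate `alpha`, and the exponential update toward a one-hot target are all choices made here for illustration.

```python
def update_habits(habits, chosen, alpha=0.1):
    """Strengthen the chosen action and weaken the others.

    Assumption for illustration: each habit strength moves a fraction
    `alpha` of the way toward 1 (chosen action) or 0 (all other actions).
    No reward or outcome value enters the update.
    """
    return [
        h + alpha * ((1.0 if i == chosen else 0.0) - h)
        for i, h in enumerate(habits)
    ]


# Repeating action 0 fifty times builds a strong habit for it,
# whatever outcomes those actions happened to produce.
habits = [0.0, 0.0]
for _ in range(50):
    habits = update_habits(habits, chosen=0)
```

Because the update takes no reward argument, changing the value of the outcome afterwards cannot alter the learned strengths. That gives an informal sense of why a value-free mechanism of this kind would be insensitive to outcome devaluation.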
All adaptive organisms face the fundamental tradeoff between pursuing a known reward (exploitation) and sampling lesser-known options in search of something better (exploration). Theory suggests at least two strategies for solving this dilemma: a directed strategy in which choices are explicitly biased toward information seeking, and a random strategy in which decision noise leads to exploration by chance. In this work we investigated the extent to which humans use these two strategies. In our "Horizon task," participants made explore-exploit decisions in two contexts that differed in the number of choices that they would make in the future (the time horizon). Participants were allowed to make either a single choice in each game (horizon 1), or 6 sequential choices (horizon 6), giving them more opportunity to explore. By modeling the behavior in these two conditions, we were able to measure exploration-related changes in decision making and quantify the contributions of the two strategies to behavior. We found that participants were more information seeking and had higher decision noise with the longer horizon, suggesting that humans use both strategies to solve the exploration-exploitation dilemma. We thus conclude that both information seeking and choice variability can be controlled and put to use in the service of exploration.
A significant challenge for real-world automated vehicles (AVs) is their interaction with human pedestrians. This paper develops a methodology to directly elicit the AV behaviour pedestrians find suitable by collecting quantitative data that can be used to measure and improve an algorithm's performance. Starting with a Deep Q Network (DQN) trained on a simple Pygame/Python-based pedestrian crossing environment, the reward structure was adapted to allow adjustment by human feedback. Feedback was collected by eliciting behavioural judgements collected from people in a controlled environment. The reward was shaped by the inter-action vector, decomposed into feature aspects for relevant behaviours, thereby facilitating both implicit preference selection and explicit task discovery in tandem. Using computational RL and behavioural-science techniques, we harness a formal iterative feedback loop where the rewards were repeatedly adapted based on human behavioural judgments. Experiments were conducted with 124 participants that showed strong initial improvement in the judgement of AV behaviours with the adaptive reward structure. The results indicate that the primary avenue for enhancing vehicle behaviour lies in the predictability of its movements when introduced. More broadly, recognising AV behaviours that receive favourable human judgments can pave the way for enhanced performance.
When faced with risky decisions, people tend to be risk averse for gains and risk seeking for losses (the reflection effect). Studies examining this risk-sensitive decision making, however, typically ask people directly what they would do in hypothetical choice scenarios. A recent flurry of studies has shown that when these risky decisions include rare outcomes, people make different choices for explicitly described probabilities than for experienced probabilistic outcomes. Specifically, rare outcomes are overweighted when described and underweighted when experienced. In two experiments, we examined risk-sensitive decision making when the risky option had two equally probable (50%) outcomes. For experience-based decisions, there was a reversal of the reflection effect with greater risk seeking for gains than for losses, as compared to description-based decisions. This fundamental difference in experienced and described choices cannot be explained by the weighting of rare events and suggests a separate subjective utility curve for experience.
When making decisions on the basis of past experiences, people must rely on their memories. Human memory has many well-known biases, including the tendency to better remember highly salient events. We propose an extreme-outcome rule, whereby this memory bias leads people to overweight the largest gains and largest losses, leading to more risk seeking for relative gains than for relative losses. To test this rule, in two experiments, people repeatedly chose between fixed and risky options, where the risky option led equiprobably to more or less than did the fixed option. As was predicted, people were more risk seeking for relative gains than for relative losses. In subsequent memory tests, people tended to recall the extreme outcome first and also judged the extreme outcome as having occurred more frequently. Across individuals, risk preferences in the risky-choice task correlated with these memory biases. This extreme-outcome rule presents a novel mechanism through which memory influences decision making.
In many important real-world decision domains, such as finance, the environment, and health, behavior is strongly influenced by experience. Renewed interest in studying this influence led to important advancements in the understanding of these decisions from experience (DfE) in the last 20 years. Building on this literature, we suggest ways the standard experimental design should be extended to better approach important real-world DfE. These extensions include, for example, introducing more complex choice situations, delaying feedback, and including social interactions. When acting upon experiences in these richer and more complicated environments, extensive cognitive processes go into making a decision. Therefore, we argue for integrating cognitive processes more explicitly into experimental research in DfE. These cognitive processes include attention to and perception of numeric and nonnumeric experiences, the influence of episodic and semantic memory, and the mental models involved in learning processes. Understanding these basic cognitive processes can advance the modeling, understanding and prediction of DfE in the laboratory and in the real world. We highlight the potential of experimental research in DfE for theory integration across the behavioral, decision, and cognitive sciences. Furthermore, this research could lead to new methodology that better informs decision-making and policy interventions.
Rare and extreme outcomes in risky choice
Mason, Alice; Ludvig, Elliot A.; Spetch, Marcia L. ...
Psychonomic bulletin & review, 06/2024, Volume 31, Issue 3
Journal Article
Peer reviewed
Open access
Many real-world decisions involving rare events also involve extreme outcomes. Despite this confluence, decisions-from-experience research has only examined the impact of rarity and extremity in isolation. With rare events, people typically choose as if they underestimate the probability of a rare outcome happening. Separately, people typically overestimate the probability of an extreme outcome happening. Here, for the first time, we examine the confluence of these two biases in decisions-from-experience. In a between-groups behavioural experiment, we examine people’s risk preferences for rare extreme outcomes and for rare non-extreme outcomes. When outcomes are both rare and extreme, people’s risk preferences shift away from traditional risk patterns for rare events: they show reduced underweighting for events that are both rare and extreme. We simulate these results using a small-sample model of decision-making that accounts for both the underweighting of rare events and the overweighting of extreme events. These separable influences on risk preferences suggest that to understand real-world risk for rare events we must also consider the extremity of the outcomes.
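To make the small-sample idea concrete, the following sketch shows how relying on a handful of recalled outcomes can produce apparent underweighting of a rare event. It is a hedged illustration only, not the model simulated in the paper: the payoffs, the sample size `k`, and the helper names `small_sample_choice` and `prob_rare_missing` are assumptions introduced here, and the sketch covers only the rarity component, not the overweighting of extreme outcomes.

```python
import random


def small_sample_choice(safe, risky, k, rng):
    """Choose the option whose k randomly recalled outcomes have the higher mean."""
    def sample_mean(history):
        draws = [rng.choice(history) for _ in range(k)]  # recall with replacement
        return sum(draws) / k

    return "risky" if sample_mean(risky) > sample_mean(safe) else "safe"


def prob_rare_missing(p_rare, k):
    """Probability that k independent recalls contain no rare outcome."""
    return (1 - p_rare) ** k


# Hypothetical experienced histories: a sure loss of 3 versus a gamble
# that loses 32 one time in ten and nothing otherwise (expected value -3.2).
safe_history = [-3]
risky_history = [0] * 9 + [-32]

rng = random.Random(0)
choices = [small_sample_choice(safe_history, risky_history, 5, rng) for _ in range(2000)]
risky_rate = choices.count("risky") / len(choices)
```

With five recalls, the rare loss is absent from the sample with probability 0.9^5 ≈ 0.59, so the simulated agent picks the gamble on most trials even though its expected value is lower. That is the qualitative signature of underweighting a rare event.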
Many deviations from rational choice imply the neglect of important evidence and suggest the use of simple heuristics. In contrast, other deviations imply sensitivity to irrelevant evidence and suggest the use of overly complex rules. The current analysis takes two steps toward identifying the conditions that trigger these contradictory deviations from efficient reasoning. The first step involves a theoretical analysis. It shows that the contradictory deviations can be captured without assuming the use of rules of different complexity in different settings. Both deviations can be the product of a reliance on small samples of similar past experiences. This reliance on small samples triggers apparent overcomplexity when the optimal rule is simple, but more complex rules yield better outcomes in most cases; the opposite tendency, oversimplification, emerges when the optimal rule is complex, and simple rules yield better outcomes in most cases. The second step involves a preregistered experiment with 325 participants (Mechanical Turk workers). The experiment shows that human decision makers exhibit the pattern predicted by the reliance‐on‐small‐samples assumption. In the experiment, participants chose between the status quo and a risky alternative in a multi‐attribute decision with three binary cues. They used uninformative cues when this strategy was best in most cases yet ignored two informative cues when this strategy was best in most cases. In addition, describing the cues as recommendations given by three experts increased the tendency to follow the modal recommendation (even when reliance on only one of the experts was optimal), but people still behaved as though they relied on a small sample of past experiences.
Biased confabulation in risky choice
Mason, Alice; Madan, Christopher R.; Simonsen, Nick ...
Cognition, December 2022, Volume 229
Journal Article
Peer reviewed
Open access
When people make risky decisions based on past experience, they must rely on memory. The nature of the memory representations that support these decisions is not yet well understood. A key question concerns the extent to which people recall specific past episodes or whether they have learned a more abstract rule from their past experience. To address this question, we examined the precision of the memories used in risky decisions-from-experience. In three pre-registered experiments, we presented people with risky options, where the outcomes were drawn from continuous ranges (e.g., 100–190 or 500–590), and then assessed their memories for the outcomes experienced. In two preferential tasks, people were more risk seeking for high-value than low-value options, choosing as though they overweighted the outcomes from more extreme ranges. Moreover, in two preferential tasks and a parallel evaluation task, people were very poor at recalling the exact outcomes encountered, but rather confabulated outcomes that were consistent with the outcomes they had seen and were biased towards the more extreme ranges encountered. This common pattern suggests that the observed decision bias in the preferential task reflects a basic cognitive process to overweight extreme outcomes in memory. These results highlight the importance of the edges of the distribution in providing the encoding context for memory recall. They also suggest that episodic memory influences decision-making through gist memory and not through direct recall of specific instances.