The concept of free energy has its origins in 19th century thermodynamics, but has recently found its way into the behavioral and neural sciences, where it has been promoted for its wide applicability and has even been suggested as a fundamental principle for understanding intelligent behavior and brain function. We argue that there are essentially two different notions of free energy in current models of intelligent agency, both of which can be considered applications of Bayesian inference to the problem of action selection: one that appears when trading off accuracy and uncertainty based on a general maximum entropy principle, and one that formulates action selection in terms of minimizing an error measure that quantifies deviations of beliefs and policies from given reference models. The first approach provides a normative rule for action selection in the face of model uncertainty or when information processing capabilities are limited. The second approach directly aims to formulate the action selection problem as an inference problem in the context of Bayesian brain theories, also known as Active Inference in the literature. We elucidate the main ideas and discuss critical technical and conceptual issues revolving around these two notions of free energy, both of which are claimed to apply at all levels of decision-making, from the high-level deliberation of reasoning down to the low-level information processing of perception.
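As a hedged illustration of the two notions (standard textbook forms from the wider literature, not taken from this abstract itself): the maximum-entropy account trades expected utility against an information cost, while the Active Inference account minimizes a variational bound on surprise,

F_1[q] = \mathbb{E}_q[U] - \tfrac{1}{\beta}\, D_{\mathrm{KL}}(q \,\|\, p_0)   (maximized over policies q; \beta is an inverse-temperature resource parameter and p_0 a prior policy)

F_2[q] = D_{\mathrm{KL}}\big(q(s) \,\|\, p(s \mid o)\big) - \ln p(o)   (minimized over beliefs q(s) about hidden states s given observations o).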
In its most basic form, decision-making can be viewed as a computational process that progressively eliminates alternatives, thereby reducing uncertainty. Such processes are generally costly, meaning that the amount of uncertainty that can be reduced is limited by the amount of available computational resources. Here, we introduce the notion of elementary computation based on a fundamental principle for probability transfers that reduce uncertainty. Elementary computations can be considered as the inverse of Pigou-Dalton transfers applied to probability distributions, closely related to the concepts of majorization, T-transforms, and generalized entropies that induce a preorder on the space of probability distributions. Consequently, we can define resource cost functions that are order-preserving and therefore monotonic with respect to the uncertainty reduction. This leads to a comprehensive notion of decision-making processes with limited resources. Along the way, we prove several new results on majorization theory, as well as on entropy and divergence measures.
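A minimal Python sketch of such an uncertainty-reducing elementary computation, assuming the standard definition of a (here inverted) Pigou-Dalton transfer; all names are illustrative:

import numpy as np

def inverse_pigou_dalton(p, i, j, eps):
    # Move probability mass eps from the smaller entry p[j] to the larger
    # entry p[i]; the result majorizes p, so any Schur-concave uncertainty
    # measure (e.g., Shannon entropy) can only decrease.
    assert p[i] >= p[j] and 0.0 <= eps <= p[j]
    q = p.copy()
    q[i] += eps
    q[j] -= eps
    return q

def shannon_entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p = np.array([0.5, 0.3, 0.2])
q = inverse_pigou_dalton(p, 0, 2, 0.1)
print(shannon_entropy(q) < shannon_entropy(p))  # True: uncertainty was reduced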
Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here, we propose a thermodynamically inspired formalization of bounded rational decision-making where information processing is modelled as state changes in thermodynamic systems that can be quantified by differences in free energy. By optimizing a free energy, bounded rational decision-makers trade off expected utility gains and information-processing costs measured by the relative entropy. As a result, the bounded rational decision-making problem can be rephrased in terms of well-known variational principles from statistical physics. In the limit when computational costs are ignored, the maximum expected utility principle is recovered. We discuss links to existing decision-making frameworks and applications to human decision-making experiments that are at odds with expected utility theory. Since most of the mathematical machinery can be borrowed from statistical physics, the main contribution is to re-interpret the formalism of thermodynamic free-energy differences in terms of bounded rational decision-making and to discuss its relationship to human decision-making experiments.
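This variational principle has a well-known closed-form solution in the bounded-rationality literature, p*(a) ∝ p_0(a) exp(β U(a)); a minimal numerical sketch (variable names are illustrative):

import numpy as np

def bounded_rational_policy(U, p0, beta):
    # Optimum of E_p[U] - (1/beta) * KL(p || p0): a softmax over utilities
    # anchored at the prior policy p0.
    w = p0 * np.exp(beta * U)
    return w / w.sum()

U = np.array([1.0, 0.9, 0.1])                # utilities of three actions
p0 = np.ones(3) / 3                          # uniform prior policy
print(bounded_rational_policy(U, p0, 0.1))   # ~prior: processing is expensive
print(bounded_rational_policy(U, p0, 100.0)) # ~argmax: expected utility limit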
One notable weakness of current machine learning algorithms is the poor ability of models to solve new problems without forgetting previously acquired knowledge. The Continual Learning paradigm has emerged as a protocol to systematically investigate settings where the model sequentially observes samples generated by a series of tasks. In this work, we take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle that facilitates a trade-off between learning and forgetting. We derive this principle from a Bayesian perspective and show its connections to previous approaches to continual learning. Based on this principle, we propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information-processing paths through the network, governed by a gating policy. Equipped with a diverse and specialized set of parameters, each path can be regarded as a distinct sub-network that learns to solve tasks. To improve expert allocation, we introduce diversity objectives, which we evaluate in additional ablation studies. Importantly, our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms. Due to the general formulation based on generic utility functions, we can apply this optimality principle to a large variety of learning problems, including supervised learning, reinforcement learning, and generative modeling. We demonstrate the competitive performance of our method on continual reinforcement learning and variants of the MNIST, CIFAR-10, and CIFAR-100 datasets.
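A schematic sketch of the gating idea (a plain mixture-of-experts forward pass, not the paper's exact Mixture-of-Variational-Experts layer, which additionally uses variational parameters and diversity objectives):

import numpy as np

def mixture_of_experts_forward(x, expert_weights, gate_weights):
    # A softmax gate routes the input to a convex combination of expert
    # paths; each expert here is a plain linear map for simplicity.
    logits = x @ gate_weights                            # (n_experts,)
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()
    outputs = np.stack([x @ W for W in expert_weights])  # (n_experts, d_out)
    return gate @ outputs                                # gated mixture

x = np.ones(4)
experts = [np.eye(4), -np.eye(4)]                        # two toy expert paths
gates = np.zeros((4, 2))
print(mixture_of_experts_forward(x, experts, gates))     # equal gating: zeros

In a task-agnostic continual-learning setting, the gate, rather than a task label, decides which path processes a sample.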
Cycling advocates have recently argued that low-income and minority communities across the U.S. have disproportionately low access to bike lanes. To date, however, quantitative evidence of disparities in access to bike lanes has been limited to a small number of cities. We addressed this research gap by examining cross-sectional associations between bike lanes and sociodemographic characteristics at the block group level for 22 large U.S. cities (n = 21,843 block groups). Dependent variables included the presence (yes/no), coverage, connectivity, and proximity of bike lanes, measured using secondary GIS data collected by each of the 22 cities between 2012 and 2016. Primary independent variables included indicators of race, ethnicity, educational attainment, income, poverty, and a composite socioeconomic status (SES) index, all measured using data from the 2011–2015 American Community Survey. We used linear and logistic multilevel mixed-effects regression models to estimate associations between these sociodemographic characteristics and each bike lane dependent variable, before and after adjusting for traditional indicators of cycling demand (population and employment density, distance to downtown, population age structure, bicycle commuting levels). In unadjusted associations, disadvantaged block groups (i.e. lower SES, higher proportions of minority residents) had significantly lower access to bike lanes. After adjusting for indicators of cycling demand, access to bike lanes was lower in block groups with particular types of disadvantage (lower educational attainment, higher proportions of Hispanic residents, lower composite SES) but not in those with other types of disadvantage (higher proportions of black residents, lower income, higher poverty). These results provide empirical support for advocates' claims of disparities in bike lane access, suggesting the importance of more closely considering social equity in bicycle planning and advocacy.
•Quantitative evidence of disparities in access to bike lanes is currently limited.
•We examine bike lanes and area-level sociodemographic characteristics in 22 cities.
•Disadvantaged areas have lower access to bike lanes in unadjusted regressions.
•Several of these disparities persist after adjusting for measures of cycling demand.
•The results suggest disparities in access to bike lanes, with equity implications.
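For readers unfamiliar with the method, a generic form of the multilevel (mixed-effects) logistic model described above, with block groups i nested in cities j, is (a standard specification, not necessarily the authors' exact one):

\mathrm{logit}\, P(\text{bike lane}_{ij} = 1) = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_j, \qquad u_j \sim \mathcal{N}(0, \sigma_u^2)

where \mathbf{x}_{ij} collects the sociodemographic and cycling-demand covariates and u_j is a city-level random intercept.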
When we have learned a motor skill, such as cycling or ice-skating, we can rapidly generalize to novel tasks, such as motorcycling or rollerblading [1–8]. Such facilitation of learning could arise through two distinct mechanisms by which the motor system might adjust its control parameters. First, fast learning could simply be a consequence of the proximity of the original and final settings of the control parameters. Second, by structural learning [9–14], the motor system could constrain the parameter adjustments to conform to the covariance structure of the control parameters. Thus, facilitation of learning would rely on the novel task parameters lying in a lower-dimensional subspace that can be explored more efficiently. To test between these two hypotheses, we exposed subjects to randomly varying visuomotor tasks of fixed structure. Although such randomly varying tasks are thought to prevent learning, we show that when subsequently presented with novel tasks, subjects exhibit three key features of structural learning: facilitated learning of tasks with the same structure, a strong reduction in the interference normally observed when switching between tasks that require opposite control strategies, and preferential exploration along the learned structure. These results suggest that skill generalization relies on task variation and structural learning.
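A toy numerical illustration of the two hypotheses (purely illustrative, not the paper's experiment): exploring along a learned one-dimensional task structure locates a novel same-structure task far faster than unconstrained exploration of the full control space.

import numpy as np

rng = np.random.default_rng(1)
structure = np.array([1.0, 1.0]) / np.sqrt(2)   # learned task subspace (unit vector)
target = 0.8 * structure                        # novel task lying on that structure

def explore(structural, n=1000):
    if structural:                              # structural learning: search the subspace
        return rng.normal(size=(n, 1)) * structure
    return rng.normal(size=(n, 2))              # proximity-only: search everywhere

for flag in (True, False):
    best = np.linalg.norm(explore(flag) - target, axis=1).min()
    print("structural" if flag else "unconstrained", best)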
The formation of cooperative groups of agents with limited information-processing capabilities to solve complex problems together is a fundamental building principle that cuts through multiple scales in biology, from groups of cells to groups of humans. Here, we study an experimental paradigm where a group of humans is joined together to solve a common sensorimotor task that cannot be achieved by a single agent but relies on the cooperation of the group. In particular, each human acts as a neuron-like binary decision-maker that determines in each moment of time whether to be active or not. Inspired by the population vector method for movement decoding, each neuron-like decision-maker is assigned a preferred movement direction that the decision-maker is ignorant about. From the population vector reflecting the group activity, the movement of a cursor is determined, and the task for the group is to steer the cursor into a predefined target. As the preferred movement directions are unknown and players are not allowed to communicate, the group has to learn a control strategy on the fly from the shared visual feedback. Performance is analyzed by learning speed and accuracy, action synchronization, and group coherence. We study four different computational models of the observed behavior: a perceptron model, a reinforcement learning model, a Bayesian inference model, and a Thompson sampling model that efficiently approximates Bayes-optimal behavior. The Bayesian inference model and especially the Thompson sampling model excel at predicting the human group behavior compared to the other models, suggesting that internal models are crucial for adaptive coordination. We discuss benefits and limitations of our paradigm with regard to a better understanding of distributed information processing.
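A minimal sketch of the population-vector decoding described above (names are illustrative; the actual task details differ):

import numpy as np

def population_vector(active, preferred_dirs):
    # Cursor movement as the sum of the preferred directions of all
    # currently active neuron-like players.
    return preferred_dirs[active.astype(bool)].sum(axis=0)

rng = np.random.default_rng(0)
dirs = rng.normal(size=(10, 2))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit preferred directions
decisions = rng.integers(0, 2, size=10)              # binary on/off per player
print(population_vector(decisions, dirs))            # resulting cursor movement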
•Commute mode choice analysis conducted in Barcelona, Spain.
•Objective and self-reported infrastructure measures are associated with cycling.
•Self-reported measures have stronger associations than objective measures.
•Cycling and public transport are competing modes.
•Education and modal integration are promising short-term cycling interventions.
Cycling for transportation has become an increasingly important component of strategies to address public health, climate change, and air quality concerns in urban centers. Within this context, planners and policy makers would benefit from an improved understanding of available interventions and their relative effectiveness for cycling promotion. We examined predictors of bicycle commuting that are relevant to planning and policy intervention, particularly those amenable to short- and medium-term action.
We estimated a travel mode choice model using data from a survey of 765 commuters who live and work within the municipality of Barcelona. We considered how the decision to commute by bicycle was associated with cycling infrastructure, bike share availability, travel demand incentives, and other environmental attributes (e.g., public transport availability). Self-reported and objective (GIS-based) measures were compared. Point elasticities and marginal effects were calculated to assess the relative explanatory power of the independent variables considered.
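For context, the direct point elasticity of a logit choice probability P_i with respect to an attribute x_{ik} takes the standard form (the authors' exact model specification may differ):

E_{x_{ik}} = \frac{\partial P_i}{\partial x_{ik}} \cdot \frac{x_{ik}}{P_i} = \beta_k\, x_{ik}\, (1 - P_i)

so the elasticities reported in the results below scale with both the estimated coefficient and the attribute's level.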
While both self-reported and objective measures of access to cycling infrastructure were associated with bicycle commuting, self-reported measures had stronger associations. Bicycle commuting had positive associations with access to bike share stations but inverse associations with access to public transport stops. Point elasticities suggested that bicycle commuting has a mild negative correlation with public transport availability (−0.136), bike share availability is more important at the work location (0.077) than at home (0.034), and bicycle lane presence has a relatively small association with bicycle commuting (0.039). Marginal effects suggested that provision of an employer-based incentive not to commute by private vehicle would be associated with an 11.3% decrease in the probability of commuting by bicycle, likely reflecting the typical emphasis of such incentives on public transport.
The results provide evidence of modal competition between cycling and public transport, through the presence of public transport stops and the provision of public transport-oriented travel demand incentives. Education and awareness campaigns that influence perceptions of cycling infrastructure availability, travel demand incentives that encourage cycling, and policies that integrate public transport and cycling may be promising and cost-effective strategies to promote cycling in the short to medium term.
Rate distortion theory describes how to communicate relevant information most efficiently over a channel with limited capacity. One of the many applications of rate distortion theory is bounded rational decision making, where decision makers are modeled as information channels that transform sensory input into motor output under the constraint that their channel capacity is limited. Such a bounded rational decision maker can be thought to optimize an objective function that trades off the decision maker’s utility or cumulative reward against the information processing cost measured by the mutual information between sensory input and motor output. In this study, we interpret a spiking neuron as a bounded rational decision maker that aims to maximize its expected reward under the computational constraint that the mutual information between the neuron’s input and output is upper bounded. This abstract computational constraint translates into a penalization of the deviation between the neuron’s instantaneous and average firing behavior. We derive a synaptic weight update rule for such a rate distortion optimizing neuron and show in simulations that the neuron efficiently extracts reward-relevant information from the input by trading off its synaptic strengths against the collected reward.
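A REINFORCE-style Python sketch of such a neuron (schematic and assumption-laden, not the paper's derived update rule): the reward signal is corrected by a pointwise information cost log p(y|x)/p̄(y) that penalizes deviations of instantaneous from average firing.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def rd_neuron_step(w, x, reward, p_bar, beta, rng, lr=0.01):
    # Spike stochastically, then reinforce by reward minus the pointwise
    # information cost, a proxy for bounding I(input; output).
    p = sigmoid(w @ x)                           # instantaneous firing probability
    y = rng.random() < p                         # stochastic spike
    p_y, p_bar_y = (p, p_bar) if y else (1.0 - p, 1.0 - p_bar)
    advantage = reward - (1.0 / beta) * np.log(p_y / p_bar_y)
    grad_logp = ((1.0 - p) if y else -p) * x     # d log p(y|x) / d w
    return w + lr * advantage * grad_logp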
The Nash equilibrium concept has previously been shown to be an important tool for understanding human sensorimotor interactions, where different actors vie to minimize their respective effort while engaging in a multi-agent motor task. However, it is not clear how such equilibria are reached. Here, we compare different reinforcement learning models to human behavior engaged in sensorimotor interactions with haptic feedback based on three classic games: the prisoner's dilemma and the symmetric and asymmetric matching pennies games. We find that a discrete analysis that reduces the continuous sensorimotor interaction to binary choices, as in classical matrix games, does not allow us to distinguish between the different learning algorithms, but that a more detailed continuous analysis with continuous formulations of the learning algorithms and the game-theoretic solutions affords different predictions. In particular, we find that Q-learning with intrinsic costs that disfavor deviations from average behavior explains the observed data best, even though all learning algorithms converge equally to admissible Nash equilibrium solutions. We therefore conclude that studying different learning algorithms is important for understanding sensorimotor interactions: such behavior cannot be inferred from a game-theoretic analysis alone that simply focuses on the Nash equilibrium concept, since different learning algorithms impose preferences on the set of possible equilibrium solutions through their inherent learning dynamics.
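A minimal sketch of the kind of learning rule found to fit best (names and the discretization are illustrative, not the paper's exact continuous formulation): Q-learning whose effective reward subtracts an intrinsic cost for deviating from average behavior.

import numpy as np

def q_step(Q, a_avg, s, a, r, s_next, actions, alpha=0.1, gamma=0.9, c=1.0, tau=0.05):
    # Effective reward penalizes deviation of the chosen action from the
    # running average action, biasing learning toward low-effort equilibria.
    r_eff = r - c * (actions[a] - a_avg) ** 2
    Q[s, a] += alpha * (r_eff + gamma * Q[s_next].max() - Q[s, a])
    a_avg += tau * (actions[a] - a_avg)          # track average behavior
    return Q, a_avg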