Neural mechanisms that support flexible sensorimotor computations are not well understood. In a dynamical system whose state is determined by interactions among neurons, computations can be rapidly reconfigured by controlling the system’s inputs and initial conditions. To investigate whether the brain employs such control mechanisms, we recorded from the dorsomedial frontal cortex of monkeys trained to measure and produce time intervals in two sensorimotor contexts. The geometry of neural trajectories during the production epoch was consistent with a mechanism wherein the measured interval and sensorimotor context exerted control over cortical dynamics by adjusting the system’s initial condition and input, respectively. These adjustments, in turn, set the speed at which activity evolved in the production epoch, allowing the animal to flexibly produce different time intervals. These results provide evidence that the language of dynamical systems can be used to parsimoniously link brain activity to sensorimotor computations.
•Monkeys performed a timing task demanding flexible cognitive control
•The organization of neural trajectories in frontal cortex reflected task demands
•Flexible control was best explained in terms of inputs and initial conditions
•Recurrent neural network models validated the inferred control principles
Remington et al. employ a dynamical systems perspective to understand how the brain flexibly controls timed movements. Results suggest that neurons in the frontal cortex form a recurrent network whose behavior is flexibly controlled by inputs and initial conditions.
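The control scheme this abstract describes — an input that sets the speed of the dynamics and an initial condition that sets the starting point — can be illustrated with a toy dynamical system. This is a minimal sketch for intuition, not the authors' recurrent network model:

```python
import numpy as np

def simulate(x0, u, steps=2000, dt=0.001):
    """Toy 2-D rotational system: the input u scales the flow field (setting
    the speed of the dynamics) and the initial condition x0 sets the starting
    point on the trajectory."""
    A = np.array([[0.0, -1.0], [1.0, 0.0]])  # pure rotation
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * u * (A @ x)  # Euler step; u effectively rescales time
        traj.append(x.copy())
    return np.array(traj)

def phase_covered(traj):
    """Total unwrapped angle the trajectory sweeps out."""
    ang = np.unwrap(np.arctan2(traj[:, 1], traj[:, 0]))
    return ang[-1] - ang[0]

# Same initial condition, different input: same path, different speed.
slow = simulate([1.0, 0.0], u=1.0)
fast = simulate([1.0, 0.0], u=2.0)
print(phase_covered(slow), phase_covered(fast))  # fast sweeps ~twice the angle
```

Doubling the input doubles the angular distance covered in the same amount of time, which is the sense in which an input can set the speed at which activity evolves toward a terminal state.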
Musicians can perform at different tempos, speakers can control the cadence of their speech, and children can flexibly vary their temporal expectations of events. To understand the neural basis of such flexibility, we recorded from the medial frontal cortex of nonhuman primates trained to produce different time intervals with different effectors. Neural responses were heterogeneous, nonlinear, and complex, and they exhibited a remarkable form of temporal invariance: firing rate profiles were temporally scaled to match the produced intervals. Recording from downstream neurons in the caudate and from thalamic neurons projecting to the medial frontal cortex indicated that this phenomenon originates within cortical networks. Recurrent neural network models trained to perform the task revealed that temporal scaling emerges from nonlinearities in the network and that the degree of scaling is controlled by the strength of external input. These findings demonstrate a simple and general mechanism for conferring temporal flexibility upon sensorimotor and cognitive functions.
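Temporal scaling, as described here, means a firing-rate profile keeps its shape as a function of relative time t/T rather than absolute time. A quick synthetic illustration (a hypothetical, perfectly scaling profile; on real data the same correlation on normalized time quantifies the degree of scaling):

```python
import numpy as np

def rate_profile(T, n=200):
    """Synthetic firing-rate profile for a produced interval T (seconds).
    Its shape depends only on relative time t/T, i.e., it scales perfectly."""
    t = np.linspace(0.0, T, n)
    return np.exp(-((t / T - 0.6) ** 2) / 0.02)

# In absolute time the profiles for 0.8 s and 1.5 s stretch differently,
# but sampled on normalized time t/T they collapse onto a single curve.
r_short = rate_profile(0.8)
r_long = rate_profile(1.5)
scaling_index = np.corrcoef(r_short, r_long)[0, 1]
print(round(scaling_index, 3))  # -> 1.0 for a perfectly scaling profile
```

The correlation is trivially 1 here by construction; for recorded neurons it falls below 1 to the extent that responses track absolute rather than relative time.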
The neuroscience of perception has recently been revolutionized by an integrative modeling approach in which computation, brain function, and behavior are linked across many datasets and many computational models. By revealing trends across models, this approach yields novel insights into cognitive and neural mechanisms in the target domain. Here, we present a systematic study taking this approach to higher-level cognition: human language processing, our species' signature cognitive skill. We find that the most powerful "transformer" models predict nearly 100% of explainable variance in neural responses to sentences and generalize across different datasets and imaging modalities (functional MRI and electrocorticography). Models' neural fits ("brain score") and fits to behavioral responses are both strongly correlated with model accuracy on the next-word prediction task (but not other language tasks). Model architecture appears to substantially contribute to neural fit. These results provide computationally explicit evidence that predictive processing fundamentally shapes the language comprehension mechanisms in the human brain.
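The "brain score" mentioned above is, at its core, a cross-validated regression from model features to neural responses. A simplified stand-in for the published pipeline, run on synthetic data (the shapes, ridge penalty, and fold count are illustrative assumptions):

```python
import numpy as np

def neural_predictivity(X, Y, n_folds=5, alpha=1.0, seed=0):
    """Schematic 'brain score': cross-validated ridge regression from model
    features X (items x dims) to neural responses Y (items x voxels), scored
    by the Pearson r between predicted and held-out responses."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    rs = []
    for test in folds:
        train = np.setdiff1d(np.arange(len(X)), test)
        # closed-form ridge weights on the training fold
        W = np.linalg.solve(X[train].T @ X[train] + alpha * np.eye(X.shape[1]),
                            X[train].T @ Y[train])
        pred = X[test] @ W
        for v in range(Y.shape[1]):
            rs.append(np.corrcoef(pred[:, v], Y[test][:, v])[0, 1])
    return float(np.mean(rs))

# Synthetic check: voxel responses are a noisy linear readout of the features.
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 20))                      # "sentence embeddings"
Y = X @ rng.standard_normal((20, 10)) + 0.5 * rng.standard_normal((300, 10))
print(round(neural_predictivity(X, Y), 2))
```

When the responses really are a (noisy) linear function of the features, the score approaches the noise ceiling; comparing scores across models is what reveals the trends the abstract describes.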
Motor adaptation paradigms provide a quantitative method to study short-term modification of motor commands. Despite the growing understanding of the role motion states (e.g., velocity) play in this form of motor learning, there is little information on the relative stability of memories based on these movement characteristics, especially in comparison to the initial adaptation. Here, we trained subjects to make reaching movements perturbed by force patterns dependent upon either limb position or velocity. Following training, subjects were exposed to a series of error-clamp trials to measure the temporal characteristics of the feedforward motor output during the decay of learning. The compensatory force patterns were largely based on the perturbation kinematic (e.g., velocity), but also showed a small contribution from the other motion kinematic (e.g., position). However, the velocity contribution in response to the position-based perturbation decayed at a slower rate than the position contribution to velocity-based training, suggesting a difference in stability. Next, we modified a previous model of motor adaptation to reflect this difference and simulated the behavior for different learning goals. We were interested in the stability of learning when the perturbations were based on different combinations of limb position or velocity that subsequently resulted in biased amounts of motion-based learning. We trained additional subjects on these combined motion-state perturbations and confirmed the predictions of the model. Specifically, we show that (1) there is a significant separation between the observed gain-space trajectories for the learning and decay of adaptation and (2) for combined motion-state perturbations, the gain associated with changes in limb position decayed at a faster rate than the velocity-dependent gain, even when the position-dependent gain at the end of training was significantly greater.
Collectively, these results suggest that the state-dependent adaptation associated with movement velocity is relatively more stable than that based on position.
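The stability difference reported here can be captured by giving each motion-state gain its own retention factor in a state-space model of adaptation. A minimal sketch of the decay phase (error-clamp trials, where no error drives new learning), using illustrative retention values rather than parameters fitted in the study:

```python
import numpy as np

def decay(g0, a, n_trials=100):
    """Gain decay across error-clamp trials: g_{n+1} = a * g_n, so only the
    retention factor a acts once the error signal is clamped to zero."""
    return g0 * a ** np.arange(n_trials + 1)

# Illustrative retention factors: the velocity-dependent memory retains more
# per trial than the position-dependent one.
g_pos = decay(g0=1.2, a=0.95)  # larger initial position gain
g_vel = decay(g0=1.0, a=0.99)

crossover = int(np.argmax(g_pos < g_vel))
print(crossover)  # -> 5: trial at which the position gain drops below velocity
```

Even when the position gain starts out larger (as at the end of a combined-perturbation training block), its faster decay lets the velocity gain overtake it within a few trials, reproducing the qualitative pattern in the gain-space trajectories.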
Artificial neural networks have emerged as computationally plausible models of human language processing. A major criticism of these models is that the amount of training data they receive far exceeds that of humans during language learning. Here, we use two complementary approaches to ask how the models' ability to capture human fMRI responses to sentences is affected by the amount of training data. First, we evaluate GPT-2 models trained on 1 million, 10 million, 100 million, or 1 billion words against an fMRI benchmark. We consider the 100-million-word model developmentally plausible in terms of the amount of training data, given that this amount is similar to what children are estimated to be exposed to during the first 10 years of life. Second, we test a GPT-2 model trained on a 9-billion-token dataset (sufficient to reach state-of-the-art next-word prediction performance) against the human benchmark at different stages during training. Across both approaches, we find that (i) models trained on a developmentally plausible amount of data already achieve near-maximal performance in capturing fMRI responses to sentences, and (ii) lower perplexity (a measure of next-word prediction performance) is associated with stronger alignment with human data, suggesting that models that have received enough training to achieve sufficiently high next-word prediction performance also acquire representations of sentences that are predictive of human fMRI responses. In tandem, these findings establish that although training is necessary for the models' predictive ability, a developmentally realistic amount of training (∼100 million words) may suffice.
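Perplexity, the next-word prediction measure used above, is the exponential of the average negative log-probability the model assigns to each observed next token:

```python
import math

def perplexity(next_token_probs):
    """Perplexity: exp of the mean negative log-probability assigned to each
    observed next token. Lower values mean better next-word prediction."""
    nll = -sum(math.log(p) for p in next_token_probs) / len(next_token_probs)
    return math.exp(nll)

# A model that always gives the true next token probability 0.25 is exactly
# as uncertain as a uniform 4-way guess: perplexity 4.
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # -> 4.0
```

A perplexity of k can be read as the model being, on average, as uncertain as a uniform choice among k alternatives, which is why it tracks next-word prediction quality directly.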
Humans rapidly adapt reaching movements in response to perturbations (e.g., manipulations of movement dynamics or visual feedback). Following a break, when reexposed to the same perturbation, subjects demonstrate savings, a faster learning rate compared with the time course of initial training. Although this has been well studied, there are open questions on the extent to which early savings reflects the rapid recall of previous performance. To address this question, we examined how the properties of initial training (duration and final adaptive state) influence initial single-trial adaptation to force-field perturbations when training sessions were separated by 24 h. There were two main groups, distinguished by the presence or absence of a washout period at the end of day 1 (with washout vs. without washout). We also varied the training duration on day 1 (15, 30, 90, or 160 training trials), resulting in 8 subgroups of subjects. We show that single-trial adaptation on day 2 scaled with training duration, even for similar asymptotic levels of learning on day 1 of training. Interestingly, the temporal force profile following the first perturbation on day 2 matched that at the end of day 1 for the longest training duration group that did not complete the washout. This correspondence persisted but was significantly lower for shorter training durations and the washout subject groups. Collectively, the results suggest that the adaptation observed very early in reexposure results from the rapid recall of the previously learned motor recalibration but is highly dependent on the initial training duration and final adaptive state.
The extent to which initial readaptation reflects the recall of previous motor performance is largely unknown. We examined early single-trial force-field adaptation on the second day of training and distinguished initial retention from recall. We found that the single-trial adaptation following the 24-h break matched that at the end of the first day, but this recall was modified by the training duration and final level of learning on the first day of training.
Predicting upcoming events is critical to our ability to interact with our environment. Transformer models, trained on next-word prediction, appear to construct representations of linguistic input that can support diverse downstream tasks. But how does a predictive objective shape such representations? Inspired by recent work in vision (Henaff et al., 2019), we test a hypothesis about predictive representations of autoregressive transformers. In particular, we test whether the neural trajectory of a sentence becomes progressively straighter as it passes through the network layers. The key insight is that straighter trajectories should facilitate prediction via linear extrapolation. We quantify straightness using a 1-dimensional curvature metric, and present four findings in support of the trajectory straightening hypothesis: i) In trained models, the curvature decreases from the early to the deeper layers of the network. ii) Models that perform better on the next-word prediction objective exhibit greater decreases in curvature, suggesting that this improved ability to straighten sentence trajectories may be the driver of better language modeling performance. iii) Given the same linguistic context, the sequences that are generated by the model have lower curvature than the actual continuations observed in a language corpus, suggesting that the model favors straighter trajectories for making predictions. iv) A consistent relationship holds between the average curvature and the average surprisal of sentences in the deep model layers, such that sentences with straighter trajectories also have lower surprisal. Importantly, untrained models do not exhibit these behaviors. In tandem, these results support the trajectory straightening hypothesis and provide a possible mechanism for how the geometry of the internal representations of autoregressive models supports next-word prediction.
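The curvature metric in question reduces to the angle between consecutive difference vectors along the trajectory of hidden states. A simple version on toy 2-D "states" (real sentence trajectories live in the model's high-dimensional hidden space):

```python
import numpy as np

def mean_curvature(states):
    """Average discrete curvature of a trajectory of hidden states: the angle
    (radians) between consecutive difference vectors. 0 means the trajectory
    is perfectly straight; larger values mean more curved."""
    diffs = np.diff(np.asarray(states, dtype=float), axis=0)
    angles = []
    for v, w in zip(diffs[:-1], diffs[1:]):
        cos = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return float(np.mean(angles))

# A straight line has zero curvature; a right-angle zigzag has pi/2.
line = [[0, 0], [1, 0], [2, 0], [3, 0]]
zigzag = [[0, 0], [1, 0], [1, 1], [2, 1]]
print(mean_curvature(line), mean_curvature(zigzag))
```

Under this metric, the straightening hypothesis predicts that the mean curvature of a sentence's token-state sequence drops from early to deep layers of a trained model, since straight trajectories are exactly the ones a linear extrapolator can predict.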