Quantitative modeling of human brain activity can provide crucial insights about cortical representations [1, 2] and can form the basis for brain decoding devices [3–5]. Recent functional magnetic resonance imaging (fMRI) studies have modeled brain activity elicited by static visual patterns and have reconstructed these patterns from brain activity [6–8]. However, blood oxygen level-dependent (BOLD) signals measured via fMRI are very slow [9], so it has been difficult to model brain activity elicited by dynamic stimuli such as natural movies. Here we present a new motion-energy [10, 11] encoding model that largely overcomes this limitation. The model describes fast visual information and slow hemodynamics by separate components. We recorded BOLD signals in occipitotemporal visual cortex of human subjects who watched natural movies and fit the model separately to individual voxels. Visualization of the fit models reveals how early visual areas represent the information in movies. To demonstrate the power of our approach, we also constructed a Bayesian decoder [8] by combining estimated encoding models with a sampled natural movie prior. The decoder provides remarkable reconstructions of the viewed movies. These results demonstrate that dynamic brain activity measured under naturalistic conditions can be decoded using current fMRI technology.
► A new motion-energy model can describe BOLD signals evoked by natural movies
► The model reveals how motion information is represented in early visual areas
► Speed tuning in human early visual areas depends on eccentricity
► The model provides reconstructions of natural movies from evoked BOLD signals
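The idea of describing fast visual information and slow hemodynamics by separate components can be sketched as a regression on time-delayed copies of the stimulus features. This is a minimal illustration on synthetic data, not the published model: the random features stand in for motion-energy filter outputs, and the names `make_delayed` and `fit_ridge`, the delays (in samples), and the ridge penalty are all assumptions made for the example.

```python
import numpy as np

def make_delayed(features, delays):
    """Stack time-shifted copies of a (time, feature) matrix, one copy per delay."""
    n_time, n_feat = features.shape
    out = np.zeros((n_time, n_feat * len(delays)))
    for i, d in enumerate(delays):
        # Row t of this block holds the features presented d samples earlier.
        out[d:, i * n_feat:(i + 1) * n_feat] = features[:n_time - d]
    return out

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X'X + alpha I)^-1 X'Y."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

# Synthetic demo (assumption): one voxel follows feature 0 with a 2-sample lag.
rng = np.random.default_rng(0)
features = rng.standard_normal((300, 4))
bold = np.zeros(300)
bold[2:] = features[:-2, 0]
bold += 0.1 * rng.standard_normal(300)

delays = [1, 2, 3]
X = make_delayed(features, delays)
W = fit_ridge(X, bold[:, None], alpha=1.0)
# Rows index delays, columns index features: a per-feature temporal response.
weights = W[:, 0].reshape(len(delays), 4)
```

In this toy case the largest weight sits at the 2-sample delay of feature 0, so the fitted delay profile plays the role of the slow hemodynamic component while the feature columns carry the fast stimulus information.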
Humans can see and name thousands of distinct object and action categories, so it is unlikely that each category is represented in a distinct brain area. A more efficient scheme would be to represent categories as locations in a continuous semantic space mapped smoothly across the cortical surface. To search for such a space, we used fMRI to measure human brain activity evoked by natural movies. We then used voxelwise models to examine the cortical representation of 1,705 object and action categories. The first few dimensions of the underlying semantic space were recovered from the fit models by principal components analysis. Projection of the recovered semantic space onto cortical flat maps shows that semantic selectivity is organized into smooth gradients that cover much of visual and nonvisual cortex. Furthermore, both the recovered semantic space and the cortical organization of the space are shared across different individuals.
► The brain represents object and action categories within a continuous semantic space
► This semantic space is organized into broad gradients across the cortical surface
► This semantic space is shared across different individuals
Humans perceive thousands of object and action categories, so each category is likely not represented in a distinct brain area. Huth et al. use natural movie stimuli to show that categories are represented using a continuous semantic space that is shared across individuals and broadly mapped across the cortex.
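Recovering the leading dimensions of a semantic space from fit voxelwise models can be sketched as a principal components analysis of the voxel-by-category weight matrix. The sketch below uses synthetic weights with 20 categories rather than 1,705, and the function name `semantic_pcs` is hypothetical; it shows the general technique only, not the study's exact pipeline.

```python
import numpy as np

def semantic_pcs(weights, n_components):
    """PCA across voxels of a (voxels, categories) weight matrix via SVD."""
    centered = weights - weights.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    components = Vt[:n_components]          # (n_components, categories)
    projections = centered @ components.T   # (voxels, n_components)
    return components, projections, explained[:n_components]

# Synthetic weights (assumption): 500 voxels, 20 categories, 2 latent dimensions.
rng = np.random.default_rng(0)
latent = rng.standard_normal((500, 2))
loadings = rng.standard_normal((2, 20))
weights = latent @ loadings + 0.05 * rng.standard_normal((500, 20))

components, projections, explained = semantic_pcs(weights, n_components=2)
```

Each row of `components` assigns a coordinate to every category along one recovered dimension, and each row of `projections` gives one voxel's position in that space, which is the quantity one would paint onto a cortical map.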
An integral part of human language is the capacity to extract meaning from spoken and written words, but the precise relationship between brain representations of information perceived by listening versus reading is unclear. Prior neuroimaging studies have shown that semantic information in spoken language is represented in multiple regions in the human cerebral cortex, while amodal semantic information appears to be represented in a few broad brain regions. However, previous studies were too insensitive to determine whether semantic representations were shared at a fine level of detail rather than merely at a coarse scale. We used fMRI to record brain activity in two separate experiments while participants listened to or read several hours of the same narrative stories, and then created voxelwise encoding models to characterize semantic selectivity in each voxel and in each individual participant. We find that semantic tuning during listening and reading are highly correlated in most semantically selective regions of cortex, and models estimated using one modality accurately predict voxel responses in the other modality. These results suggest that the representation of language semantics is independent of the sensory modality through which the semantic information is received.
Humans can comprehend the meaning of words from both spoken and written language. It is therefore important to understand the relationship between the brain representations of spoken or written text. Here, we show that although the representation of semantic information in the human brain is quite complex, the semantic representations evoked by listening versus reading are almost identical. These results suggest that the representation of language semantics is independent of the sensory modality through which the semantic information is received.
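The two comparisons described above, correlating semantic tuning across modalities and predicting one modality's responses from a model fit on the other, can be illustrated on synthetic data. Everything here is an assumption for the sake of the example: the shared ground-truth weights, the noise level, the ridge penalty, and the helper names `fit_ridge` and `columnwise_corr`.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression for all voxels (columns of Y) at once."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def columnwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / np.sqrt((A**2).sum(axis=0) * (B**2).sum(axis=0))

rng = np.random.default_rng(0)
n_feat, n_vox = 10, 5
true_w = rng.standard_normal((n_feat, n_vox))  # tuning shared by both modalities

X_listen = rng.standard_normal((400, n_feat))
Y_listen = X_listen @ true_w + 0.5 * rng.standard_normal((400, n_vox))
X_read = rng.standard_normal((200, n_feat))
Y_read = X_read @ true_w + 0.5 * rng.standard_normal((200, n_vox))

W_listen = fit_ridge(X_listen, Y_listen, alpha=10.0)
W_read = fit_ridge(X_read, Y_read, alpha=10.0)

# Listening-fit model evaluated on reading responses, one score per voxel.
cross_pred_r = columnwise_corr(X_read @ W_listen, Y_read)
# Similarity of the two estimated tuning profiles, one score per voxel.
tuning_r = columnwise_corr(W_listen, W_read)
```

Because the simulated voxels share tuning by construction, both scores come out high; with real data the same two quantities would be the evidence for or against modality-independent representation.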
Little is known about how attention changes the cortical representation of sensory information in humans. On the basis of neurophysiological evidence, we hypothesized that attention causes tuning changes to expand the representation of attended stimuli at the cost of unattended stimuli. To investigate this issue, we used functional magnetic resonance imaging to measure how semantic representation changed during visual search for different object categories in natural movies. We found that many voxels across occipito-temporal and fronto-parietal cortex shifted their tuning toward the attended category. These tuning shifts expanded the representation of the attended category and of semantically related, but unattended, categories, and compressed the representation of categories that were semantically dissimilar to the target. Attentional warping of semantic representation occurred even when the attended category was not present in the movie; thus, the effect was not a target-detection artifact. These results suggest that attention dynamically alters visual representation to optimize processing of behaviorally relevant objects during natural vision.
Semantic information in the human brain is organized into multiple networks, but the fine-grain relationships between them are poorly understood. In this study, we compared semantic maps obtained from two functional magnetic resonance imaging experiments in the same participants: one that used silent movies as stimuli and another that used narrative stories. Movies evoked activity from a network of modality-specific, semantically selective areas in visual cortex. Stories evoked activity from another network of semantically selective areas immediately anterior to visual cortex. Remarkably, the pattern of semantic selectivity in these two distinct networks corresponded along the boundary of visual cortex: for visual categories represented posterior to the boundary, the same categories were represented linguistically on the anterior side. These results suggest that these two networks are smoothly joined to form one contiguous map.
Visual area V4 is a midtier cortical area in the ventral visual pathway. It is crucial for visual object recognition and has been a focus of many studies on visual attention. However, there is no unifying view of V4's role in visual processing. Neither is there an understanding of how its role in feature processing interfaces with its role in visual attention. This review captures our current knowledge of V4, largely derived from electrophysiological and imaging studies in the macaque monkey. Based on recent discovery of functionally specific domains in V4, we propose that the unifying function of V4 circuitry is to enable selective extraction of specific functional domain-based networks, whether it be by bottom-up specification of object features or by top-down attentionally driven selection.
In this Review, Roe and colleagues discuss our current knowledge of visual area V4 and propose that V4 circuitry enables selective extraction of specific functional domain-based networks to enable visual processing.
Recent multi-voxel pattern classification (MVPC) studies have shown that in early visual cortex patterns of brain activity generated during mental imagery are similar to patterns of activity generated during perception. This finding implies that low-level visual features (e.g., space, spatial frequency, and orientation) are encoded during mental imagery. However, the specific hypothesis that low-level visual features are encoded during mental imagery is difficult to directly test using MVPC. The difficulty is especially acute when considering the representation of complex, multi-object scenes that can evoke multiple sources of variation that are distinct from low-level visual features. Therefore, we used a voxel-wise modeling and decoding approach to directly test the hypothesis that low-level visual features are encoded in activity generated during mental imagery of complex scenes. Using fMRI measurements of cortical activity evoked by viewing photographs, we constructed voxel-wise encoding models of tuning to low-level visual features. We also measured activity as subjects imagined previously memorized works of art. We then used the encoding models to determine if putative low-level visual features encoded in this activity could pick out the imagined artwork from among thousands of other randomly selected images. We show that mental images can be accurately identified in this way; moreover, mental image identification accuracy depends upon the degree of tuning to low-level visual features in the voxels selected for decoding. These results directly confirm the hypothesis that low-level visual features are encoded during mental imagery of complex scenes. Our work also points to novel forms of brain–machine interaction: we provide a proof-of-concept demonstration of an internet image search guided by mental imagery.
• A model of representation in early visual cortex decodes mental images of complex scenes.
• Mental imagery depends directly upon the encoding of low-level visual features.
• Low-level visual features of mental images are encoded by activity in early visual cortex.
• Depictive theories of mental imagery are strongly supported by our results.
• Brain activity evoked by mental imagery can be used to guide internet image search.
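Picking an imagined image out of a large candidate set with an encoding model can be sketched as follows: predict a voxel pattern for every candidate, then choose the candidate whose prediction correlates best with the measured pattern. This is an illustrative sketch on synthetic data. The random image features and weights stand in for the low-level visual features and fit models, and the correlation-based matching rule and the name `identify` are assumptions, not necessarily the criterion used in the study.

```python
import numpy as np

def identify(measured, predicted):
    """Return the best-matching candidate index and all correlation scores.

    measured:  (voxels,) activity pattern
    predicted: (candidates, voxels) encoding-model predictions
    """
    m = measured - measured.mean()
    p = predicted - predicted.mean(axis=1, keepdims=True)
    r = (p @ m) / (np.linalg.norm(p, axis=1) * np.linalg.norm(m))
    return int(np.argmax(r)), r

rng = np.random.default_rng(0)
n_candidates, n_feat, n_vox = 1000, 50, 200
image_features = rng.standard_normal((n_candidates, n_feat))  # stand-in features
W = rng.standard_normal((n_feat, n_vox))                      # stand-in fit weights
predicted = image_features @ W

target = 123  # index of the "imagined" image in this simulation
measured = predicted[target] + 3.0 * rng.standard_normal(n_vox)
best, r = identify(measured, predicted)
```

Sorting the candidates by `r` instead of taking only the maximum gives a ranked list, which is the form an image search guided by brain activity would need.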
Speech comprehension requires that the brain extract semantic meaning from the spectral features represented at the cochlea. To investigate this process, we performed an fMRI experiment in which five men and two women passively listened to several hours of natural narrative speech. We then used voxelwise modeling to predict BOLD responses based on three different feature spaces that represent the spectral, articulatory, and semantic properties of speech. The amount of variance explained by each feature space was then assessed using a separate validation dataset. Because some responses might be explained equally well by more than one feature space, we used a variance partitioning analysis to determine the fraction of the variance that was uniquely explained by each feature space. Consistent with previous studies, we found that speech comprehension involves hierarchical representations starting in primary auditory areas and moving laterally on the temporal lobe: spectral features are found in the core of A1, mixtures of spectral and articulatory in STG, mixtures of articulatory and semantic in STS, and semantic in STS and beyond. Our data also show that both hemispheres are equally and actively involved in speech perception and interpretation. Further, responses as early in the auditory hierarchy as in STS are more correlated with semantic than spectral representations. These results illustrate the importance of using natural speech in neurolinguistic research. Our methodology also provides an efficient way to simultaneously test multiple specific hypotheses about the representations of speech without using block designs and segmented or synthetic speech.
To investigate the processing steps performed by the human brain to transform natural speech sound into meaningful language, we used models based on a hierarchical set of speech features to predict BOLD responses of individual voxels recorded in an fMRI experiment while subjects listened to natural speech. Both cerebral hemispheres were actively involved in speech processing in large and equal amounts. Also, the transformation from spectral features to semantic elements occurs early in the cortical speech-processing stream. Our experimental and analytical approaches are important alternatives and complements to standard approaches that use segmented speech and block designs, which report more laterality in speech processing and associated semantic processing to higher levels of cortex than reported here.
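The unique-variance part of a variance partitioning analysis can be sketched as the drop in held-out R² when one feature space is removed from the joint model. The example below is a simplified illustration on simulated data: the three feature spaces are random stand-ins, the simulated voxel is driven by the semantic space only, and the function names and ridge penalty are assumptions. A full partitioning would also estimate the variance shared between spaces, which this sketch does not do.

```python
import numpy as np

def fit_ridge(X, y, alpha):
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def heldout_r2(train, test, y_train, y_test, names, alpha=1.0):
    """Held-out R^2 of a joint ridge model over the named feature spaces."""
    Xtr = np.hstack([train[n] for n in names])
    Xte = np.hstack([test[n] for n in names])
    return r2(y_test, Xte @ fit_ridge(Xtr, y_train, alpha))

def unique_variance(train, test, y_train, y_test, alpha=1.0):
    """Unique R^2 per space: full-model R^2 minus R^2 without that space."""
    names = list(train)
    full = heldout_r2(train, test, y_train, y_test, names, alpha)
    return {n: full - heldout_r2(train, test, y_train, y_test,
                                 [m for m in names if m != n], alpha)
            for n in names}

rng = np.random.default_rng(0)

def simulate_spaces(n):
    spectral = rng.standard_normal((n, 5))
    # Articulatory features are made partly redundant with spectral ones.
    articulatory = spectral[:, :3] + 0.5 * rng.standard_normal((n, 3))
    semantic = rng.standard_normal((n, 4))
    return {"spectral": spectral, "articulatory": articulatory, "semantic": semantic}

train, test = simulate_spaces(500), simulate_spaces(200)
w_sem = np.array([1.0, -1.0, 0.5, 0.5])
y_train = train["semantic"] @ w_sem + 0.5 * rng.standard_normal(500)
y_test = test["semantic"] @ w_sem + 0.5 * rng.standard_normal(200)

unique = unique_variance(train, test, y_train, y_test)
```

For this simulated voxel nearly all explained variance is unique to the semantic space, while the spectral and articulatory spaces contribute close to zero.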
Area MT has been an important target for studies of motion processing. However, previous neurophysiological studies of MT have used simple stimuli that do not contain many of the motion signals that occur during natural vision. In this study we sought to determine whether views of area MT neurons developed using simple stimuli can account for MT responses under more naturalistic conditions. We recorded responses from macaque area MT neurons during stimulation with naturalistic movies. We then used a quantitative modeling framework to discover which specific mechanisms best predict neuronal responses under these challenging conditions. We find that the simplest model that accurately predicts responses of MT neurons consists of a bank of V1-like filters, each followed by a compressive nonlinearity, a divisive nonlinearity, and linear pooling. Inspection of the fit models shows that the excitatory receptive fields of MT neurons tend to lie on a single plane within the three-dimensional spatiotemporal frequency domain, and suppressive receptive fields lie off this plane. However, most excitatory receptive fields form a partial ring in the plane and avoid low temporal frequencies. This receptive field organization ensures that most MT neurons are tuned for velocity but do not tend to respond to ambiguous static textures that are aligned with the direction of motion. In sum, MT responses to naturalistic movies are largely consistent with predictions based on simple stimuli. However, models fit using naturalistic stimuli reveal several novel properties of MT receptive fields that had not been shown in prior experiments.
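The cascade named above (filter bank, compressive nonlinearity, divisive nonlinearity, linear pooling) can be written as a short forward model. The abstract does not give the functional forms, so the power-law exponent, the normalization by summed population activity, the constant `sigma`, and the function name `mt_response` are all assumptions chosen to illustrate the structure, and the filter outputs are random stand-ins.

```python
import numpy as np

def mt_response(filter_energy, pool_weights, exponent=0.5, sigma=1.0):
    """Filter outputs -> compression -> divisive normalization -> linear pooling.

    filter_energy: (time, channels) nonnegative outputs of V1-like filters
    pool_weights:  (channels,) signed weights (positive = excitatory)
    """
    compressed = filter_energy ** exponent              # compressive nonlinearity
    norm_pool = compressed.sum(axis=1, keepdims=True)   # summed population activity
    normalized = compressed / (sigma + norm_pool)       # divisive nonlinearity
    return normalized @ pool_weights                    # linear pooling

rng = np.random.default_rng(0)
energy = rng.random((100, 8))  # stand-in for spatiotemporal filter energies
weights = np.array([1.0, 1.0, 0.5, 0.0, 0.0, -0.5, -1.0, -1.0])
response = mt_response(energy, weights)
```

With `sigma` set to zero the normalized activity depends only on the relative pattern across channels, so scaling the input leaves the response unchanged; a positive `sigma` restores sensitivity to overall input strength at low levels.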
• Using multiple feature spaces in a joint encoding model improves prediction accuracy.
• The variance explained by the joint model can be decomposed over feature spaces.
• Banded ridge regression optimizes the regularization for each feature space.
• Banded ridge regression contains an implicit feature-space selection mechanism.
• Banded ridge regression can be solved with random search or gradient descent.
Encoding models provide a powerful framework to identify the information represented in brain recordings. In this framework, a stimulus representation is expressed within a feature space and is used in a regularized linear regression to predict brain activity. To account for a potential complementarity of different feature spaces, a joint model is fit on multiple feature spaces simultaneously. To adapt regularization strength to each feature space, ridge regression is extended to banded ridge regression, which optimizes a different regularization hyperparameter per feature space. The present paper proposes a method to decompose over feature spaces the variance explained by a banded ridge regression model. It also describes how banded ridge regression performs a feature-space selection, effectively ignoring non-predictive and redundant feature spaces. This feature-space selection leads to better prediction accuracy and to better interpretability. Banded ridge regression is then mathematically linked to a number of other regression methods with similar feature-space selection mechanisms. Finally, several methods are proposed to address the computational challenge of fitting banded ridge regressions on large numbers of voxels and feature spaces. All implementations are released in an open-source Python package called Himalaya.
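A small NumPy sketch may help make banded ridge regression concrete: one penalty per feature space, penalties chosen by random search on validation error, and an additive split of explained variance over feature spaces. It does not use the Himalaya package or its API. The function names, the log-uniform search range, the synthetic data, and the particular decomposition (which sums to R² for zero-mean targets) are assumptions for illustration and may differ from the paper's definitions.

```python
import numpy as np

def banded_ridge(Xs, y, alphas):
    """Minimize ||y - sum_i X_i w_i||^2 + sum_i alphas[i] * ||w_i||^2."""
    X = np.hstack(Xs)
    penalty = np.concatenate([np.full(Xi.shape[1], a) for Xi, a in zip(Xs, alphas)])
    w = np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y)
    # Return one weight vector per feature space.
    return np.split(w, np.cumsum([Xi.shape[1] for Xi in Xs])[:-1])

def random_search(Xs_train, y_train, Xs_val, y_val, n_iter=50, seed=0):
    """Sample per-space penalties on a log scale; keep the best on validation MSE."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(n_iter):
        alphas = 10.0 ** rng.uniform(-2, 4, size=len(Xs_train))
        ws = banded_ridge(Xs_train, y_train, alphas)
        pred = sum(X @ w for X, w in zip(Xs_val, ws))
        mse = np.mean((y_val - pred) ** 2)
        if mse < best[0]:
            best = (mse, alphas, ws)
    return best

def split_r2(Xs, ws, y):
    """Additive per-space split of R^2, assuming y has zero mean."""
    preds = [X @ w for X, w in zip(Xs, ws)]
    total = sum(preds)
    return [float(p @ (2 * y - total) / (y @ y)) for p in preds]

# Synthetic demo (assumption): space 0 is predictive, space 1 is pure noise.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0, 0.5, 0.5, 2.0])
Xs_train = [rng.standard_normal((200, 5)), rng.standard_normal((200, 40))]
Xs_val = [rng.standard_normal((100, 5)), rng.standard_normal((100, 40))]
y_train = Xs_train[0] @ w_true + 0.5 * rng.standard_normal(200)
y_val = Xs_val[0] @ w_true + 0.5 * rng.standard_normal(100)
y_val = y_val - y_val.mean()

mse, alphas, ws = random_search(Xs_train, y_train, Xs_val, y_val)
parts = split_r2(Xs_val, ws, y_val)
```

In this toy setting the weights on the noise space stay small and nearly all of the explained variance is attributed to the predictive space, which is the feature-space selection behavior the abstract describes.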