Biological visual systems rely on pose estimation of 3D objects to navigate and interact with their environment, but the neural mechanisms and computations for inferring 3D poses from 2D retinal images are only partially understood, especially where stereo information is missing. We previously presented evidence that humans infer the poses of 3D objects lying centered on the ground by using the geometrical back-transform from retinal images to viewer-centered world coordinates. This model explained the almost veridical estimation of poses in real scenes and the illusory rotation of poses in obliquely viewed pictures, which includes the “pointing out of the picture” phenomenon. Here we test this model for more varied configurations and find that it needs to be augmented. Five observers estimated poses of sloped, elevated, or off-center 3D sticks in each of 16 different poses displayed on a monitor in frontal and oblique views. Pose estimates in scenes and pictures showed remarkable accuracy and agreement between observers, but with a systematic fronto-parallel bias for oblique poses, similar to the ground condition. The retinal projection of the pose of an object sloped with respect to the ground depends on the slope. We show that observers’ estimates can be explained by the back-transform derived for close to the correct slope. The back-transform explanation also applies to obliquely viewed pictures and to off-center and elevated objects, making it more likely that observers use internalized perspective geometry to make 3D pose inferences while actively incorporating inferences about other aspects of object placement.
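To make the back-transform concrete, the following is a minimal sketch of the simplest (ground-lying) case under an assumed pinhole model; the symbols Ω (physical pose), ω (retinal orientation), and φ (gaze declination), and the coordinate conventions, are illustrative assumptions rather than notation taken from the study.

```latex
% Assumed setup: the viewer fixates a stick lying centered on the ground;
% gaze is declined by \varphi below the horizontal. World frame: x rightward,
% y along the ground away from the viewer, z up. A stick with physical pose
% \Omega points along d = (\sin\Omega, \cos\Omega, 0).
% With camera right r = (1, 0, 0) and camera up u = (0, \sin\varphi, \cos\varphi),
% the retinal orientation \omega of the stick's image satisfies
\tan\omega = \frac{d \cdot r}{d \cdot u}
           = \frac{\sin\Omega}{\cos\Omega \,\sin\varphi}
           = \frac{\tan\Omega}{\sin\varphi},
% so the geometrical back-transform from retinal to world coordinates is
\tan\Omega = \tan\omega \,\sin\varphi .
```

A stick sloped with respect to the ground adds a z-component to d, so the projected orientation, and hence the appropriate back-transform, depends on the slope; this is the dependence the abstract invokes when it says observers' estimates match the back-transform derived for close to the correct slope.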
Interacting with people and three-dimensional objects depicted on a screen is perceptually different from interacting with them in real life. This difference resides in their corresponding perceptual spaces: the former involves pictorial space, and the latter, visual space. Studies have examined the perceptual geometry of pictorial or visual space, but rarely their connection. The current study connected the two spaces using a pointing task and investigated how binocular disparity and motion parallax affect this connection. In a virtual environment, a pointing virtual character was displayed within a frame, and participants rotated him to point at targets in visual space. We independently manipulated whether binocular disparity and motion parallax specified the two-dimensional screen surface or its depicted three-dimensional content. In Experiment 1, we varied the virtual character's distance to the screen and found that binocular disparity determines the distance relationship between visual and pictorial space, but also introduces a relief depth expansion of the perceived virtual character. In Experiment 2, we varied the participants' viewing angle relative to the screen and found that motion parallax determines the directional relationship between visual and pictorial space. We discuss the theoretical and practical implications of our results in the context of video-mediated telecommunication.
•Infants explore touchscreens differently than other two-dimensional surfaces.
•Understanding of touchscreens' interactive properties develops by 15 months.
•Infants exhibit touchscreen-appropriate behaviors towards touchscreens by 15 months.
Infants’ exposure to images presented on screens is increasing with the accelerating use of technology in society and at home. Touchscreen technology provides numerous interactive screen opportunities geared toward infants and toddlers. Touchscreens are unique in that they possess the 2D qualities of a picture but a set of manipulation possibilities similar to, yet distinct from, those of a 3D object. Research comparing infants’ manual exploration of photographs, objects, and screen images has demonstrated that although 7–10-month-old infants direct different actions towards 3D objects, their exploration of screen images does not differ significantly from their exploration of 2D photographs (Ziemer & Snyder, 2016). The current investigation compares the ways in which 7–10-month-old and 15–18-month-old infants manually explore screen images, photographs, and objects. Infants in the older age group were shown examples of objects, photographs, and screen images presented within a well in a table with a Plexiglas® cover to create identical tactile feedback. Coders noted the presence or absence of appropriate actions displayed toward the various surfaces. Results were compared to data collected earlier (Ziemer & Snyder, 2016) to trace the development of touchscreen competence across the first years of life. By 15–18 months, infants demonstrate an emerging repertoire of touchscreen-appropriate behaviors that 7–10-month-old infants do not show. Differences in haptic exploration suggest the beginnings of a touchscreen competence that enables infants to understand and interact with touchscreens in a new way.
Detecting meaning in RSVP at 13 ms per picture. Potter, Mary C.; Wyble, Brad; Hagmann, Carl Erick, et al. Attention, Perception, & Psychophysics, 02/2014, Volume 76, Issue 2. Journal article, peer reviewed, open access.
The visual system is exquisitely adapted to the task of extracting conceptual information from visual input with every new eye fixation, three or four times a second. Here we assess the minimum viewing time needed for visual comprehension, using rapid serial visual presentation (RSVP) of a series of six or 12 pictures presented at between 13 and 80 ms per picture, with no interstimulus interval. Participants were to detect a picture specified by a name (e.g., smiling couple) that was given just before or immediately after the sequence. Detection improved with increasing duration and was better when the name was presented before the sequence, but performance was significantly above chance at all durations, whether the target was named before or only after the sequence. The results are consistent with feedforward models, in which an initial wave of neural activity through the ventral stream is sufficient to allow identification of a complex visual stimulus in a single forward pass. Although we discuss other explanations, the results suggest that neither reentrant processing from higher to lower levels nor advance information about the stimulus is necessary for the conscious detection of rapidly presented, complex visual information.
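As an illustration of the presentation protocol described above, here is a hypothetical PsychoPy-style sketch of a single RSVP trial; the function name, refresh rate, wait durations, and response keys are assumptions for illustration, not the authors' code.

```python
# Hypothetical RSVP trial sketch (illustrative assumptions, not the authors' code):
# 6 or 12 pictures at 13-80 ms each, no interstimulus interval, and the target
# name shown either just before or immediately after the sequence.
from psychopy import visual, core, event

REFRESH_HZ = 75  # assumed monitor refresh rate; one frame ~ 13.3 ms
win = visual.Window(fullscr=True, color="gray", units="pix")

def run_rsvp_trial(image_paths, ms_per_picture, target_name, name_before=True):
    frames = max(1, round(ms_per_picture / 1000 * REFRESH_HZ))
    prompt = visual.TextStim(win, text=target_name)
    if name_before:                          # name given just before the sequence
        prompt.draw(); win.flip(); core.wait(1.0)
    stims = [visual.ImageStim(win, image=p) for p in image_paths]
    for stim in stims:                       # back-to-back pictures, no ISI
        for _ in range(frames):
            stim.draw()
            win.flip()
    if not name_before:                      # or immediately after it
        prompt.draw(); win.flip(); core.wait(1.0)
    win.flip()                               # blank screen before the response
    return event.waitKeys(keyList=["y", "n"])  # yes/no detection response
```

Frame-counting is used because presentation durations must be multiples of the monitor refresh; at an assumed 75 Hz, one frame lasts roughly 13.3 ms, which is how durations as short as 13 ms per picture become achievable.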
This paper argues that the still-emerging paradigm of situated cognition requires a more systematic perspective on media to capture the enculturation of the human mind. By virtue of being media, cultural artifacts present central experiential models of the world for our embodied minds to latch onto. The paper identifies references to external media within embodied, extended, enactive, and predictive approaches to cognition, which remain underdeveloped in terms of the profound impact that media have on our mind. To grasp this impact, I propose an enactive account of media that is based on expansive habits as media-structured, embodied ways of bringing forth meaning and new domains of values. We apply such habits, for instance, when seeing a picture or perceiving a movie. They become established through a process of reciprocal adaptation between media artifacts and organisms and define the range of viable actions within such a media ecology. Within an artifactual habit, we then become attuned to a specific media work (e.g., a TV series, a picture, a text, or even a city) that engages us. Both the plurality of habits and the dynamical adjustments within a habit require a more flexible neural architecture than is addressed by classical cognitive neuroscience. To detail how neural and media processes interlock, I will introduce the concept of neuromediality and discuss radical predictive processing accounts that could contribute to the externalization of the mind by treating media themselves as generative models of the world. After a short primer on general media theory, I discuss media examples in three domains: pictures and moving images; digital media; architecture and the built environment. This discussion demonstrates the need for a new cognitive media theory based on enactive artifactual habits, one that will help us gain perspective on the continuous re-mediation of our mind.
Standard philosophical studies on picture perception have usually investigated the peculiar nature of pictorial experience and the way aesthetic appreciation can be generated during this experience. Recently, however, the philosophical literature has also focused on a new aspect of picture perception: the possible role that visual states related to action processing may play in pictorial experience. But this role has been studied only in relation to understanding the nature of pictorial experience, qua visual experience. This paper offers some preliminary speculation, which may guide future research, on the role of action in aesthetic appreciation of pictures.
Hemodynamic and electrophysiological studies indicate differential brain responses to emotionally arousing, compared to neutral, pictures. Here, the time course and source distribution of electrocortical potentials in response to emotional stimuli were examined using a high-density (129-sensor) electrode array. Event-related potentials (ERPs) were recorded while participants viewed pleasant, neutral, and unpleasant pictures. ERP voltages were examined in six time intervals, roughly corresponding to the P1, N1, early P3, and late P3 components and a slow-wave window. Differential activity was found for emotional, compared to neutral, pictures at both P3 intervals, as well as an enhancement of later posterior positivity. Source-space projection was performed using a minimum-norm procedure that estimates the source currents generating the extracranially measured electrical gradient. Sources of slow-wave modulation were located in occipital and posterior parietal cortex, with a right-hemispheric dominance.
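For readers unfamiliar with the technique, here is a minimal numpy sketch of an L2 minimum-norm estimate of source currents from sensor data; the matrix shapes, regularization value, and random data are illustrative assumptions, as the abstract does not specify the implementation.

```python
# Sketch of an L2 minimum-norm source estimate (illustrative assumptions only).
# Forward model: y = L @ j + noise, with lead field L (n_sensors x n_sources).
import numpy as np

def minimum_norm_estimate(L, y, lam=1e-2):
    """Return j_hat = L^T (L L^T + lam * I)^{-1} y, the minimum-norm currents."""
    gram = L @ L.T + lam * np.eye(L.shape[0])  # regularized sensor-space Gram matrix
    return L.T @ np.linalg.solve(gram, y)

# Usage with a 129-sensor array, as in the study; the source count is hypothetical.
rng = np.random.default_rng(0)
L = rng.standard_normal((129, 5000))   # lead field (would come from a head model)
y = rng.standard_normal(129)           # one time sample of the measured gradient
j_hat = minimum_norm_estimate(L, y)    # estimated source currents, shape (5000,)
```

Among all current distributions that reproduce the measurements, this estimator picks the one with the smallest L2 norm, which is what makes it workable for the underdetermined sensor-to-source mapping of a 129-channel array.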
The dominant inferential approach to human 3D perception assumes a model of spatial encoding based on a physical description of objects and space. Prevailing models based on this physicalist approach assume that the visual system infers an objective, unitary and mostly veridical representation of the external world. However, careful consideration of the phenomenology of 3D perception challenges these assumptions. I review important aspects of phenomenology, psychophysics and neurophysiology which suggest that human visual perception of 3D objects and space is underwritten by distinct and dissociated spatial encodings that are optimized for specific regions of space. Specifically, I argue that 3D perception is underwritten by at least three distinct encodings for (1) egocentric distance perception at the ambulatory scale, (2) exocentric distance (scaled depth) perception optimized for near space, and (3) perception of object shape and layout (unscaled depth). This tripartite division can more satisfactorily account for the phenomenology, psychophysics and adaptive logic of human 3D perception. This article is part of a discussion meeting issue 'New approaches to 3D vision'.
The choices hidden in photography. Hertzmann, Aaron. Journal of Vision, 10/2022, Volume 22, Issue 11. Journal article, peer reviewed, open access.
Photography is often understood as an objective recording of light measurements, in contrast with the subjective nature of painting. This article argues that photography entails making the same kinds of choices of color, tone, and perspective as in painting, and surveys examples from film photography and smartphone cameras. Hence, understanding picture perception requires treating photography as just one way to make pictures. More research is needed to understand the effects of these choices on pictorial perception, which in turn could lead to the design of new imaging techniques.
Three experiments were conducted to study, on a more fine-grained level, how processing a picture facilitates learning from text. In Experiment 1 (N = 85), results from a drawing task revealed that the global spatial structure of a pulley system picture was extracted even from a brief inspection (600 ms or 2 s). In Experiment 2 (N = 105), students who initially inspected the pulley system picture (for 600 ms, 2 s, or self-paced) had better comprehension of the system's functions and made more eye movements in line with the system's global spatial structure when listening to text than students who listened to text only. In Experiment 3 (N = 39), students who first saw the picture (for 2 s) processed written text about the pulley system's spatial structure more efficiently than students who read text only. Results suggest that global spatial information extracted from the picture was used as a mental scaffold to facilitate mental model construction.
•Briefly inspecting a picture allows extracting its global spatial structure (GSS).
•The GSS acts as a mental scaffold during text comprehension.
•The GSS facilitates text comprehension as reflected in shorter reading times.
•Eye movements indicate that the GSS is reactivated during text processing.
•Availability of the GSS prior to reading text supports mental model construction.