Neurophysiological investigation of neural processes is hindered by the presence of large artifacts associated with eye movements. Although blind source separation (BSS)-based hybrid algorithms are useful for separating, identifying, and removing these artifacts from EEG, the extent to which neural signals remain mixed with the artifact components, potentially resulting in unintended removal of critical neural signals, has not been examined. Here, we present a novel validation approach to quantitatively evaluate the extent to which horizontal and vertical saccadic eye movement-related artifact components (H and V Comps) are indeed ocular in origin. To automate the identification of the H and V Comps recovered by second-order blind identification (SOBI), we introduced a novel Discriminant ANd Similarity (DANS)-based method. Through source localization, we showed that over 95% of the variance in the scalp projections of the SOBI-DANS-identified H and V Comps was ocular in origin. Through the analysis of saccade-related potentials (SRPs), we found that the SRP amplitudes of the H and V Comps were jointly modulated by eye movement direction and distance. SOBI-DANS component selection was in 100% agreement with human experts' selection and succeeded in identifying the components in all participants, indicating high cross-individual consistency and robustness. These results set the stage for future work to transform the to-be-discarded artifacts into signals indicative of gaze position, thereby providing readily co-registered eye movement and neural signals without using a separate eye tracker.
We validated the SOBI-DANS method for extracting artifact components associated with horizontal and vertical eye movements from EEG. We (1) set an upper bound on the leakage of neural signals into these components; (2) quantified the modulation of these components by saccade direction and distance; (3) demonstrated cross-individual consistency; and (4) raised the possibility of transforming EEG artifacts into signals indicative of gaze position. The present study offers a starting point for the future development of EEG-based virtual eye-tracking applications.
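To make the separation step concrete, here is a minimal NumPy sketch of single-lag second-order BSS (the AMUSE algorithm), a simplification of SOBI, which jointly diagonalizes lagged covariance matrices at many lags. The DANS step that automatically selects the H and V Comps from the recovered components is not shown, and all names are illustrative rather than the authors' code.

```python
import numpy as np

def amuse_bss(X, lag=1):
    """Single-lag second-order BSS (AMUSE), a simplification of SOBI.
    X: (n_channels, n_samples) EEG array."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten the data using the zero-lag covariance.
    C0 = X @ X.T / X.shape[1]
    d, E = np.linalg.eigh(C0)
    whitener = E @ np.diag(1.0 / np.sqrt(d)) @ E.T
    Z = whitener @ X
    # Symmetrized covariance at the chosen time lag.
    C = Z[:, lag:] @ Z[:, :-lag].T / (Z.shape[1] - lag)
    C = (C + C.T) / 2
    # Its eigenvectors give the rotation that separates sources with
    # distinct temporal structure (SOBI uses many lags jointly).
    _, U = np.linalg.eigh(C)
    W = U.T @ whitener            # unmixing matrix
    sources = W @ X               # component time courses
    mixing = np.linalg.pinv(W)    # columns = scalp projections
    return sources, mixing
```

The columns of the mixing matrix are each component's scalp projection, which is what the source-localization analysis above evaluates for ocular origin.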
How people look at visual information reveals fundamental information about them: their interests and their states of mind. Previous studies showed that the scanpath, i.e., the sequence of eye movements made by an observer exploring a visual stimulus, can be used to infer observer-related (e.g., task at hand) and stimuli-related (e.g., image semantic category) information. However, eye movements are complex signals, and many of these studies rely on limited gaze descriptors and bespoke datasets. Here, we provide a turnkey method for scanpath modeling and classification. This method relies on variational hidden Markov models (HMMs) and discriminant analysis (DA). HMMs encapsulate the dynamic and individualistic dimensions of gaze behavior, allowing DA to capture systematic patterns diagnostic of a given class of observers and/or stimuli. We test our approach on two very different datasets. First, we use fixations recorded while viewing 800 static natural scene images and infer an observer-related characteristic: the task at hand. We achieve an average correct classification rate of 55.9% (chance = 33%). We show that correct classification rates correlate positively with the number of salient regions present in the stimuli. Second, we use eye positions recorded while viewing 15 conversational videos and infer a stimulus-related characteristic: the presence or absence of the original soundtrack. We achieve an average correct classification rate of 81.2% (chance = 50%). HMMs make it possible to integrate bottom-up, top-down, and oculomotor influences into a single model of gaze behavior. This synergistic approach between behavioral research and machine learning will open new avenues for simple quantification of gaze behavior. We release SMAC with HMM, a Matlab toolbox freely available to the community under an open-source license agreement.
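As a rough illustration of this pipeline, the sketch below uses hmmlearn's maximum-likelihood Gaussian HMMs as a stand-in for the toolbox's variational HMMs, and a simple log-likelihood comparison as a stand-in for the DA stage; the function names and the three-state choice are assumptions for illustration only.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

def fit_class_hmm(scanpaths, n_states=3):
    """Fit one Gaussian HMM to all scanpaths (arrays of (x, y)
    fixations) belonging to one class, e.g., one task."""
    X = np.vstack(scanpaths)
    lengths = [len(s) for s in scanpaths]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="full",
                            n_iter=100, random_state=0)
    model.fit(X, lengths)
    return model

def classify(scanpath, class_models):
    """Assign a new scanpath to the class whose HMM scores it highest
    (a simple stand-in for the paper's discriminant analysis step)."""
    scores = {c: m.score(np.asarray(scanpath))
              for c, m in class_models.items()}
    return max(scores, key=scores.get)
```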
Recent research has suggested that dynamic emotion recognition involves strong audiovisual association; that is, facial or vocal information alone automatically induces perceptual processes in the other modality. We hypothesized that emotions may differ in the automaticity of audiovisual association, resulting in differential audiovisual information processing. Participants judged the emotion of a talking-head video under audiovisual, video-only (with no sound), and audio-only (with a static neutral face) conditions. Among the six basic emotions, disgust had the largest audiovisual advantage over the unimodal conditions in recognition accuracy. In addition, in the recognition of all the emotions except disgust, participants' eye-movement patterns did not change significantly across the three conditions, suggesting mandatory audiovisual information processing. In contrast, in disgust recognition, participants' eye movements in the audiovisual condition were less eyes-focused than in the video-only condition and more eyes-focused than in the audio-only condition, suggesting that audio information in the audiovisual condition interfered with eye-movement planning for features important for disgust (the eyes). In addition, those whose eye-movement patterns were less affected by concurrent disgusted voice information benefited more in recognition accuracy. Disgust recognition is learned later in life and thus may involve a reduced amount of audiovisual associative learning. Consequently, audiovisual association in disgust recognition is less automatic and demands more attentional resources than in other emotions. Thus, audiovisual information processing in emotion recognition depends on the automaticity of the emotion's audiovisual association resulting from associative learning. This finding has important implications for real-life emotion recognition and multimodal learning.
In face recognition, looking at the eyes has been associated with engagement of local attention as well as better recognition performance. As recent research has suggested that negative mood facilitates local attention while positive mood facilitates global attention, negative mood changes may lead to more eyes-focused eye movement patterns and consequently enhance recognition performance. Here we test this hypothesis using mood induction. Through eye movement analysis with hidden Markov models, we discovered eyes-focused and nose-focused eye movement strategies among the participants, and the eyes-focused strategy was associated with better recognition performance. During the recognition phase, participants with a negative mood change had increased eye movement pattern similarity to the eyes-focused strategy, and participants' mood change was correlated with the change in eye movement pattern similarity. Nevertheless, mood change did not significantly alter participants' eye movement strategy classification despite changes in eye movement pattern similarity, and the change in eye movement pattern similarity did not modulate recognition performance. These results suggest that mood changes through mood induction lead to slight changes in eye movement pattern that may not be sufficient to modulate recognition performance. Thus, individuals may have preferred eye movement strategies in face recognition that are impervious to transitory mood changes. This finding is consistent with a recent speculation on limited plasticity in adult face recognition and suggests that eye movements in face recognition may provide reliable information about an individual's cognitive abilities.
Recent research has suggested the importance of part-based information in face recognition in addition to global, whole-face information. Nevertheless, face drawing experience was reported to enhance selective attention to the eyes but did not improve face recognition performance, leading to speculations about limited plasticity in adult face recognition. Here we examined the mechanism underlying the limited advantage of face drawing experience in face recognition through the Eye Movement analysis with Hidden Markov Models (EMHMM) approach. We found that portrait artists showed more eyes-focused eye movement patterns and outperformed novices in face matching, and participants' drawing ratings were correlated with both eye movement pattern and performance. In contrast, portrait artists did not outperform novices and did not differ from novices in eye movement pattern in either the face recognition or the part-whole task, although the eyes-focused pattern was associated with better recognition performance and longer response times in the whole condition relative to the part condition. Interestingly, in contrast to the face recognition and part-whole tasks, participants' performance in face matching was predicted by their drawing rating but not their eye movement pattern. These results suggest that artists' advantage in face processing is specific to tasks similar to their drawing experience, such as face matching, and may be related to a better ability to extract identity-invariant information between two faces rather than to more eyes-focused eye movement patterns.
Using background music (BGM) during learning is a common behavior, yet whether BGM facilitates or hinders learning remains inconclusive, and the underlying mechanism is largely an open question. This study aims to elucidate the effect of self-selected BGM on a reading task for learners with different characteristics. In particular, learners' reading task performance, metacognition, and eye movements were examined in relation to their personal traits, including language proficiency, working memory capacity, music experience, and personality. Data were collected from a between-subject experiment with 100 non-native English speakers who were randomly assigned to two groups. Those in the experimental group read English passages with music of their own choice played in the background, while those in the control group performed the same task in silence. Results showed no salient differences in passage comprehension accuracy or metacognition between the two groups. Comparisons of fine-grained eye movement measures revealed that BGM imposed a heavier cognitive load on post-lexical processes but not on lexical processes. It was also revealed that students with a higher English proficiency level or more frequent BGM usage in daily self-learning/reading experienced less cognitive load when reading with their BGM, whereas students with higher working memory capacity (WMC) invested more mental effort than those with lower WMC in the BGM condition. These findings further scientific understanding of how BGM interacts with cognitive tasks in the foreground, and they provide practical guidance for learners and learning-environment designers on making the most of BGM for instruction and learning.
We propose gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Using the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of image regions on the detector's decision for each predicted attribute. Compared to previous work on class activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to one-stage, two-stage, and transformer-based detectors with different types of backbones and heads, and that it produces higher-quality visual explanations than the state of the art in terms of both effectiveness and efficiency. We discuss two explanation tasks for object detection: (1) object specification: what is the important region for the prediction? and (2) object discrimination: which object is detected? Aiming at these two aspects, we present a detailed analysis of the visual explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM. Furthermore, we investigate user trust in the explanation maps, how well the visual explanations of object detectors agree with human explanations as measured through human eye gaze, and whether this agreement is related to user trust. Finally, we propose two applications based on these two abilities of ODAM: ODAM-KD and ODAM-NMS. ODAM-KD utilizes the object specification of ODAM to generate top-down attention for key predictions and to guide the knowledge distillation of object detection. ODAM-NMS considers the location of the model's explanation for each prediction to distinguish duplicate detected objects. A training scheme, ODAM-Train, is proposed to improve the quality of object discrimination and to help ODAM-NMS. The code of ODAM is available at https://github.com/Cyang-Zhao/ODAM.
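The gradient-weighting idea can be sketched in PyTorch as follows: capture an intermediate activation with a forward hook, then weight it element-wise by the gradient of one detection's predicted attribute, which yields an instance-specific map (unlike CAM's channel-wise, class-level weights). This is a hedged illustration of the general technique under those assumptions, not the released ODAM code.

```python
import torch
import torch.nn.functional as F

def odam_style_heatmap(activation, score):
    """Instance-specific, gradient-weighted heat map.
    activation: (1, C, H, W) feature map saved by a forward hook,
                from a forward pass run with gradients enabled.
    score: scalar tensor for ONE predicted attribute of ONE detection
           (e.g., its class score or a box coordinate)."""
    grads, = torch.autograd.grad(score, activation, retain_graph=True)
    heat = F.relu((grads * activation).sum(dim=1)).squeeze(0)  # (H, W)
    return heat / (heat.max() + 1e-8)  # normalize to [0, 1]
```

For visualization, the map would typically be upsampled to image size with F.interpolate and overlaid on the detected object's region.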
Explainable artificial intelligence (XAI) has been increasingly investigated to enhance the transparency of black-box artificial intelligence models, promoting better user understanding and trust. Developing an XAI method that is faithful to the model and plausible to users is both a necessity and a challenge. This work examines whether embedding human attention knowledge into saliency-based XAI methods for computer vision models could enhance their plausibility and faithfulness. Two novel XAI methods for object detection models, FullGrad-CAM and FullGrad-CAM++, were first developed to generate object-specific explanations by extending current gradient-based XAI methods for image classification models. Using human attention as the objective plausibility measure, these methods achieve higher explanation plausibility. Interestingly, all current XAI methods, when applied to object detection models, generally produce saliency maps that are less faithful to the model than human attention maps from the same object detection task. Accordingly, human attention-guided XAI (HAG-XAI) was proposed to learn from human attention how best to combine explanatory information from the models to enhance explanation plausibility, using trainable activation functions and smoothing kernels to maximize the similarity between the XAI saliency map and the human attention map. The proposed XAI methods were evaluated on the widely used BDD-100K, MS-COCO, and ImageNet datasets and compared with typical gradient-based and perturbation-based XAI methods. Results suggest that HAG-XAI enhanced explanation plausibility and user trust at the expense of faithfulness for image classification models, whereas for object detection models it enhanced plausibility, faithfulness, and user trust simultaneously and outperformed existing state-of-the-art XAI methods.
•Human attention-guided XAI is proposed for more faithful and plausible explanations.
•Two gradient-based XAI methods are presented for explaining object detection models.
•Human attention is adopted as an objective plausibility measure for XAI evaluation.
•The generalization ability and robustness of the proposed XAI methods are evaluated.
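A hypothetical minimal rendering of the HAG-XAI fitting idea: a trainable activation exponent plus a learnable smoothing kernel applied to a raw saliency map, optimized to maximize Pearson correlation with the human attention map. The module, parameterization, and loss below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HAGSmoother(nn.Module):
    """Trainable activation + learnable smoothing kernel for a
    saliency map (a minimal HAG-XAI-style sketch)."""
    def __init__(self, ksize=15):
        super().__init__()
        self.gamma = nn.Parameter(torch.tensor(1.0))  # activation exponent
        self.smooth = nn.Conv2d(1, 1, ksize, padding=ksize // 2, bias=False)
        nn.init.constant_(self.smooth.weight, 1.0 / ksize ** 2)

    def forward(self, sal):                    # sal: (B, 1, H, W)
        s = sal.clamp(min=0) ** self.gamma.clamp(min=1e-2)
        return self.smooth(s)

def pearson_loss(pred, target):
    """Negative Pearson correlation between maps, computed per image."""
    p = pred.flatten(1) - pred.flatten(1).mean(1, keepdim=True)
    t = target.flatten(1) - target.flatten(1).mean(1, keepdim=True)
    corr = (p * t).sum(1) / (p.norm(dim=1) * t.norm(dim=1) + 1e-8)
    return -corr.mean()
```

Optimizing HAGSmoother's two sets of parameters with any standard optimizer over (saliency map, human attention map) pairs reproduces the similarity-maximization step described above.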
•More people showed holistic scan patterns during face learning than recognition.
•Analytic patterns were associated with better recognition performance.
•About 40% of the participants used different patterns for learning and recognition.
•Pattern similarity between learning and recognition did not predict performance.
The hidden Markov model (HMM)-based approach for eye movement analysis is able to reflect individual differences in both spatial and temporal aspects of eye movements. Here we used this approach to understand the relationship between eye movements during face learning and recognition, and its association with recognition performance. We discovered holistic (i.e., mainly looking at the face center) and analytic (i.e., specifically looking at the two eyes in addition to the face center) patterns during both learning and recognition. Although for both learning and recognition, participants who adopted analytic patterns had better recognition performance than those with holistic patterns, a significant positive correlation between the likelihood of participants’ patterns being classified as analytic and their recognition performance was only observed during recognition. Significantly more participants adopted holistic patterns during learning than recognition. Interestingly, about 40% of the participants used different patterns between learning and recognition, and among them 90% switched their patterns from holistic at learning to analytic at recognition. In contrast to the scan path theory, which posits that eye movements during learning have to be recapitulated during recognition for the recognition to be successful, participants who used the same or different patterns during learning and recognition did not differ in recognition performance. The similarity between their learning and recognition eye movement patterns also did not correlate with their recognition performance. These findings suggested that perceptuomotor memory elicited by eye movement patterns during learning does not play an important role in recognition. In contrast, the retrieval of diagnostic information for recognition, such as the eyes for face recognition, is a better predictor for recognition performance.
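As a sketch of how this kind of likelihood-based pattern classification works, assuming hmmlearn-style representative HMMs (with a .score() method) for the two discovered patterns; the scale definition below is a simple illustrative choice, not the exact EMHMM formula.

```python
def analytic_holistic_scale(fixations, lengths, holistic_hmm, analytic_hmm):
    """Score one participant's fixation data (rows of (x, y)) under the
    two representative group HMMs; the per-fixation log-likelihood
    difference gives a similarity scale (positive = more analytic)."""
    ll_holistic = holistic_hmm.score(fixations, lengths)
    ll_analytic = analytic_hmm.score(fixations, lengths)
    return (ll_analytic - ll_holistic) / sum(lengths)
```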
The eye movement analysis with hidden Markov models (EMHMM) method provides quantitative measures of individual differences in eye-movement patterns. However, it is limited to tasks where stimuli have the same feature layout (e.g., faces). Here we proposed combining EMHMM with the data-mining technique of co-clustering to discover participant groups with consistent eye-movement patterns across stimuli in tasks involving stimuli with different feature layouts. By applying this method to eye movements in scene perception, we discovered explorative (switching between foreground and background information or different regions of interest) and focused (mainly looking at the foreground with less switching) eye-movement patterns among Asian participants. Higher similarity to the explorative pattern predicted better foreground-object recognition performance, whereas higher similarity to the focused pattern was associated with better feature integration in the flanker task. These results have important implications for using eye tracking as a window into individual differences in cognitive abilities and styles. Thus, EMHMM with co-clustering provides quantitative assessments of eye-movement patterns across stimuli and tasks. It can be applied to many other real-life visual tasks, making a significant impact on the use of eye tracking to study cognitive behavior across disciplines.
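A rough sketch of the co-clustering step, using scikit-learn's SpectralCoclustering on a hypothetical participant-by-stimulus score matrix; in the actual method the entries would be HMM-derived similarity data rather than the random placeholders used here.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# Hypothetical input: one row per participant, one column per stimulus;
# entries are nonnegative pattern-similarity scores from per-stimulus
# HMMs (random placeholders here).
rng = np.random.default_rng(0)
scores = rng.random((40, 30))

# Co-cluster rows and columns jointly so that participant groups with
# consistent eye-movement patterns emerge across stimulus groups.
model = SpectralCoclustering(n_clusters=2, random_state=0)
model.fit(scores)
participant_groups = model.row_labels_   # e.g., explorative vs. focused
stimulus_groups = model.column_labels_
```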