Multi-head attention is appealing for its ability to jointly attend to information from different representation subspaces at different positions. In this work, we propose two complementary approaches to better exploit such diversity for multi-head attention. First, we introduce a disagreement regularization to explicitly encourage diversity among multiple attention heads. Specifically, we propose three types of disagreement regularization, which respectively encourage the subspace, the attended positions, and the output representation associated with each attention head to differ from those of other heads. Second, we propose to better capture the diverse information distributed across the extracted partial representations with the routing-by-agreement algorithm. The routing algorithm iteratively updates the proportion of how much a part (i.e., the distinct information learned from a specific subspace) should be assigned to a whole (i.e., the final output representation), based on the agreement between parts and wholes. Experimental results on machine translation, sentence encoding, and logical inference tasks demonstrate the effectiveness and universality of the proposed approaches, indicating the necessity of better exploiting the diversity of multi-head attention. While the two strategies individually boost performance, combining them further improves the model.
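As an illustration of the third disagreement term (the output-representation variant), the sketch below computes the negative mean pairwise cosine similarity between attention-head outputs, so that maximizing it pushes heads apart. This is a minimal NumPy sketch under our own assumptions (the function name, shapes, and averaging scheme are illustrative), not the paper's exact training code.

```python
import numpy as np

def output_disagreement(head_outputs):
    """Negative mean pairwise cosine similarity between attention-head
    output vectors -- a toy version of the output-representation
    disagreement term. Adding this term to the training objective
    rewards heads whose outputs point in different directions.

    head_outputs: array of shape (num_heads, d_model)
    """
    # Normalize each head's output vector (guard against zero norms).
    norms = np.linalg.norm(head_outputs, axis=1, keepdims=True)
    unit = head_outputs / np.clip(norms, 1e-9, None)
    # Pairwise cosine similarities, shape (num_heads, num_heads).
    sim = unit @ unit.T
    h = head_outputs.shape[0]
    # Average over distinct pairs, excluding the diagonal self-similarities.
    pair_mean = (sim.sum() - np.trace(sim)) / (h * (h - 1))
    return -float(pair_mean)
```

With orthogonal head outputs the term is 0 (maximally diverse); with identical head outputs it is -1 (no diversity), so gradient ascent on this term encourages the heads to specialize.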
In the two experiments reported here, we uncovered evidence for shared structural representations between arithmetic and language. Specifically, we primed subjects with mathematical equations either with or without parenthetical groupings, such as 80 − (9 + 1) × 5 or 80 − 9 + 1 × 5, and then presented a target sentence fragment, such as "The tourist guide mentioned the bells of the church that...," which subjects had to complete. When the mathematical equations were solved correctly, their structure influenced the noun phrase (for example, either "the bells of the church" or "the church," respectively) to which subjects chose to attach their sentence completion. These experiments provide the first demonstration of cross-domain structural priming from mathematics to language. They highlight the importance of global structural representations at a very high level of abstraction and have potentially far-reaching implications regarding the domain generality of structural representations.
The potential benefits of automatic radiology report generation, such as reducing misdiagnosis rates and enhancing clinical diagnosis efficiency, are significant. However, existing data-driven methods lack essential medical prior knowledge, which hampers their performance. Moreover, establishing global correspondences between radiology images and related reports, while achieving local alignments between images correlated with prior knowledge and text, remains a challenging task. To address these shortcomings, we introduce a novel Eye Gaze Guided Cross-modal Alignment Network (EGGCA-Net) for generating accurate medical reports. Our approach incorporates prior knowledge from radiologists' Eye Gaze Region (EGR) to refine the fidelity and comprehensibility of report generation. Specifically, we design a Dual Fine-Grained Branch (DFGB) and a Multi-Task Branch (MTB) to collaboratively ensure the alignment of visual and textual semantics across multiple levels. To establish fine-grained alignment between EGR-related images and sentences, we introduce the Sentence Fine-grained Prototype Module (SFPM) within DFGB to capture cross-modal information at different levels. Additionally, to learn the alignment of EGR-related image topics, we introduce the Multi-task Feature Fusion Module (MFFM) within MTB to refine the encoder output information. Finally, a label matching mechanism is specifically designed to generate reports that are consistent with the anticipated disease states. The experimental outcomes indicate that the introduced methodology surpasses previous advanced techniques, yielding enhanced performance on two widely used benchmark datasets: Open-i and MIMIC-CXR.
Listeners can successfully interpret the intended meaning of an utterance even when it contains errors or other unexpected anomalies. The present work combines an online measure of attention to sentence referents (visual world eye-tracking) with offline judgments of sentence meaning to disclose how the interpretation of anomalous sentences unfolds over time, in order to explore mechanisms of non-literal processing. We use a metalinguistic judgment in Experiment 1 and an elicited imitation task in Experiment 2. In both experiments, we focus on one morphosyntactic anomaly (subject-verb agreement; The key to the cabinets literally *were ...) and one semantic anomaly (without; Lulu went to the gym without her hat ?off) and show that non-literal referents to each are considered upon hearing the anomalous region of the sentence. This shows that listeners understand anomalies by overwriting or adding to an initial interpretation and that this occurs incrementally and adaptively as the sentence unfolds.
The study of word-to-text integration (WTI) provides a window on incremental processes that link the meaning of a word to the preceding text. We review a research program using event-related potential indicators of WTI at sentence beginnings, thus localizing sources of integration to prior text meaning independently of the current sentence. The results led to the following conclusions. First, integration occurs when the word being read cues the retrieval of a text meaning from memory. Second, when the word does not cue retrieval, new structure building rather than integration is the default at sentence beginnings. Third, integration depends on a highly accessible text memory. The immediately preceding sentence provides the primary source for integration; however, instructions that encourage attention to thematic elements enable influences of global text meaning. Finally, contrary to the role that prediction may play in comprehension generally, prediction has a limited role in WTI at sentence beginnings.
Dementia is a cognitive decline that leads to the progressive deterioration of an individual's ability to perform daily activities independently. As a result, a considerable amount of time and resources is spent on caretaking. Early detection of dementia can significantly reduce the effort and resources needed for caretaking.
This research proposes an approach for assessing cognitive decline by analysing speech data, specifically focusing on speech relevance as a crucial indicator for memory recall.
This is a cross-sectional, online, self-administered study. The proposed method uses a transformer-based deep learning architecture, with BERT (Bidirectional Encoder Representations from Transformers) and Sentence-Transformer models deriving encoded representations of speech transcripts. These representations provide contextually descriptive information that is used to analyse the relevance of sentences in their respective contexts. The encoded representations are then compared using cosine similarity to measure the relevance of uttered sequences of sentences. The study uses the Pitt Corpus dementia dataset for experimentation, which consists of speech data from individuals with and without dementia. The accuracy of the proposed multi-QA-MPNet (Multi-Query Maximum Inner Product Search Pretraining) model is compared with that of other pretrained Sentence-Transformer models.
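The relevance-scoring step described above can be sketched as follows, assuming sentence embeddings have already been produced by a Sentence-Transformer model; the function names and the mean-over-adjacent-pairs aggregation are our own illustrative assumptions, not the study's exact pipeline.

```python
import numpy as np

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sequence_relevance(sentence_embeddings):
    """Score the relevance of an uttered sequence of sentences as the
    mean cosine similarity between each sentence embedding and the
    embedding of the sentence preceding it. Low scores suggest
    topically disconnected speech, a possible marker of impaired
    semantic memory.

    sentence_embeddings: array-like of shape (num_sentences, dim)
    """
    scores = [cosine_similarity(sentence_embeddings[i - 1], sentence_embeddings[i])
              for i in range(1, len(sentence_embeddings))]
    return sum(scores) / len(scores)
```

For example, a transcript whose consecutive sentences have identical embeddings scores 1.0, while orthogonal consecutive embeddings score 0.0, so the measure falls as utterances drift off topic.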
The results show that the proposed approach outperforms the other models in capturing context-level information, particularly semantic memory. Additionally, the study explores the suitability of different similarity measures for evaluating the relevance of uttered sequences of sentences. The experiments reveal that cosine similarity is the most appropriate measure for this task.
This finding has significant implications for identifying early warning signs of dementia, as it suggests that cosine similarity metrics can effectively capture the semantic relevance of spoken language. Persistent cognitive decline over time is one indicator of dementia. Additionally, early dementia could be recognised by analysing other modalities, such as speech and brain images.
What is already known on this subject
It is already known that speech- and language-based detection methods can be useful for dementia diagnosis, as language difficulties are often early signs of the disease. Additionally, deep learning algorithms have shown promise in detecting and diagnosing dementia by analysing large datasets, particularly in speech- and language-based detection methods. However, further research is needed to validate the performance of these algorithms on larger and more diverse datasets and to address potential biases and limitations.
What this paper adds to existing knowledge
This study presents a unique and effective approach for cognitive decline assessment through analysing speech data. The study provides valuable insights into the importance of context and semantic memory in accurately detecting potential dementia and demonstrates the applicability of deep learning models for this purpose. The findings of this study have important clinical implications and can inform future research and development in the field of dementia detection and care.
What are the potential or actual clinical implications of this work?
The proposed approach for cognitive decline assessment using speech data and deep learning models has significant clinical implications. It has the potential to improve the accuracy and efficiency of dementia diagnosis, leading to earlier detection and more effective treatments, which can improve patient outcomes and quality of life.
• We propose a query-aware video encoder to selectively emphasize visual features.
• We learn hierarchical and structural query clues to guide the video encoding.
• We achieve the state-of-the-art on the Charades-STA and TACoS datasets.
Given an untrimmed video and a sentence query, video moment retrieval aims to locate a target video moment that semantically corresponds to the query. It is a challenging task that requires a joint understanding of natural language queries and video contents. However, a video contains complex contents, including both query-related and query-irrelevant contents, which makes this joint understanding difficult. To this end, we propose a query-aware video encoder to capture the query-related visual contents. Specifically, we design a query-guided block following each encoder layer to recalibrate the encoded visual features according to the query semantics. The core of the query-guided block is a channel-level attention gating mechanism, which can selectively emphasize query-related visual contents and suppress query-irrelevant ones. Besides, to fully match different levels of contents in videos, we learn hierarchical and structural query clues to guide the visual content capturing. We disentangle the sentence query into a semantics graph and capture the local contexts inside the graph via a trilinear model as query clues. Extensive experiments on the Charades-STA and TACoS datasets demonstrate the effectiveness of our approach, and we achieve the state-of-the-art on the two datasets.
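The channel-level attention gating idea can be sketched as follows. This is a toy NumPy version under our own assumptions: a single learned linear projection of the query vector produces per-channel sigmoid gates; the paper's actual block may be parameterized differently.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_gate(visual_feats, query_vec, W):
    """Recalibrate visual features with query-derived channel gates.

    visual_feats: (T, C) per-frame features from an encoder layer.
    query_vec:    (C,) vector summarizing the query semantics.
    W:            (C, C) learned projection (illustrative assumption).

    A sigmoid gate in (0, 1) is computed per channel and scales that
    channel of every frame, emphasizing query-related channels and
    suppressing query-irrelevant ones.
    """
    gates = sigmoid(W @ query_vec)   # (C,) per-channel gate values
    return visual_feats * gates      # broadcast over the time axis
```

A gate near 1 passes a channel through unchanged, while a gate near 0 suppresses it; because the gates depend only on the query, the same recalibration is applied consistently across all frames of the video.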
► We report two eye-tracking studies on expectations & locality in German verb processing.
► We eliminate confounds in expectation-based facilitation from preverbal dependents.
► We find new evidence of locality effects when memory load is high.
► These results constrain theories of expectations & memory in sentence comprehension.
Probabilistic expectations and memory limitations are central factors governing the real-time comprehension of natural language, but how the two factors interact remains poorly understood. One respect in which the two factors have come into theoretical conflict is the documentation of both locality effects, in which having more dependents preceding a governing verb increases processing difficulty at the verb, and anti-locality effects, in which having more preceding dependents facilitates processing at the verb. However, no controlled study has previously demonstrated both locality and anti-locality effects in the same type of dependency relation within the same language. Additionally, many previous demonstrations of anti-locality effects have been potentially confounded with lexical identity, plausibility, and sentence position. Here, we provide new evidence of both locality and anti-locality effects in the same type of dependency relation in a single language—verb-final constructions in German—while controlling for lexical identity, plausibility, and sentence position. In main clauses, we find clear anti-locality effects, with the presence of a preceding dative argument facilitating processing at the final verb; in subject-extracted relative clauses with identical linear ordering of verbal dependents, we find both anti-locality and locality effects, with processing facilitated when the verb is preceded by a dative argument alone, but hindered when the verb is preceded by both the dative argument and an adjunct. These results indicate that both expectations and memory limitations need to be accounted for in any complete theory of online syntactic comprehension.
It is well known that sentences containing object-extracted relative clauses (e.g., The reporter that the senator attacked admitted the error) are more difficult to comprehend than sentences containing subject-extracted relative clauses (e.g., The reporter that attacked the senator admitted the error). Two major accounts of this phenomenon make different predictions about where, in the course of incremental processing of an object relative, difficulty should first appear. An account emphasizing memory processes (Gibson, 1998; Grodner & Gibson, 2005) predicts difficulty at the relative clause verb, while an account emphasizing experience-based expectations (Hale, 2001; Levy, 2008) predicts earlier difficulty, at the relative clause subject. Two eye movement experiments tested these predictions. Regressive saccades were much more likely from the subject noun phrase of an object relative than from the same noun phrase occurring within a subject relative (Experiment 1) or within a verbal complement clause (Experiment 2). This effect was further amplified when the relative pronoun that was omitted. However, reading time was also inflated on the object relative clause verb in both experiments. These results suggest that the violation of expectations and the difficulty of memory retrieval both contribute to the difficulty of object relative clauses, but that these two sources of difficulty have qualitatively distinct behavioral consequences in normal reading.