In recent years, Intelligent Personal Assistants (IPAs) have emerged as important tools in human–computer interaction, with a wide range of applications such as voice assistance, virtual customer service, and navigation. Capturing and understanding users' emotional needs is important for improving the quality of service of IPAs. Multimodal emotion recognition in conversation (MMERC), which aims to automatically identify and track the emotional states of speakers over the course of a dialogue, has therefore become a crucial component of emotional IPAs and has attracted increasing attention. Current research in this field relies on graph-based modeling of cross-modal and unimodal interactions. However, these methods ignore the highly imbalanced class distribution inherent in MMERC, which weakens the generalization ability of the model and leaves minority emotion classes poorly recognized. Data mining methods address imbalanced classification with oversampling, but they are unsuitable for MMERC because they disrupt the conversational coherence and modality alignment that characterize multimodal emotion recognition datasets. To overcome these problems, this paper proposes IMBA-MMERC, an effective framework for addressing the pervasive class imbalance problem in MMERC. Within this framework, multimodal conversation sample generation tackles the challenges of applying oversampling to multimodal conversational emotion recognition datasets, and a well-classified encouraging loss mitigates the performance degradation on certain majority classes caused by decision boundary deviations. On two English benchmark datasets and one Chinese public dataset, two performance metrics demonstrate the effectiveness and superiority of the proposed IMBA-MMERC. Ablation experiments, a case study, and histogram visualizations further verify the strong performance of the proposed framework.
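The abstract does not spell out the loss formulation. A minimal sketch of one plausible form of a well-classified encouraging loss, assuming it augments standard cross-entropy with a bonus term so that confidently correct (often majority-class) examples keep contributing gradient; the paper's exact definition may differ:

```python
import torch
import torch.nn.functional as F

def encouraging_loss(logits: torch.Tensor, targets: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    """Cross-entropy plus a bonus that rewards well-classified examples.

    logits:  (batch, num_classes) raw class scores
    targets: (batch,) integer class labels

    The log(1 - p) bonus is negative and grows in magnitude as the
    probability p of the true class approaches 1, so the gradient does
    not vanish for easy examples the way it does under plain
    cross-entropy. This is a sketch, not the paper's exact loss.
    """
    probs = F.softmax(logits, dim=-1)
    p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    ce = -torch.log(p_true.clamp(min=eps))            # standard cross-entropy
    bonus = torch.log((1.0 - p_true).clamp(min=eps))  # reward for confidence
    return (ce + bonus).mean()
```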
Multi-intent spoken language understanding (SLU) has become a research hotspot in natural language processing (NLP) due to its ability to recognize the multiple intents expressed in a single utterance and to annotate the corresponding sequence of slot tags. Previous research has primarily concentrated on token-level intent-slot interaction to model joint intent detection and slot filling, and therefore fails to fully utilize anisotropic intent-guiding information during joint training. In this work, we present a novel architecture that models multi-intent SLU as a multi-view intent-slot interaction. The architecture resolves the key bottleneck of unified multi-intent SLU by effectively modeling intent-slot relations through utterance-, chunk-, and token-level interactions. We further develop a neural framework, Uni-MIS, in which unified multi-intent SLU is modeled as a three-view intent-slot interaction fusion to better capture the interaction information after encoding. A chunk-level intent detection decoder captures the multiple intents, and an adaptive intent-slot graph network captures fine-grained intent information to guide the final slot filling. Extensive experiments on two widely used multi-intent SLU benchmark datasets show that our model beats all current strong baselines, pushing the state of the art in unified multi-intent SLU. Additionally, the ChatGPT benchmark we develop demonstrates that multi-intent SLU still holds considerable research value.
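The abstract only names the chunk-level intent decoder. An illustrative sketch of what such a decoder could look like, assuming fixed-size chunking with mean pooling and max-pooled multi-label intent logits; the actual Uni-MIS decoder (e.g., its chunking scheme) may differ, and `chunk_size` and `hidden_dim` are hypothetical parameters:

```python
import torch
import torch.nn as nn

class ChunkLevelIntentDecoder(nn.Module):
    """Sketch: group token states into chunks, score intents per chunk,
    and max-pool chunk scores into utterance-level multi-intent logits."""

    def __init__(self, hidden_dim: int, num_intents: int, chunk_size: int = 4):
        super().__init__()
        self.chunk_size = chunk_size
        self.classifier = nn.Linear(hidden_dim, num_intents)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim)
        b, t, h = token_states.shape
        pad = (-t) % self.chunk_size
        if pad:  # right-pad so seq_len divides evenly into chunks
            token_states = torch.cat(
                [token_states, token_states.new_zeros(b, pad, h)], dim=1)
        chunks = token_states.view(b, -1, self.chunk_size, h).mean(dim=2)
        chunk_logits = self.classifier(chunks)  # (batch, n_chunks, num_intents)
        # Max over chunks: an intent is predicted if any chunk supports it.
        return chunk_logits.max(dim=1).values
```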
This research signifies a considerable breakthrough in leveraging Large Language Models (LLMs) for multi-intent spoken language understanding (SLU). Our approach re-imagines the use of entity slots in multi-intent SLU applications, making the most of the generative potential of LLMs within the SLU landscape and leading to the development of the EN-LLM series. Furthermore, we introduce the concept of Sub-Intent Instruction (SII) to amplify the analysis and interpretation of complex, multi-intent communications, which further supports the creation of the ENSI-LLM model series. Our novel datasets, LM-MixATIS and LM-MixSNIPS, are synthesized from existing benchmarks. The study shows that LLMs may match or even surpass the performance of the current best multi-intent SLU models. We also scrutinize the performance of LLMs across a spectrum of intent configurations and dataset distributions. On top of this, we present two new metrics, Entity Slot Accuracy (ESA) and Combined Semantic Accuracy (CSA), to facilitate a detailed assessment of LLM competence in this multifaceted field. Our code and datasets are available at https://github.com/SJY8460/SLM.
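The abstract names ESA and CSA but does not define them. A minimal sketch under assumed definitions, taking ESA as exact-match accuracy over an utterance's entity (non-"O") slot tags and CSA as exact match of both the full intent set and the entire slot sequence; the paper's precise definitions may differ:

```python
from typing import List

def entity_slot_accuracy(pred_slots: List[List[str]],
                         gold_slots: List[List[str]]) -> float:
    """Assumed ESA: an utterance counts as correct when every gold
    entity (non-'O') slot position is tagged exactly right."""
    correct = 0
    for pred, gold in zip(pred_slots, gold_slots):
        entity_positions = [i for i, tag in enumerate(gold) if tag != "O"]
        if all(i < len(pred) and pred[i] == gold[i] for i in entity_positions):
            correct += 1
    return correct / max(len(gold_slots), 1)

def combined_semantic_accuracy(pred_intents: List[List[str]],
                               gold_intents: List[List[str]],
                               pred_slots: List[List[str]],
                               gold_slots: List[List[str]]) -> float:
    """Assumed CSA: an utterance counts only when its intent set and
    its whole slot sequence are both exactly correct."""
    correct = 0
    for pi, gi, ps, gs in zip(pred_intents, gold_intents,
                              pred_slots, gold_slots):
        if set(pi) == set(gi) and ps == gs:
            correct += 1
    return correct / max(len(gold_intents), 1)
```

Comparing intents as sets rather than lists reflects that multi-intent labels are typically unordered, which is a common convention in multi-intent SLU evaluation.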