The recent advancements in multimodal dialogue systems have been gaining importance in several domains such as retail, travel, fashion, among others. Several existing works have improved the ...understanding and generation of multimodal dialogues. However, there still exists considerable space to improve the quality of output textual responses due to insufficient information infusion between the visual and textual semantics. Moreover, the existing dialogue systems often generate defective knowledge-aware responses for tasks such as providing product attributes and celebrity endorsements. To address the aforementioned issues, we present a Transformer-based Multimodal Infusion Dialogue (TMID) system that extracts the visual and textual information from dialogues via a transformer-based multimodal context encoder and employs a cross-attention mechanism to achieve information infusion between images and texts for each utterance. Furthermore, TMID uses adaptive decoders to generate appropriate multimodal responses based on the user intentions it has determined using a state classifier and enriches the output responses by incorporating domain knowledge into the decoders. The results of extensive experiments on a multimodal dialogue dataset demonstrate that TMID has achieved a state-of-the-art performance by improving the BLUE-4 score by 13.03, NIST by 2.77, image selection Recall@1 by 1.84%.
Dear Editor, This letter proposes an unsupervised entity alignment method, which realizes integration of multiple multi-modal knowledge graphs adaptively.
Amidst the critical role that high-quality labeled data plays in advancing machine learning, the persistence of noise within widely-used datasets remains a challenge. While noise learning has gained ...traction within machine learning, particularly in computer vision, its exploration in text and multimodal classification domains has lagged. Furthermore, a comprehensive comparison of noise learning techniques in text and multimodal classification has been lacking, partly due to variations in experimental noise settings across prior studies. Addressing these gaps, this research introduces a pioneering Multimodal Infusion Joint Training (MinJoT) framework featuring a novel co-regularized loss function that seamlessly integrates multimodal information during joint training. This framework notably excels in maintaining model robustness amidst noisy data environments. Adapting established noise learning methods from computer vision to text classification, the study conducts extensive experiments across five English and Chinese textual and multimodal datasets, involving four methods, five noise modes, and seven noise rates. Critically, this work challenges the implicit assumption that widely-used datasets are devoid of noise, revealing that these datasets indeed encompass noise levels ranging from 0.61% to 15.77% which is defined as intrinsic noise in this study. For the first time, the study investigates the impact of intrinsic noise on model performance, categorizing it into distinct levels of ambiguity. To facilitate accurate method comparison, a new dataset, Golden-Chnsenticorp (G-Chnsenticorp), is introduced, carefully crafted to be free of intrinsic noise. This research establishes the inaugural noise learning benchmark for text classification, while simultaneously pioneering the first noise learning framework tailored for multimodal sentiment classification. Through these contributions, the study advances the understanding of noise learning in text and multimodal contexts, providing a novel framework, benchmarks, and insights to propel the field forward.
•Introducing MinJoT, a novel noise learning framework for multimodal classification.•Studying intrinsic noise in text classification and its impacts.•Creating G-Chnsenticorp, a noise-free dataset, for reliable research in this area.
Graphs are powerful representations for relations among objects, which have attracted plenty of attention. A fundamental challenge for graph learning is how to train an effective Graph Neural Network ...(GNN) encoder without labels, which are expensive and time consuming to obtain. Contrastive Learning (CL) is one of the most popular paradigms to address this challenge, which trains GNNs by discriminating positive and negative node pairs. Despite the success of recent CL methods, there are still two under-explored problems. First, how to reduce the semantic error introduced by random topology based data augmentations. Traditional CL defines positive and negative node pairs via the node-level topological proximity, which is solely based on the graph topology regardless of the semantic information of node attributes, and thus some semantically similar nodes could be wrongly treated as negative pairs. Second, how to effectively model the multiplexity of the real-world graphs, where nodes are connected by various relations and each relation could form a homogeneous graph layer. To solve these problems, we propose a novel multiplex heterogeneous graph prototypical contrastive leaning (X-GOAL) framework to extract node embeddings. X-GOAL is comprised of two components: the GOAL framework, which learns node embeddings for each homogeneous graph layer, and an alignment regularization, which jointly models different layers by aligning layer-specific node embeddings. Specifically, the GOAL framework captures the node-level information by a succinct graph transformation technique, and captures the cluster-level information by pulling nodes within the same semantic cluster closer in the embedding space. The alignment regularization aligns embeddings across layers at both node and cluster levels. We evaluate X-GOAL on various real-world datasets and downstream tasks to demonstrate its effectiveness.
Entity alignment (EA) for knowledge graphs (KGs) plays a critical role in knowledge engineering. Existing EA methods mostly focus on utilizing the graph structures and entity attributes (including ...literals), but ignore images that are common in modern multi-modal KGs. In this study we first constructed Multi-OpenEA -- eight large-scale, image-equipped EA benchmarks, and then evaluated some existing embedding-based methods for utilizing images. In view of the complementary nature of visual modal information and logical deduction, we further developed a new multi-modal EA method named LODEME using logical deduction and multi-modal KG embedding, with state-of-the-art performance achieved on Multi-OpenEA and other existing multi-modal EA benchmarks.
Knowledge Graph (KG) alignment is to discover the mappings (i.e., equivalent entities, relations, and others) between two KGs. The existing methods can be divided into the embedding-based models, and ...the conventional reasoning and lexical matching based systems. The former compute the similarity of entities via their cross-KG embeddings, but they usually rely on an ideal supervised learning setting for good performance and lack appropriate reasoning to avoid logically wrong mappings; while the latter address the reasoning issue but are poor at utilizing the KG graph structures and the entity contexts. In this study, we aim at combining the above two solutions and thus propose an iterative framework named PRASE which is based on probabilistic reasoning and semantic embedding. It learns the KG embeddings via entity mappings from a probabilistic reasoning system named PARIS, and feeds the resultant entity mappings and embeddings back into PARIS for augmentation. The PRASE framework is compatible with different embedding-based models, and our experiments on multiple datasets have demonstrated its state-of-the-art performance.
Semantic embedding has been widely investigated for aligning knowledge graph (KG) entities. Current methods have explored and utilized the graph structure, the entity names and attributes, but ignore ...the ontology (or ontological schema) which contains critical meta information such as classes and their membership relationships with entities. In this paper, we propose an ontology-guided entity alignment method named OntoEA, where both KGs and their ontologies are jointly embedded, and the class hierarchy and the class disjointness are utilized to avoid false mappings. Extensive experiments on seven public and industrial benchmarks have demonstrated the state-of-the-art performance of OntoEA and the effectiveness of the ontologies.
Entity alignment (EA) for knowledge graphs (KGs) plays a critical role in knowledge engineering. Existing EA methods mostly focus on utilizing the graph structures and entity attributes (including ...literals), but ignore images that are common in modern multi-modal KGs. In this study we first constructed Multi-OpenEA - eight large-scale, image-equipped EA benchmarks, and then evaluated some existing embedding-based methods for utilizing images. In view of the complementary nature of visual modal information and logical deduction, we further developed a new multi-modal EA method named LODEME using logical deduction and multi-modal KG embedding, with state-of-the-art performance achieved on Multi-OpenEA and other existing multi-modal EA benchmarks.
Embedding-based entity alignment has been widely investigated in recent years, but most proposed methods still rely on an ideal supervised learning setting with a large number of unbiased seed ...mappings for training and validation, which significantly limits their usage. In this study, we evaluate those state-of-the-art methods in an industrial context, where the impact of seed mappings with different sizes and different biases is explored. Besides the popular benchmarks from DBpedia and Wikidata, we contribute and evaluate a new industrial benchmark that is extracted from two heterogeneous knowledge graphs (KGs) under deployment for medical applications. The experimental results enable the analysis of the advantages and disadvantages of these alignment methods and the further discussion of suitable strategies for their industrial deployment.
X-GOAL Jing, Baoyu; Feng, Shengyu; Xiang, Yuejia ...
Proceedings of the 31st ACM International Conference on Information & Knowledge Management,
10/2022
Conference Proceeding
Odprti dostop
Graphs are powerful representations for relations among objects, which have attracted plenty of attention in both academia and industry. A fundamental challenge for graph learning is how to train an ...effective Graph Neural Network (GNN) encoder without labels, which are expensive and time consuming to obtain. Contrastive Learning (CL) is one of the most popular paradigms to address this challenge, which trains GNNs by discriminating positive and negative node pairs. Despite the success of recent CL methods, there are still two under-explored problems. Firstly, how to reduce the semantic error introduced by random topology based data augmentations. Traditional CL defines positive and negative node pairs via the node-level topological proximity, which is solely based on the graph topology regardless of the semantic information of node attributes, and thus some semantically similar nodes could be wrongly treated as negative pairs. Secondly, how to effectively model the multiplexity of the real-world graphs, where nodes are connected by various relations and each relation could form a homogeneous graph layer. To solve these problems, we propose a novel multiplex heterogeneous graph prototypical contrastive leaning (X-GOAL) framework to extract node embeddings. X-GOAL is comprised of two components: the GOAL framework, which learns node embeddings for each homogeneous graph layer, and an alignment regularization, which jointly models different layers by aligning layer-specific node embeddings. Specifically, the GOAL framework captures the node-level information by a succinct graph transformation technique, and captures the cluster-level information by pulling nodes within the same semantic cluster closer in the embedding space. The alignment regularization aligns embeddings across layers at both node level and cluster level. We evaluate the proposed X-GOAL on a variety of real-world datasets and downstream tasks to demonstrate the effectiveness of the X-GOAL framework.