• Syntactically-informed representations based on static and/or contextual representations.
• Pre-encode syntactic information with automatically annotated data.
• Improve performance over base representations on three information extraction tasks.
• Syntactic dependencies can be beneficial for both static and contextual embeddings.
• Adapt more easily to different linguistic tasks than fine-tuning large models.
Most deep language understanding models depend only on word representations, which are mainly based on language modelling derived from a large amount of raw text. These models encode distributional knowledge without considering syntactic structural information, although several studies have shown benefits of including such information. Therefore, we propose new syntactically-informed word representations (SIWRs), which allow us to enrich the pre-trained word representations with syntactic information without training language models from scratch. To obtain SIWRs, a graph-based neural model is built on top of either static or contextualised word representations such as GloVe, ELMo and BERT. The model is first pre-trained with only a relatively modest amount of task-independent data that are automatically annotated using existing syntactic tools. SIWRs are then obtained by applying the model to downstream task data and extracting the intermediate word representations. Finally, we replace the word representations in downstream models with SIWRs. We evaluate SIWRs on three information extraction tasks, namely nested named entity recognition (NER), binary relation extraction (RE) and n-ary RE. The results demonstrate that our SIWRs yield performance gains over the base representations in these NLP tasks with 3–9% relative error reduction. Our SIWRs also perform better than fine-tuning BERT in binary RE. We also conduct extensive experiments to analyse the proposed method.
Multimodal models have been shown to outperform text-based models on learning semantic word representations. According to psycholinguistic theory, there is a graphical relationship among the modalities of language, and in recent years, the graph convolutional network (GCN) has been shown to have substantial advantages in the extraction of non-Euclidean spatial features. This inspires us to propose a new multimodal word representation model, namely, GCNW, which uses the graph convolutional network to incorporate phonetic and syntactic information into the word representation. We use a greedy strategy to update the modality-relation matrix in the GCN, and we train the model through unsupervised learning. We evaluated the proposed model on multiple downstream NLP tasks, and various experimental results demonstrate that the GCNW outperforms strong unimodal baselines and state-of-the-art multimodal models. We make the source code of both models available to encourage reproducible research.
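To make the graph convolution step concrete, the following is a minimal sketch of one generic GCN layer in plain Python. It assumes the widely used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W); the abstract does not say that GCNW uses exactly this rule, and the toy graph, features, and weights are hypothetical.

```python
import math

def normalize_adjacency(adj):
    """Add self-loops, then scale symmetrically by node degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj, features, weights):
    """One layer: aggregate neighbour features, project, apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in projected]

# Hypothetical toy graph: two connected nodes with one feature each.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
node_features = [[1.0], [3.0]]
layer_weights = [[1.0]]
output = gcn_layer(adjacency, node_features, layer_weights)
```

In this toy case each node ends up with the average of its own feature and its neighbour's, which is the smoothing effect that lets related nodes share information.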
• We propose a simple method to obtain task-specific word representations.
• We propose to handle the OOV problem with subword and mapping approaches.
• The proposed methods achieved performance improvements in all four Korean tasks.
Although general word representations (GWRs) by skip-gram or GloVe have been widely used in many natural language processing (NLP) tasks with considerable success, they require further improvement. First, a GWR only represents general information of a word, even though task-oriented information can be more useful in specific tasks. Second, a GWR cannot avoid the out-of-vocabulary (OOV) problem. Thus, some recent studies have proposed methods based on an additional complex model or deep knowledge of resources for each specific task. Although such methods have the potential for improved performance, we believe that the baseline systems of each NLP task are already expensive; hence, making them more complex would be problematic for real-world applications. Therefore, the objective of this study is to overcome the limitations of GWRs by developing simple but effective methods for task-specific word representations (TSWRs) and OOV representations (OOVRs). The proposed methods achieved state-of-the-art performance in four Korean NLP tasks, namely part-of-speech tagging, named entity recognition, dependency parsing, and semantic role labeling.
Interpretability is a significant aspect of distributed word representation learning models. Although the most advanced pretrained models have achieved the best results to date, their behaviour is difficult to explain clearly. For this reason, building on the interpretability of distributed word embeddings, this paper presents a method of learning word representations using joint context. Existing distributed word representation models usually focus on either neighbor or syntactic context. We argue that it is necessary to model both contexts simultaneously. In particular, the pointwise mutual information obtained by combining the two types of contexts can efficiently express the correlation between words. We propose two alternative distribution models for learning word representations by employing the neighbor and syntactic contexts via a simple and effective joint learning framework. Furthermore, the proposed models are trained on a public corpus, and the learned representations are evaluated in word analogy, word similarity, and sentence classification tasks. The experimental results demonstrate the potential of the proposed method.
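As an illustration of pointwise mutual information over joint contexts, here is a minimal sketch. It assumes the two context types are combined by simply pooling their (word, context) pairs before counting; the abstract does not specify how the paper combines them, and the toy pairs and the `nsubj:` labelling are hypothetical.

```python
import math
from collections import Counter

def pmi_table(pairs):
    """Return PMI(word, context) = log(p(w, c) / (p(w) p(c))) for observed pairs."""
    pair_counts = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    context_counts = Counter(context for _, context in pairs)
    total = len(pairs)
    return {
        (word, context): math.log(
            (count * total) / (word_counts[word] * context_counts[context])
        )
        for (word, context), count in pair_counts.items()
    }

# Hypothetical toy data: linear-window neighbours and dependency-based contexts.
neighbor_pairs = [("dog", "barks"), ("dog", "the"), ("cat", "the"), ("cat", "meows")]
syntactic_pairs = [("dog", "nsubj:barks"), ("cat", "nsubj:meows")]

joint_pmi = pmi_table(neighbor_pairs + syntactic_pairs)
```

A context shared by many words, such as "the", receives a lower score than one tied to a single word, which is why PMI is a common measure of word-context association.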
Words, unlike images, are symbolic representations. The associative details inherent within a word's meaning and the visual imagery it generates are inextricably connected to the way words are processed and represented. It is well recognised that the hippocampus associatively binds components of a memory to form a lasting representation, and here we show that the hippocampus is especially sensitive to abstract word processing. Using fMRI during recognition, we found that the increased abstractness of words produced increased hippocampal activation regardless of memory outcome. Interestingly, word recollection produced hippocampal activation regardless of word content, while the parahippocampal cortex was sensitive to concreteness of word representations, regardless of memory outcome. We reason that the hippocampus has assumed a critical role in the representation of uncontextualized abstract word meaning, as its information-binding ability allows the retrieval of the semantic and visual associates that, when bound together, generate the abstract concept represented by word symbols. These insights have implications for research on word representation, memory, and hippocampal function, perhaps shedding light on how the human brain has adapted to encode and represent abstract concepts.
• Explored how words are represented and recognised while varying their abstractness.
• Hippocampal activity tracked the degree of abstractness independent of memory.
• Recollection of words engaged the hippocampus, regardless of word content.
• Abstractness, rather than familiarity, is predicted by hippocampal activity.
• Hippocampus helps form associative representations of abstract words.
The detection of mentioned aspects in product reviews is one of the significant and complex tasks in opinion mining. Recently, contextual-based approaches have significantly improved the accuracy of aspect extraction over non-contextual embeddings. However, these approaches are often computationally expensive and time-consuming; thus, applying such heavy models with insufficient resources and within runtime systems is impractical in many realistic scenarios. The present investigation sought an efficient, practical deep-learning-based model that relies on the complementary power of various existing non-contextual embeddings. In this regard, two morphology-based (character and FastText) and two syntax-based (POS and extended dependency skip-gram) embeddings were used alongside a base word embedding (GloVe) to form an enriched word representation layer. The presented model was integrated into the proposed network architecture (extended BiGRU). Finally, two novel post-processing rules were applied to refine the errors in the model's predictions. The proposed model achieved F-scores of 0.86, 0.91, 0.79, and 0.80 for the SemEval 2014 laptop domain and the SemEval 2015–2016 restaurant domain, respectively. Furthermore, the results were validated by comparing the computational and temporal efficiency of the proposed model with seven BERT-family transformers through statistical tests.
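The enriched representation layer described above can be pictured as a concatenation of several per-token vectors. The sketch below is only an illustration under stated assumptions: the lookup tables, the vector sizes, and the hashed character features are hypothetical stand-ins, not the paper's GloVe, FastText, POS, or dependency embeddings.

```python
# Hypothetical toy lookup tables; real systems would load pre-trained vectors.
WORD_TABLE = {"battery": [0.1, 0.2], "<unk>": [0.0, 0.0]}
POS_TABLE = {"NOUN": [1.0], "<unk>": [0.0]}

def lookup(table, key):
    """Return the vector for key, falling back to the unknown-item vector."""
    return table.get(key, table["<unk>"])

def char_features(token, size=3):
    """Crude stand-in for a learned character embedding: hashed character counts."""
    vector = [0.0] * size
    for ch in token:
        vector[ord(ch) % size] += 1.0
    return vector

def enriched_representation(token, pos_tag):
    """Concatenate word-level, character-level, and POS vectors for one token."""
    return lookup(WORD_TABLE, token) + char_features(token) + lookup(POS_TABLE, pos_tag)

vector = enriched_representation("battery", "NOUN")
```

Because the character-level part is computed from the token's spelling, an unseen word still receives a non-trivial vector even when its word-level lookup falls back to the unknown entry.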
Deep learning is considered a way to advance smart cities through social media sentiment analysis. The digital content in social media can be used for many smart city applications (SCAs). Classical convolutional neural networks (CNNs) are challenging to parallelize and insufficient to capture long-term contextual semantic features for sentiment analysis. To address this, this paper first proposes a domain-specific distributed word representation (DS-DWR) with a considerably small corpus size, induced from textual resources in social media. In DS-DWR, different distributed word representations are concatenated to build rich representations over the input sequence, which is worthwhile for infrequent and unseen terms. Second, it proposes a dilated convolutional neural network (D-CNN), which is composed of three parallel dilated convolutional neural network (PD-CNN) layers and a global average pooling (GAP) layer. The parallel dilated convolution reduces dimension and enlarges the receptive field without the loss of local information. Further, long-term contextual semantic information is captured by using different dilation rates. Experiments demonstrate that our architecture achieves comparable results, with hyperparameters tuned for better parallelism, which minimizes computational cost.
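To show why dilation enlarges the receptive field without adding weights, here is a minimal sketch of a 1-D dilated convolution in plain Python. A kernel of size k with dilation d spans d * (k - 1) + 1 input positions. The dilation rates 1, 2, and 4 and the toy input are hypothetical; the abstract does not state the rates used in the PD-CNN layers.

```python
def effective_kernel(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1

def dilated_conv1d(inputs, weights, dilation):
    """Unpadded 1-D dilated convolution (cross-correlation form)."""
    span = effective_kernel(len(weights), dilation)
    return [
        sum(w * inputs[start + j * dilation] for j, w in enumerate(weights))
        for start in range(len(inputs) - span + 1)
    ]

def global_average_pool(values):
    """Collapse a feature map to a single value by averaging."""
    return sum(values) / len(values)

# Hypothetical example: three parallel branches with different dilation rates.
sequence = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
kernel = [1.0, 1.0, 1.0]
spans = [effective_kernel(len(kernel), d) for d in (1, 2, 4)]
pooled = [global_average_pool(dilated_conv1d(sequence, kernel, d)) for d in (1, 2, 4)]
```

Each branch uses the same three weights, yet the branch with the largest dilation covers the whole nine-item sequence in a single output, which is how different rates capture longer-range context.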
Word learning is basic to foreign language acquisition, but it is time-consuming and not always successful. Empirical studies have shown that traditional (visual) word learning can be enhanced by gestures. The gesture benefit has been attributed to depth of encoding. Gestures can lead to depth of encoding because they trigger semantic processing and sensorimotor enrichment of the novel word. However, the neural underpinning of depth of encoding is still unclear. Here, we combined an fMRI and a behavioral study to investigate word encoding online. In the scanner, participants encoded 30 novel words of an artificial language created for experimental purposes and their translation into the subjects' native language. Participants encoded the words three times: visually, audiovisually, and by additionally observing semantically related gestures performed by an actress. Hemodynamic activity during word encoding revealed the recruitment of cortical areas involved in stimulus processing. In this study, depth of encoding can be spelt out in terms of sensorimotor brain networks that grow larger the more sensory modalities are linked to the novel word. Word retention outside the scanner documented a positive effect of gestures in a free recall test in the short term.