Sentence representation reasoning comprises two components working in concert: a sentence representation module and a semantic reasoning module. This paper combines a multi-layer semantic representation network with a deep fusion matching network to overcome the limitations of considering only a sentence representation module or only a reasoning model. It proposes a joint optimization method based on multi-layer semantics, the Semantic Fusion Deep Matching Network (SCF-DMN), to explore the influence of sentence representation and reasoning models on reasoning performance. Experiments on text entailment recognition tasks show that the jointly optimized representation-reasoning method performs better than existing methods. The sentence representation optimization module and the improved reasoning model each promote reasoning performance when used individually, but optimizing the reasoning model has the more significant impact on the final reasoning results. Furthermore, comparing each module's performance reveals a mutual constraint between the sentence representation module and the reasoning model: this constraint restricts overall performance, so the reasoning gains do not superpose linearly. Overall, compared with other existing methods tested on the same databases, the proposed method addresses the lack of in-depth interactive information and of interpretability in model design, which should be inspirational for future improvement and study of natural language reasoning.
The Visual Question Answering (VQA) task is to find, in an image, the information relevant to a question so as to answer that question correctly. It can be widely used in visual assistance, automated security surveillance, and intelligent human-robot interaction. However, the accuracy of VQA has not been ideal; the main difficulty is that image features cannot represent scene and object information well, and text information cannot be fully represented. This paper applies multi-scale feature extraction and fusion methods in both the image feature representation and the text representation parts of a VQA system to improve its accuracy. Firstly, for image feature representation, a multi-scale feature extraction and fusion method was adopted: image features output by different layers of a pre-trained deep neural network were extracted, and the optimal feature-fusion scheme was found through experiments. Secondly, for sentence representation, a multi-scale feature method was introduced to characterize and fuse the word-level, phrase-level, and sentence-level features of sentences. Finally, the VQA model was improved with the multi-scale feature extraction and fusion method. The results show that adding multi-scale feature extraction and fusion improves the accuracy of the VQA model.
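As an illustration of the multi-scale fusion idea described above, the following minimal sketch pools feature maps taken from different network depths and concatenates them into a single vector. The choice of global average pooling plus concatenation is an assumption for illustration, standing in for whichever fusion scheme the experiments select as optimal.

```python
import numpy as np

def global_avg_pool(feat):
    # feat: (C, H, W) feature map from one network layer
    return feat.mean(axis=(1, 2))                      # -> (C,)

def fuse_multiscale(feature_maps):
    """Concatenate pooled features taken from different network depths."""
    return np.concatenate([global_avg_pool(f) for f in feature_maps])

# toy feature maps standing in for the outputs of three conv stages
fm1 = np.random.rand(64, 56, 56)
fm2 = np.random.rand(128, 28, 28)
fm3 = np.random.rand(256, 14, 14)
fused = fuse_multiscale([fm1, fm2, fm3])
print(fused.shape)   # (448,) = 64 + 128 + 256
```

The same pattern applies on the text side, with word-, phrase-, and sentence-level vectors taking the place of the three feature maps.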
•The influence of spatio-temporal factors on human comfort was quantified.
•Seasonal change is the main factor affecting physical comfort.
•The artificial impermeable matrix increases the surface thermal field temperature.
•Natural factors can always effectively reduce the surface temperature.
•Human comfort can be improved to a certain extent through human intervention.
In recent decades, urbanization and the dramatic increase in urban populations have exacerbated the urban heat island effect. Much attention has been paid to the causes and patterns of the urban heat island effect, but there are few quantitative studies of its impact. Using world climate data, DEM data, and land use data for the 20 years from 2001 to 2020, this paper first studies the variation pattern and spatial distribution characteristics of urban heat islands in New York State and summarizes the seasonal distribution characteristics of temperature; it then uses the Giles formula to calculate Thom's discomfort index, evaluates human thermal comfort, and assesses the effect of the heat island effect on perceived thermal comfort. The results show that, on the time scale, the surface temperature in the study area generally showed a slow upward trend over the past 20 years; for example, in July, the maximum and minimum temperatures increased by 3.2 °C and 4.1 °C, respectively. On the spatial scale, most of the heat island areas in the study area were distributed in the New York City agglomeration, especially from May to October, when the heat island effect was particularly obvious and the temperature map showed clear high-temperature zones. Compared with 2001, the human discomfort index (DI) increased between June and August 2020. The land use map shows that as the city expands, the DI also increases, and the proportion of people who feel uncomfortably hot rises to 50%. Outside the New York City cluster, other areas were mostly free of thermal discomfort. This result shows that excessive concentration of urban development seriously affects residents' quality of life.
We should pay attention to the superimposed impact of climate change and urban heat islands on the human discomfort index, and moderate local high-temperature and thermal-field areas through reasonable planning, stronger greening, and building technology to make cities more livable.
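For reference, Thom's discomfort index in the Giles formulation referenced above is commonly written DI = T − 0.55(1 − 0.01·RH)(T − 14.5), with T the air temperature in °C and RH the relative humidity in percent. A minimal sketch follows; the comfort-class thresholds are the commonly cited ones and are included here as an assumption, not taken from this paper.

```python
def discomfort_index(t_celsius, rh_percent):
    """Thom's discomfort index (Giles formulation):
    DI = T - 0.55 * (1 - 0.01*RH) * (T - 14.5)."""
    return t_celsius - 0.55 * (1 - 0.01 * rh_percent) * (t_celsius - 14.5)

def comfort_class(di):
    # Commonly cited DI bands (assumed thresholds, for illustration only)
    if di < 21:  return "no discomfort"
    if di < 24:  return "under 50% of population feels discomfort"
    if di < 27:  return "over 50% of population feels discomfort"
    if di < 29:  return "most of population feels discomfort"
    if di < 32:  return "severe heat stress"
    return "state of medical emergency"

print(discomfort_index(30.0, 70.0))   # -> 27.4425
```

Note that at 100% relative humidity the index reduces to the air temperature itself, which matches the formula's intent of discounting dry heat.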
Small-sample learning aims to learn information about object categories from a single or a few training samples, which is difficult for deep learning methods that rely on large amounts of data. Deep learning can address small-sample learning through the meta-learning idea of learning "how to learn by using previous experience." This paper therefore takes image classification as the research object to study how meta-learning can learn quickly from a small number of sample images. The main contents are as follows. After considering the effect of distribution differences across datasets on the generalization performance of metric learning, and the advantages of optimizing the initial representation, this paper adds a model-agnostic meta-learning (MAML) algorithm and designs a multi-scale meta-relational network. First, following the idea of META-SGD, the inner-loop learning rate is treated as a learnable vector and trained together with the model parameters. Second, in meta-training, the model-agnostic meta-learning algorithm is used to find the optimal parameters of the model, and inner gradient iteration is omitted during meta-validation and meta-testing. The experimental results show that the multi-scale meta-relational network gives the learned metric stronger generalization ability, which further improves classification accuracy on the benchmark set and avoids the fine-tuning that the model-agnostic meta-learning algorithm otherwise requires.
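The META-SGD idea mentioned above, treating the inner-loop learning rate as a learned per-parameter vector rather than a fixed scalar, reduces for one inner step to an elementwise update. A minimal numpy sketch (the toy values are illustrative, not from the paper):

```python
import numpy as np

def meta_sgd_inner_step(theta, grad, alpha):
    """META-SGD inner update: alpha is a learned per-parameter
    learning-rate vector, adapted in the outer loop together with theta.
    MAML would use a single scalar learning rate here instead."""
    return theta - alpha * grad   # elementwise product

theta = np.array([1.0, 2.0, 3.0])     # model parameters
alpha = np.array([0.1, 0.01, 0.5])    # hypothetical learned rates
grad  = np.array([1.0, 1.0, 1.0])     # task-loss gradient
print(meta_sgd_inner_step(theta, grad, alpha))  # [0.9  1.99 2.5 ]
```

Because alpha is optimized in the outer loop alongside theta, each parameter can adapt at its own speed, which is what lets the abstract above drop the inner gradient iteration at meta-validation and meta-test time.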
With the development of artificial intelligence, more and more people hope that computers can understand human language through natural language technology, learn to think like human beings, and finally replace humans in completing highly difficult cognitive tasks. As a key technology of natural language understanding, sentence representation reasoning focuses mainly on the sentence representation method and the reasoning model. Although performance has improved, problems remain, such as incomplete expression of sentence semantics, lack of depth in the reasoning model, and lack of interpretability of the reasoning process. In this paper, a multi-layer semantic representation network is designed for sentence representation. A multi-attention mechanism obtains the semantic information of a sentence at different levels, and the word order information of the sentence is integrated by adding a relative position mask between words to reduce the uncertainty caused by word order. Finally, the method is verified on text entailment recognition and emotion classification tasks. The experimental results show that the multi-layer semantic representation network improves the accuracy and comprehensiveness of sentence representation.
Existing joint-embedding Visual Question Answering models use different combinations of image characterization, text characterization, and feature fusion methods, but all of them use static word vectors for text characterization. In a real language environment, however, the same word may have different meanings in different contexts and may serve as different grammatical components; static word vectors cannot express these differences effectively, so semantic and grammatical deviations may arise. To solve this problem, our article constructs a joint-embedding model based on dynamic word vectors, the None KB-Specific Network (N-KBSN) model, which differs from commonly used Visual Question Answering models based on static word vectors. The N-KBSN model consists of three main parts: a question text and image feature extraction module, a self-attention and guided-attention module, and a feature fusion and classifier module. The key parts of the N-KBSN model are image characterization based on Faster R-CNN, text characterization based on ELMo, and feature enhancement based on a multi-head attention mechanism. The experimental results show that the N-KBSN constructed in our experiments outperforms both the 2017-winner (GloVe) model and the 2019-winner (GloVe) model. The introduction of dynamic word vectors improves the accuracy of the overall results.
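As a rough illustration of the guided-attention component described above, the following single-head sketch lets a question vector attend over image region features. It is a simplification under assumed shapes; the actual N-KBSN uses multi-head attention over ELMo text features and Faster R-CNN region features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_attention(q, img_feats):
    """Question vector q (d,) attends over region features img_feats (n, d):
    scaled dot-product scores -> softmax weights -> weighted region sum."""
    scores = img_feats @ q / np.sqrt(q.size)   # (n,) relevance of each region
    weights = softmax(scores)                  # (n,) sums to 1
    return weights @ img_feats                 # (d,) question-guided summary

q = np.random.rand(8)            # toy question embedding
regions = np.random.rand(5, 8)   # toy stand-in for region features
attended = guided_attention(q, regions)
print(attended.shape)  # (8,)
```

Multi-head attention repeats this with several learned projections of q and the regions, then concatenates the per-head summaries.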
Bilateral teleoperation robots with force feedback enable humans to accomplish tasks in hazardous environments without being exposed to them. The performance of a bilateral teleoperation system with force feedback is described by its stability and transparency. Force feedback enables the operator to combine the tactile sense with visual perception; however, it may destabilize bilateral teleoperation when the communication channels have a time delay, and this delay-induced instability has become one of the complicated problems researchers need to solve. Transparency is one of the leading design objectives of a teleoperation system, with two evaluation criteria: how accurately the slave arm follows the master arm's position, and how accurately the master arm receives force feedback from the slave arm. The main content of this paper is as follows. 1) The paper surveys the control structures and control algorithms of several well-developed force-feedback bilateral teleoperation systems and chooses to improve the PBTDPA algorithm, which aligns with practical application requirements. 2) The four-channel structure makes the transparency of force-feedback bilateral teleoperation systems perfect in theory; this paper combines the four-channel structure with the PBTDPA algorithm to improve transparency. 3) Moreover, a delay predictor is used to improve the four-channel power-based time-domain passivity approach (PBTDPA) control strategy: a delay differential predictor is added to the communication channel, so that the channel's delay change rate, rather than the maximum delay change rate, can be estimated to improve transparency. A simulation experiment of the improved control strategy was carried out, and the results show the excellent performance of our design.
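The core of the power-based time-domain passivity approach (PBTDPA) that the paper builds on can be sketched as an energy observer plus a variable damper. The sketch below is a heavily simplified, assumed form of that idea; the four-channel structure and the delay predictor discussed above are omitted.

```python
def tdpa_step(E_prev, force, velocity, dt):
    """One step of a simplified power-based time-domain passivity
    observer/controller. Positive force*velocity means energy flows
    into the communication channel; a negative energy balance means
    the delayed channel is generating energy, threatening stability."""
    E = E_prev + force * velocity * dt          # passivity observer
    damping = 0.0
    if E < 0 and abs(velocity) > 1e-9:
        # passivity controller: dissipate exactly the energy deficit
        damping = -E / (velocity ** 2 * dt)
        E = 0.0
    return E, damping

# one step where the channel would generate 0.01 J of energy
E, d = tdpa_step(0.0, -1.0, 1.0, 0.01)
print(E, d)   # 0.0 1.0 -> deficit fully dissipated by the damper
```

Because the damper activates only when the observed energy goes negative, the controller trades away as little transparency as possible, which is the property the four-channel extension then tries to improve further.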
In visual reasoning, the achievements of deep learning have significantly improved the accuracy of results. Image features are primarily used as input to obtain answers. However, image features are too redundant to learn accurate characterizations within limited complexity and time, whereas human reasoning usually works on an abstract description of an image that avoids irrelevant details. Inspired by this, a higher-level representation named semantic representation is introduced. In this paper, a detailed visual reasoning model is proposed. The model contains an image understanding module based on semantic representation; a feature extraction and processing module refined with the watershed and u-distance methods; a feature vector learning module using pyramid pooling and a residual network; and a question understanding module combining a question-embedding encoding method with a machine-translation decoding method. The feature vector can better represent the whole image instead of focusing overly on specific characteristics. Using semantic representation as input, the model verifies that more accurate results can be obtained by introducing a high-level semantic representation. The results also show that it is feasible and effective to introduce high-level, abstract forms of knowledge representation into deep learning tasks. This study lays a theoretical and experimental foundation for introducing different levels of knowledge representation into deep learning in the future.
Visual Question Answering (VQA) is a significant cross-disciplinary issue in computer vision and natural language processing that requires a computer to output a natural language answer based on a picture and a question posed about that picture. This requires multimodal fusion of text features and visual features, and the key component that ensures its success is the attention mechanism: bringing in attention mechanisms makes it possible to better integrate text features and image features into a compact multimodal representation. It is therefore necessary to clarify the development status of attention mechanisms, understand the most advanced attention mechanism methods, and look ahead to their future development. In this article, we first conduct a bibliometric analysis of the relevant literature through CiteSpace, from which we find, and reasonably speculate, that the attention mechanism has great development potential in cross-modal retrieval. Secondly, we discuss the classification and application of existing attention mechanisms in VQA tasks, analyze their shortcomings, and summarize current improvement methods. Finally, through this continued exploration of attention mechanisms, we believe that VQA will evolve in a smarter and more human direction.
The process of computationally identifying and categorizing opinions expressed in a piece of text is of great importance for supporting better understanding of, and services to, online users in the digital environment. However, accurate and fast multi-label automatic classification is still insufficient. By considering not only individual in-sentence features but also the features in adjacent sentences and the full text of the tweet, this study adjusted the Multi-label K-Nearest Neighbors (MLkNN) classifier to allow iterative corrections of multi-label emotion classification. It applies the new method to improve both the accuracy and speed of emotion classification for short texts on Twitter. Through three groups of experiments on the Twitter corpus, this study compares the performance of the base MLkNN classifier, the sample-based MLkNN (S-MLkNN), and the label-based MLkNN (L-MLkNN). The results show that the improved MLkNN algorithm can effectively improve the accuracy of emotion classification of short texts, especially when K in the MLkNN base classifier is 8 and α is 0.7; the improved L-MLkNN algorithm outperforms the other methods in overall performance, with a recall rate of 0.8019. This study attempts to obtain an efficient classifier with smaller training samples and lower training costs for sentiment analysis. It is suggested that future studies pay more attention to balancing the efficiency of the model with smaller training sample sizes against the completeness of the model in covering various scenarios.
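For reference, the base MLkNN classifier that the S-MLkNN and L-MLkNN variants build on decides each label by MAP estimation over neighbour label counts. A compact numpy sketch under assumed details (Euclidean distance, Laplace smoothing s = 1); the per-sentence features and iterative corrections of this study are not included.

```python
import numpy as np

def mlknn_fit(X, Y, k=3, s=1.0):
    """Estimate label priors P(H1) and posteriors P(c neighbours with
    label | label present/absent) from training data X (n, d), Y (n, L)."""
    n, L = Y.shape
    prior = (s + Y.sum(axis=0)) / (s * 2 + n)
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                 # exclude self as neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    c = Y[nbrs].sum(axis=1)                     # (n, L) neighbour label counts
    kh1 = np.ones((L, k + 1)) * s               # counts given label present
    kh0 = np.ones((L, k + 1)) * s               # counts given label absent
    for i in range(n):
        for l in range(L):
            (kh1 if Y[i, l] else kh0)[l, c[i, l]] += 1
    kh1 /= kh1.sum(axis=1, keepdims=True)
    kh0 /= kh0.sum(axis=1, keepdims=True)
    return dict(X=X, Y=Y, k=k, prior=prior, kh1=kh1, kh0=kh0)

def mlknn_predict(model, x):
    """MAP decision per label: compare P(H1)P(c|H1) against P(H0)P(c|H0)."""
    X, Y, k = model["X"], model["Y"], model["k"]
    nbrs = np.argsort(((X - x) ** 2).sum(-1))[:k]
    c = Y[nbrs].sum(axis=0)
    idx = np.arange(Y.shape[1])
    p1 = model["prior"] * model["kh1"][idx, c]
    p0 = (1 - model["prior"]) * model["kh0"][idx, c]
    return (p1 > p0).astype(int)
```

On a toy dataset of two well-separated clusters with disjoint labels, predictions for a point near either cluster recover that cluster's label set, which is the behaviour the sample- and label-based corrections above then refine.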