A Visual Question Answering (VQA) system finds the information in an image that is relevant to a question in order to answer that question correctly. VQA can be widely used in visual assistance, automated security surveillance, and intelligent interaction between robots and humans. However, the accuracy of VQA is still unsatisfactory. The main difficulties are that image features cannot adequately represent scene and object information, and that textual information is not fully represented. To improve accuracy, this paper applies multi-scale feature extraction and fusion to both the image feature representation and the text representation components of a VQA system. First, for image feature representation, the outputs of different layers of a pre-trained deep neural network were extracted as multi-scale image features, and the best feature fusion scheme was identified through experiments. Second, for sentence representation, a multi-scale method was introduced to extract and fuse word-level, phrase-level, and sentence-level features. Finally, the VQA model was improved using these multi-scale feature extraction and fusion methods. The results show that adding multi-scale feature extraction and fusion improves the accuracy of the VQA model.
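The abstract does not say which layers, n-gram windows, or fusion operator the paper finally chose. What follows is therefore only a minimal sketch under stated assumptions. On the image side, each layer's feature map (channels × positions) is global-average-pooled and the results are concatenated. On the text side, word-level features are taken as the max over word vectors. Phrase-level features are averaged n-gram windows of hypothetical sizes 2 and 3, max-pooled over positions. The sentence-level feature is the mean word vector. The three are then concatenated. Concatenation is only one candidate fusion scheme, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]

def mean_vectors(vecs: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def max_vectors(vecs: Sequence[Sequence[float]]) -> Vector:
    """Element-wise max of equal-length vectors (max pooling)."""
    return [max(col) for col in zip(*vecs)]

def phrase_level(words: Sequence[Sequence[float]], window: int) -> Vector:
    """Average each contiguous n-gram of word vectors, then max-pool over positions."""
    grams = [mean_vectors(words[i:i + window]) for i in range(len(words) - window + 1)]
    return max_vectors(grams)

def multiscale_text(words: Sequence[Sequence[float]], windows=(2, 3)) -> Vector:
    """Concatenate word-, phrase- and sentence-level features (hypothetical fusion choice)."""
    word = max_vectors(words)
    phrases = [phrase_level(words, w) for w in windows if w <= len(words)]
    phrase = max_vectors(phrases) if phrases else word  # short sentences: fall back to word level
    sentence = mean_vectors(words)
    return word + phrase + sentence

def global_avg_pool(feature_map: Sequence[Sequence[float]]) -> Vector:
    """feature_map is channels x positions; returns one value per channel."""
    return [sum(channel) / len(channel) for channel in feature_map]

def multiscale_image(layer_outputs) -> Vector:
    """Pool the output of each network layer and concatenate the pooled vectors."""
    fused: Vector = []
    for fmap in layer_outputs:
        fused.extend(global_avg_pool(fmap))
    return fused

if __name__ == "__main__":
    print(multiscale_text([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
    print(multiscale_image([[[1.0, 3.0], [2.0, 2.0]], [[0.0, 0.0, 6.0]]]))
```

Swapping the concatenation for an element-wise sum or a learned projection would be one way to compare fusion schemes experimentally, as the abstract describes.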
•The influence of spatio-temporal factors on human comfort was quantified.
•Seasonal change is the main factor affecting physical comfort.
•The artificial impermeable matrix increases the surface thermal field temperature.
•Natural factors can always effectively reduce the surface temperature.
•Human comfort can be improved to a certain extent through human intervention.
In recent decades, urbanization and the dramatic growth of urban populations have exacerbated the urban heat island effect. Much attention has been paid to the causes and patterns of urban heat islands, but few studies have quantified their impacts. Using post-world climate data, DEM data, and land use data for the 20 years from 2001 to 2020, this paper first studies the temporal variation and spatial distribution of urban heat islands in New York State and summarizes the seasonal distribution of temperature. It then uses the Giles formula to calculate Thom's discomfort index (DI), evaluates human thermal comfort, and assesses the effect of the heat island on perceived thermal comfort. The results show that, on the temporal scale, surface temperature in the study area showed a slow overall upward trend over the past 20 years; for example, in July the maximum and minimum temperatures increased by 3.2 °C and 4.1 °C, respectively. On the spatial scale, most heat island areas were concentrated in the New York City agglomeration, and the effect was most pronounced from May to October, when the temperature maps showed distinct high-temperature zones. Compared with 2001, the DI increased between June and August 2020. The land use maps show that the DI rises as the city expands, and the proportion of people who feel heat discomfort increases to 50%. Outside the New York City cluster, most areas were largely free of heat islands. This result shows that excessive concentration of urban development seriously affects residents' quality of life.
We should pay attention to the combined impact of climate change and urban heat islands on the human discomfort index, and mitigate local high temperatures and hot thermal-field areas through sound planning, more greening, and appropriate building technology to make cities more livable.
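The abstract does not print the discomfort-index formula. The sketch below therefore assumes the form commonly attributed to Giles et al.: DI = T − 0.55 (1 − 0.01 RH)(T − 14.5), with air temperature T in °C and relative humidity RH in %. It also assumes the comfort classes usually cited with that formula. The 24 threshold in those classes is where "over 50% of the population feels discomfort", which matches the 50% figure above. Function names are hypothetical.

```python
def discomfort_index(temp_c: float, rel_humidity: float) -> float:
    """Thom's discomfort index via the Giles formula (T in deg C, RH in %)."""
    return temp_c - 0.55 * (1 - 0.01 * rel_humidity) * (temp_c - 14.5)

def di_class(di: float) -> str:
    """Comfort classes as commonly cited alongside the Giles formula."""
    if di < 21:
        return "no discomfort"
    if di < 24:
        return "under 50% of population feels discomfort"
    if di < 27:
        return "over 50% of population feels discomfort"
    if di < 29:
        return "most of population feels discomfort"
    if di < 32:
        return "everyone feels severe stress"
    return "state of medical emergency"

if __name__ == "__main__":
    di = discomfort_index(30.0, 60.0)  # a humid summer afternoon
    print(round(di, 2), di_class(di))
```

At 100% humidity the correction term vanishes and DI equals the air temperature. Drier air lowers DI for the same temperature.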
Visual Question Answering (VQA) is a significant cross-disciplinary problem in computer vision and natural language processing that requires a computer to output a natural language answer based on an image and a question posed about that image. This requires multimodal fusion of text features and visual features, and the attention mechanism is key to its success. Attention mechanisms help integrate text features and image features into a compact multimodal representation. It is therefore necessary to clarify the current state of attention mechanisms, understand the most advanced methods, and look ahead to their future development. In this article, we first conduct a bibliometric analysis of the related literature using CiteSpace, from which we infer that attention mechanisms have great development potential in cross-modal retrieval. Second, we discuss the classification and application of existing attention mechanisms in VQA tasks, analyze their shortcomings, and summarize current improvement methods. Finally, we believe that through continued exploration of attention mechanisms, VQA will evolve in a smarter and more human-like direction.
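As a generic illustration of how attention fuses the two modalities, here is a minimal question-guided soft attention over image regions. This is not any specific method from the survey. Each region feature is scored by a scaled dot product with the question vector. The scores are normalized with softmax, and the weighted sum of regions is fused with the question by element-wise product. The shared dimension d and the Hadamard fusion are assumptions of the sketch, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def question_guided_attention(question: Sequence[float],
                              regions: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights over regions, fused multimodal vector)."""
    d = len(question)
    # relevance of each image region to the question (scaled dot product)
    scores = [sum(q * r for q, r in zip(question, reg)) / math.sqrt(d) for reg in regions]
    weights = softmax(scores)
    # attended visual feature: weighted sum of region features
    attended = [sum(w * reg[c] for w, reg in zip(weights, regions)) for c in range(d)]
    # element-wise (Hadamard) fusion of question and attended image feature
    fused = [q * a for q, a in zip(question, attended)]
    return weights, fused

if __name__ == "__main__":
    w, f = question_guided_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    print(w, f)
```

Regions aligned with the question receive the largest weights. Stacked or co-attention variants repeat or mirror this step.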
Abstract
Computationally identifying and categorizing opinions expressed in a piece of text is important for better understanding of, and services for, online users in the digital environment. However, accurate and fast automatic multi-label classification is still lacking. This study considers not only features within individual sentences but also features of adjacent sentences and of the full tweet. On that basis, it adjusted the Multi-label K-Nearest Neighbors (MLkNN) classifier to allow iterative corrections of multi-label emotion classification. It applies the new method to improve both the accuracy and the speed of emotion classification for short texts on Twitter. Through three groups of experiments on a Twitter corpus, this study compares the performance of the base MLkNN classifier, sample-based MLkNN (S-MLkNN), and label-based MLkNN (L-MLkNN). The results show that the improved MLkNN algorithm effectively improves the accuracy of emotion classification for short texts, especially when K in the MLkNN base classifier is 8 and α is 0.7. The improved L-MLkNN algorithm outperforms the other methods in overall performance, with a recall of 0.8019. This study attempts to obtain an efficient classifier for sentiment analysis with smaller training samples and lower training costs. Future studies should pay more attention to balancing model efficiency with small training samples against model completeness in covering various scenarios.
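For reference, here is a minimal sketch of the standard ML-kNN base classifier, with Euclidean distance and Laplace smoothing s = 1. For each label, it estimates a prior and the likelihood of seeing exactly j neighbours carrying that label. It then predicts the label when the posterior for "present" beats "absent". The default k = 8 mirrors the best K reported above. The abstract does not define how α enters the sample-based and label-based variants, so α is deliberately not modelled here. Class and method names are hypothetical.

```python
import math
from typing import List, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class MLkNN:
    """Standard ML-kNN: per-label Bayesian decision from neighbour label counts."""

    def __init__(self, k: int = 8, s: float = 1.0):
        self.k = k  # number of neighbours
        self.s = s  # Laplace smoothing constant

    def _neighbours(self, x, exclude=None) -> List[int]:
        ranked = sorted((euclidean(x, xi), i) for i, xi in enumerate(self.X) if i != exclude)
        return [i for _, i in ranked[: self.k]]

    def fit(self, X, Y):
        self.X, self.Y = [list(x) for x in X], [list(y) for y in Y]
        m, q, k, s = len(self.X), len(self.Y[0]), self.k, self.s
        # prior probability that each label is present
        self.prior = [(s + sum(y[l] for y in self.Y)) / (2 * s + m) for l in range(q)]
        # c1[l][j] / c0[l][j]: training instances with / without label l whose
        # k nearest neighbours contain exactly j instances carrying label l
        c1 = [[0] * (k + 1) for _ in range(q)]
        c0 = [[0] * (k + 1) for _ in range(q)]
        for i in range(m):
            nbrs = self._neighbours(self.X[i], exclude=i)  # leave-one-out neighbours
            for l in range(q):
                j = sum(self.Y[n][l] for n in nbrs)
                if self.Y[i][l]:
                    c1[l][j] += 1
                else:
                    c0[l][j] += 1
        self.like1 = [[(s + c1[l][j]) / (s * (k + 1) + sum(c1[l])) for j in range(k + 1)]
                      for l in range(q)]
        self.like0 = [[(s + c0[l][j]) / (s * (k + 1) + sum(c0[l])) for j in range(k + 1)]
                      for l in range(q)]
        return self

    def predict(self, x) -> List[int]:
        nbrs = self._neighbours(x)
        labels = []
        for l in range(len(self.prior)):
            j = sum(self.Y[n][l] for n in nbrs)
            p1 = self.prior[l] * self.like1[l][j]
            p0 = (1 - self.prior[l]) * self.like0[l][j]
            labels.append(1 if p1 > p0 else 0)
        return labels
```

In an emotion setting, each row of `Y` would be a 0/1 vector over emotion labels. Each row of `X` would be a tweet's feature vector, which the paper builds from in-sentence, adjacent-sentence, and full-text features.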
With the fast-growing volume of electronic documents in the Digital Media Age, there is a growing need to extract textual features of online texts for better communication. Sentiment classification may be the key method for capturing emotions in online communication, and developing corpora annotated with emotions is the first step toward sentiment classification. However, labour-intensive and costly manual annotation has resulted in a lack of corpora of emotional words. Furthermore, single-label semantic corpora can hardly meet the requirements of modern analysis of users' complex emotions, yet tagging emotional words with multiple labels is even more difficult. Better methods for automatic emotion tagging with multiple labels are urgently needed to construct new semantic corpora. Taking Twitter short texts as a case, this study proposes a new semi-automatic method to annotate Internet short texts with multiple labels and form a multi-labelled corpus for further algorithm training. Each sentence is tagged with both its emotional tendency and its polarity. Each tweet, which generally contains several sentences, is tagged with its two major emotional tendencies. The semi-automatic process has four steps: selecting the base corpus and emotion tags, preprocessing the data, annotating automatically through word matching and weight calculation, and correcting manually when multiple emotional tendencies are found. Experiments on the published Sentiment140 Twitter corpus demonstrate the effectiveness of the proposed approach and show consistency between semi-automatic and manual annotation. Applying this method, this study summarises the annotation specification and constructs a multi-labelled emotion corpus of 6500 tweets for further algorithm training.
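The automatic step (word matching plus weight calculation, with a manual-review flag) can be sketched as follows. The sketch rests on several hypothetical assumptions: the toy lexicon and its weights, the emotion-to-polarity map, and the 0.8 review ratio are all invented for illustration and are not the paper's specification. Each sentence gets its top-weighted emotion and that emotion's polarity. A sentence is flagged for manual correction when a second emotion scores nearly as high. The tweet gets its two highest-scoring emotions overall.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> (emotion, weight).
LEXICON = {
    "love": ("joy", 1.0), "happy": ("joy", 0.8), "great": ("joy", 0.6),
    "hate": ("anger", 1.0), "angry": ("anger", 0.9),
    "sad": ("sadness", 0.9), "miss": ("sadness", 0.5),
    "scared": ("fear", 0.9),
}
POLARITY = {"joy": "positive", "anger": "negative", "sadness": "negative", "fear": "negative"}
REVIEW_RATIO = 0.8  # hypothetical: flag if the runner-up emotion scores >= 80% of the top one

def annotate_sentence(sentence: str) -> dict:
    """Word matching + weight summation; returns emotion, polarity and a review flag."""
    scores = Counter()
    for token in re.findall(r"[a-z']+", sentence.lower()):
        if token in LEXICON:
            emotion, weight = LEXICON[token]
            scores[emotion] += weight
    if not scores:
        return {"emotion": None, "polarity": "neutral", "needs_review": False, "scores": scores}
    ranked = scores.most_common()
    top, top_score = ranked[0]
    # several strong emotional tendencies -> hand the sentence to a human annotator
    needs_review = len(ranked) > 1 and ranked[1][1] >= REVIEW_RATIO * top_score
    return {"emotion": top, "polarity": POLARITY[top], "needs_review": needs_review, "scores": scores}

def annotate_tweet(tweet: str):
    """Annotate each sentence, then tag the tweet with its two strongest emotions."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", tweet) if s.strip()]
    annotations = [annotate_sentence(s) for s in sentences]
    totals = Counter()
    for a in annotations:
        totals.update(a["scores"])  # add per-sentence weights
    return annotations, [emotion for emotion, _ in totals.most_common(2)]

if __name__ == "__main__":
    print(annotate_tweet("I love this. I hate waiting!")[1])
```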
•Confirms the influence of the Three Gorges Dam on the Yangtze River basin.
•Explores the statistics, periodic patterns, and coherence of three datasets.
•The reservoir changed the landscape and climate, causing precipitation change.
•Wavelet coherence analysis shows periodic signals other than seasonal change.
•There are coherences between dam operation, river discharge, and precipitation.
The Three Gorges Dam and Reservoir on the Yangtze River is one of the world's largest dams. After the dam's construction in 1997, the reservoir began filling and expanded to over 600 km². The possible influence of maintaining the size and water level of this water body is therefore significant and a matter of concern. This research used wavelet coherence analysis to examine the temporal correlation and phase coherence among several datasets: dam injection (1998–2018) and discharge (2003–2018) data, ground station precipitation data along the Yangtze River (1998–2020), and river discharge raster maps (1998–2018). The analysis revealed a strong coherence between dam operation and river discharge rates, as well as a minor seasonal coherence between dam operation and precipitation. Beyond the general seasonal changes observed in the wavelet coherence analysis, other periodic signals in the datasets are also coherent over time. This coherence may be attributed to the simultaneous impacts of dam operation on precipitation and river discharge. Its causes are still unknown, and further studies incorporating soil moisture, groundwater levels, air humidity, and the monsoon are required to understand how the dam affects them.
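The core quantity of wavelet coherence can be sketched in pure Python. The sketch computes a Morlet continuous wavelet transform (ω₀ = 6) of two series at one scale and then smooths the cross-spectrum and the two auto-spectra. The coherence is R²(t) = |S(Wx·Wy*)|² / (S(|Wx|²)·S(|Wy|²)). Without smoothing, this ratio is identically 1, so smoothing is essential. This is a simplified, hypothetical sketch: it smooths only in time with a boxcar window of about two scales, whereas full implementations also smooth across scales and add significance testing. Function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence

def morlet_cwt(x: Sequence[float], scale: float, dt: float = 1.0, w0: float = 6.0) -> List[complex]:
    """CWT of x at one scale with a Morlet wavelet, by direct (truncated) convolution."""
    n = len(x)
    half = int(math.ceil(3 * scale / dt))  # truncate the Gaussian envelope at 3 scales
    norm = math.sqrt(dt / scale) * math.pi ** -0.25
    coeffs = []
    for t in range(n):
        acc = 0j
        for k in range(max(0, t - half), min(n, t + half + 1)):
            eta = (k - t) * dt / scale
            psi = norm * cmath.exp(1j * w0 * eta) * math.exp(-0.5 * eta * eta)
            acc += x[k] * psi.conjugate()
        coeffs.append(acc)
    return coeffs

def smooth_time(values, width: int):
    """Boxcar moving average over time (edges use a shortened window)."""
    n, h = len(values), width // 2
    out = []
    for i in range(n):
        window = values[max(0, i - h): min(n, i + h + 1)]
        out.append(sum(window) / len(window))
    return out

def wavelet_coherence(x, y, scale: float, dt: float = 1.0, width: int = 0) -> List[float]:
    """Squared wavelet coherence R^2(t) at a single scale, smoothed in time only."""
    wx, wy = morlet_cwt(x, scale, dt), morlet_cwt(y, scale, dt)
    width = width or (int(2 * scale / dt) | 1)  # odd window of roughly two scales
    sxy = smooth_time([a * b.conjugate() for a, b in zip(wx, wy)], width)
    sxx = smooth_time([abs(a) ** 2 for a in wx], width)
    syy = smooth_time([abs(b) ** 2 for b in wy], width)
    return [abs(c) ** 2 / (p * q) if p * q > 0 else 0.0 for c, p, q in zip(sxy, sxx, syy)]

if __name__ == "__main__":
    monthly = [math.sin(2 * math.pi * i / 12) for i in range(120)]  # annual cycle, monthly data
    lagged = [math.sin(2 * math.pi * i / 12 + 1.0) for i in range(120)]
    print(min(wavelet_coherence(monthly, lagged, scale=12.0)[48:72]))
```

Two series with the same period but a fixed phase lag remain highly coherent. The lag itself would appear in the phase angle of the smoothed cross-spectrum `sxy`.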
As a battery cycles between charging and discharging, harsh working conditions or improper operation such as overcharging and over-discharging aggravate side reactions inside the battery. These reactions generate irreversible chemical products and reduce the amount of active material involved in the electrochemical reaction, so battery capacity decreases. A battery that has lost 20% of its capacity can be considered to have failed. In a failed battery, capacity and power decay faster, and its electrical characteristics, stability, and safety drop significantly. To improve the accuracy and generalization of machine learning models for remaining useful life (RUL) prediction of zinc-ion batteries, this paper mainly discusses the design of an encoder–decoder model structure and the application of optimization methods. It then studies neural network hyperparameter optimization. Finally, the validity of this work is verified by a series of comparative experiments.
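Given any capacity forecast, the 20%-fade failure criterion above turns it into an RUL estimate. The sketch assumes capacity is measured once per cycle and that the first observed value is the nominal capacity. In place of the paper's encoder–decoder, it deliberately uses a hypothetical least-squares linear extrapolation as a stand-in forecaster. Function names are hypothetical.

```python
from typing import List, Optional, Sequence

def eol_threshold(initial_capacity: float, fade: float = 0.20) -> float:
    """End-of-life capacity: a 20% capacity loss is treated as failure."""
    return initial_capacity * (1 - fade)

def linear_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Stand-in forecaster (not the paper's model): least-squares line, extrapolated."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observed cycles")
    mx, my = (n - 1) / 2, sum(history) / n
    sxx = sum((i - mx) ** 2 for i in range(n))
    slope = sum((i - mx) * (c - my) for i, c in enumerate(history)) / sxx
    intercept = my - slope * mx
    return [intercept + slope * (n + h) for h in range(horizon)]

def remaining_useful_life(history: Sequence[float], forecast: Sequence[float],
                          fade: float = 0.20) -> Optional[int]:
    """Cycles after the last observed one until forecast capacity first reaches EOL."""
    threshold = eol_threshold(history[0], fade)
    for steps, capacity in enumerate(forecast, start=1):
        if capacity <= threshold:
            return steps
    return None  # threshold not reached within the forecast horizon

if __name__ == "__main__":
    observed = [100.0, 98.0, 96.0, 94.0, 92.0]  # capacity (%) at cycles 0..4
    print(remaining_useful_life(observed, linear_forecast(observed, 50)))
```

A learned sequence model would replace `linear_forecast` while keeping the same threshold logic.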
Closed-loop attitude control of a parallel platform within its workspace requires the relationship between the pose of the top moving platform and the length of each mechanical arm; this is the kinematics problem of the parallel platform. In this study, a kinematics model of a six-degree-of-freedom parallel platform was established, and a forward kinematics algorithm based on Newton–Raphson iteration was studied. Forward kinematics is usually solved numerically, which often requires multiple iterations and therefore has poor real-time performance. To improve the real-time performance of the parallel platform control system, this paper proposes a forward kinematics algorithm based on multivariate polynomial regression. Moreover, by combining multivariate polynomial regression with the Newton iterative method, we obtained an efficient algorithm with controllable solution accuracy. The effectiveness of the proposed method was verified by simulation tests and physical tests.
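Newton–Raphson forward kinematics can be sketched as follows. The inverse kinematics is closed-form: each leg length is |p + R·pᵢ − bᵢ|. The forward problem solves leg_lengths(q) = measured for the pose q = (x, y, z, roll, pitch, yaw) by Newton iteration with a central-difference Jacobian. The platform geometry is entirely hypothetical: base radius 2, platform radius 1, anchors in pairs ±15° around 0°/120°/240° on the base and 60°/180°/300° on the platform, and a ZYX Euler convention. In the paper's combined scheme, the initial guess would come from the polynomial regression. Here the home pose is used as a stand-in.

```python
import math
from typing import List, Sequence

def rotation(roll: float, pitch: float, yaw: float) -> List[List[float]]:
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [[cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr]]

def ring(radius: float, degrees: Sequence[float]):
    return [(radius * math.cos(math.radians(d)), radius * math.sin(math.radians(d)), 0.0) for d in degrees]

# Hypothetical geometry: leg i joins BASE[i] to PLATFORM[i].
BASE = ring(2.0, [-15, 15, 105, 135, 225, 255])
PLATFORM = ring(1.0, [-45, 45, 75, 165, 195, 285])

def leg_lengths(pose: Sequence[float]) -> List[float]:
    """Inverse kinematics: leg lengths for pose (x, y, z, roll, pitch, yaw)."""
    x, y, z, roll, pitch, yaw = pose
    R = rotation(roll, pitch, yaw)
    lengths = []
    for base, (px, py, pz) in zip(BASE, PLATFORM):
        # platform joint in world frame = translation + R @ local joint position
        joint = [t + R[r][0] * px + R[r][1] * py + R[r][2] * pz for r, t in enumerate((x, y, z))]
        lengths.append(math.dist(joint, base))
    return lengths

def solve(A, b) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def forward_kinematics(measured, guess, tol=1e-10, max_iter=50, h=1e-7) -> List[float]:
    """Newton-Raphson: find the pose whose leg lengths match the measured ones."""
    q = list(guess)
    for _ in range(max_iter):
        f = [l - m for l, m in zip(leg_lengths(q), measured)]
        if max(abs(v) for v in f) < tol:
            return q
        J = [[0.0] * 6 for _ in range(6)]  # J[i][j] = d(length_i) / d(q_j)
        for j in range(6):
            qp, qm = q[:], q[:]
            qp[j] += h
            qm[j] -= h
            lp, lm = leg_lengths(qp), leg_lengths(qm)
            for i in range(6):
                J[i][j] = (lp[i] - lm[i]) / (2 * h)
        dq = solve(J, [-v for v in f])
        q = [a + d for a, d in zip(q, dq)]
    raise RuntimeError("Newton-Raphson did not converge")

if __name__ == "__main__":
    true_pose = [0.1, -0.05, 1.05, 0.05, -0.03, 0.08]
    print(forward_kinematics(leg_lengths(true_pose), guess=[0, 0, 1.0, 0, 0, 0]))
```

The closer the initial guess, the fewer iterations are needed. That is the rationale for seeding Newton with a fast regression estimate.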
Detecting changes in land cover is a critical task in remote sensing image interpretation, and accurately determining lake boundaries is particularly important. Lake boundaries are closely tied to land resources, and any alterations can have substantial implications for the surrounding environment and ecosystem. This paper introduces an end-to-end model that combines U-Net and a spatial transformer network (STN) to predict changes in lake boundaries and to investigate the evolution of the Lake Urmia boundary. Annual panoramic remote sensing images of Lake Urmia from 1996 to 2014 were obtained with Google Earth Pro Version 7.3 and pre-processed using image segmentation and grayscale filling. The experimental results demonstrate the model's ability to accurately forecast the evolution of lake boundaries in remote sensing images. The model also adapts well, effectively learning and adjusting to changing patterns over time. The study further evaluates how the length of the time series affects prediction accuracy. Longer time series provide more samples and therefore more precise predictions. The maximum accuracy achieved is 89.3%. These findings and methods offer insights into using deep learning to investigate and manage lake boundary changes, contributing to the effective management and conservation of this significant ecosystem.
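The STN component can be illustrated with its sampler: a 2×3 affine matrix θ maps each output pixel's normalized coordinates to a source location, which is read by bilinear interpolation. The following is only a minimal, hypothetical sketch of a generic STN sampler, not the paper's network. It assumes an align-corners-style [−1, 1] grid, zero padding, and images at least 2×2. Because the abstract does not define its accuracy metric, the pixel accuracy on thresholded binary lake masks is also an assumption.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]

def bilinear(image: Sequence[Sequence[float]], y: float, x: float) -> float:
    """Bilinear sample at fractional (y, x); zero padding outside the grid."""
    h, w = len(image), len(image[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * image[yy][xx]
    return value

def affine_warp(image: Sequence[Sequence[float]], theta: Sequence[Sequence[float]]) -> Grid:
    """STN-style sampler: output pixel -> theta @ (x, y, 1) in normalized coords -> sample input."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            xt, yt = -1 + 2 * j / (w - 1), -1 + 2 * i / (h - 1)  # target grid in [-1, 1]
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            out[i][j] = bilinear(image, (ys + 1) * (h - 1) / 2, (xs + 1) * (w - 1) / 2)
    return out

def pixel_accuracy(pred, truth, threshold: float = 0.5) -> float:
    """Fraction of pixels whose thresholded prediction matches the binary ground truth."""
    pairs = [(p >= threshold, t >= threshold)
             for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    return sum(a == b for a, b in pairs) / len(pairs)

if __name__ == "__main__":
    img = [[float(10 * i + j) for j in range(5)] for i in range(4)]
    shifted = affine_warp(img, [[1, 0, 0.5], [0, 1, 0]])  # shift by one column
    print(shifted[0])
```

In a full STN, θ is predicted by a small localization network and the sampler is differentiable, so the warp is learned end to end with the segmentation network.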
Deep-learning-based natural language processing (NLP) performs well for generative dialogue systems, and the transformer model has given NLP a new boost since the advent of word vectors. This paper designs a Chinese generative dialogue system based on the transformer. The system is built only from a multi-layer transformer decoder and uses an incomplete mask to realize one-way language generation. Question tokens can perceive context in both directions, while reply tokens are generated autoregressively in one direction only. These improvements make one-way generation for dialogue tasks more logical and reasonable, and the system performs better than traditional dialogue system schemes. Absolute position encoding is weak at capturing long-distance information. We therefore propose in theory an improvement based on relative position encoding and verify it in subsequent experiments. In the transformer module, the self-attention formula is modified so that relative position information replaces the absolute position encoding of the position embedding layer. The modified model performs well on BLEU, embedding average, and grammatical and semantic coherence, indicating stronger long-distance attention.
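Both modifications can be sketched together. The incomplete mask lets position i attend to position j if j lies in the question, or if j ≤ i, so question tokens see the whole question while reply tokens see the question plus earlier reply tokens. For relative positions, the sketch follows Shaw-style key-side embeddings: the score is eᵢⱼ = qᵢ·(kⱼ + a₍ⱼ₋ᵢ₎)/√d, with offsets clipped to ±max_dist. The paper's exact formula is not given in the abstract, so this choice is an assumption. The single head and the hypothetical names are simplifications.

```python
import math
from typing import List, Sequence, Tuple

def seq2seq_mask(n_question: int, n_total: int) -> List[List[int]]:
    """1 = may attend. Question: bidirectional within the question; reply: causal."""
    return [[1 if (j < n_question or j <= i) else 0 for j in range(n_total)]
            for i in range(n_total)]

def softmax_masked(scores: Sequence[float], mask_row: Sequence[int]) -> List[float]:
    m = max(s for s, k in zip(scores, mask_row) if k)
    exps = [math.exp(s - m) if k else 0.0 for s, k in zip(scores, mask_row)]
    z = sum(exps)
    return [e / z for e in exps]

def relative_attention(q, k, v, rel_k, mask, max_dist: int) -> Tuple[list, list]:
    """Single-head self-attention with clipped relative-position key embeddings.

    rel_k[r + max_dist] is the embedding for offset r = j - i, clipped to [-max_dist, max_dist].
    """
    n, d = len(q), len(q[0])
    outputs, weights = [], []
    for i in range(n):
        scores = []
        for j in range(n):
            r = max(-max_dist, min(max_dist, j - i))
            a = rel_k[r + max_dist]
            scores.append(sum(qi * (kj + aj) for qi, kj, aj in zip(q[i], k[j], a)) / math.sqrt(d))
        w = softmax_masked(scores, mask[i])
        weights.append(w)
        outputs.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(d)])
    return outputs, weights

if __name__ == "__main__":
    for row in seq2seq_mask(n_question=2, n_total=4):
        print(row)
```

Because the score depends only on the clipped offset j − i, the same embeddings apply at any absolute position. This is the usual motivation for relative encoding on long sequences.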