Visual Question Answering (VQA) is a significant cross-disciplinary problem in computer vision and natural language processing that requires a computer to output a natural language answer based on a picture and a question posed about it. This requires multimodal fusion of text features and visual features, and the attention mechanism is the key component that ensures its success. Introducing attention mechanisms makes it easier to integrate text features and image features into a compact multimodal representation. It is therefore necessary to clarify the development status of attention mechanisms, understand the most advanced attention methods, and anticipate their future development. In this article, we first conduct a bibliometric analysis of the related literature using CiteSpace, from which we find and reasonably speculate that the attention mechanism has great development potential in cross-modal retrieval. Second, we discuss the classification and application of existing attention mechanisms in VQA tasks, analyze their shortcomings, and summarize current improvement methods. Finally, we believe that, through the continued exploration of attention mechanisms, VQA will evolve in a smarter and more human-like direction.
The process of computationally identifying and categorizing opinions expressed in a piece of text is of great importance for supporting better understanding of, and services to, online users in the digital environment. However, accurate and fast multi-label automatic classification is still lacking. By considering not only individual in-sentence features but also the features of adjacent sentences and the full text of the tweet, this study adjusted the Multi-label K-Nearest Neighbors (MLkNN) classifier to allow iterative corrections of the multi-label emotion classification. It applies the new method to improve both the accuracy and speed of emotion classification for short texts on Twitter. Through three groups of experiments on the Twitter corpus, this study compares the performance of the MLkNN base classifier, the sample-based MLkNN (S-MLkNN), and the label-based MLkNN (L-MLkNN). The results show that the improved MLkNN algorithm can effectively improve the accuracy of emotion classification of short texts, especially when K in the MLkNN base classifier is 8 and α is 0.7; the improved L-MLkNN algorithm outperforms the other methods in overall performance, with recall reaching 0.8019. This study attempts to obtain an efficient classifier with smaller training samples and lower training costs for sentiment analysis. Future studies should pay more attention to balancing the efficiency of a model trained on smaller samples against its completeness in covering various scenarios.
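The neighbour-based idea behind multi-label kNN classification can be illustrated with a minimal sketch. It is not the study's S-MLkNN or L-MLkNN, and it replaces the Bayesian decision rule of standard ML-kNN with a plain per-label vote among the K nearest neighbours. The function names, the 0.5 threshold, the toy feature vectors, and the three emotion labels are all assumptions made for illustration.

```python
from math import dist

def knn_label_votes(train_x, train_y, query, k):
    """For each label, return the fraction of the k nearest neighbours that carry it."""
    order = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    nearest = order[:k]
    n_labels = len(train_y[0])
    return [sum(train_y[i][j] for i in nearest) / k for j in range(n_labels)]

def predict_labels(train_x, train_y, query, k=3, threshold=0.5):
    """Assign every label whose neighbour vote reaches the (assumed) threshold."""
    votes = knn_label_votes(train_x, train_y, query, k)
    return [1 if v >= threshold else 0 for v in votes]

# Hypothetical 2-D text features; label columns are (joy, surprise, anger).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]

print(predict_labels(train_x, train_y, (0.05, 0.05), k=3))
```

A sample can receive several labels at once because each label is decided independently, which is what distinguishes this setting from single-label kNN.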
In this paper, an effective hybrid optimization strategy that incorporates the adaptive optimization of particle swarm optimization (PSO) into a genetic algorithm (GA), namely HPSOGA, is used to determine the parameters of radial basis function neural networks (the number of neurons and their respective centers and radii) automatically. This task usually depends on the operator's experience with trial and error, owing to a lack of prior knowledge, or on gradient algorithms, which are highly dependent on initial values. Here, hybrid evolutionary algorithms are used to automatically build a radial basis function neural network (RBF-NN) that solves a specified problem, in this case rainfall forecasting. In HPSOGA, individuals in a new generation are created through three approaches to improve global optimization performance: an elitist strategy, a PSO strategy, and a GA strategy. The best-performing half of the individuals in a population are regarded as elites, whereas the worst-performing half are regarded as a swarm. The group constituted by the elites is enhanced, and selection, crossover, and mutation operations are applied to these enhanced elites. HPSOGA is applied to RBF-NN design for rainfall prediction. Its performance is compared with that of pure GA on these RBF-NN design problems, showing that the hybrid strategy has more effective global exploration ability and avoids premature convergence. Our findings indicate that the proposed hybrid optimization strategy may serve as a promising alternative forecasting tool with higher forecasting accuracy and better generalization ability.
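One generation of a PSO/GA hybrid of this kind might look like the sketch below. The abstract does not fully specify which group receives which operator, so the sketch makes an assumption: the better half is moved by a PSO velocity update, and the worse half is replaced by children produced from the enhanced elites through tournament selection, uniform crossover, and Gaussian mutation. The objective `sphere` is a toy stand-in for an RBF-NN error measure, and all names and coefficients are hypothetical.

```python
import random

def sphere(x):
    """Toy objective to minimise (stand-in for an RBF-NN validation error)."""
    return sum(v * v for v in x)

def hybrid_generation(pop, vel, pbest, fitness, rng,
                      w=0.7, c1=1.5, c2=1.5, mut_rate=0.1, mut_scale=0.1):
    """Build the next generation: PSO-enhanced elites plus GA offspring (assumed split)."""
    n, dim = len(pop), len(pop[0])
    half = n // 2
    elite_idx = sorted(range(n), key=lambda i: fitness(pop[i]))[:half]
    gbest = min(pbest, key=fitness)

    new_pop, new_vel, new_pbest = [], [], []
    # PSO step: move each elite toward its personal best and the global best.
    for i in elite_idx:
        v = [w * vi + c1 * rng.random() * (pb - xi) + c2 * rng.random() * (gb - xi)
             for vi, xi, pb, gb in zip(vel[i], pop[i], pbest[i], gbest)]
        x = [xi + vi for xi, vi in zip(pop[i], v)]
        new_pop.append(x)
        new_vel.append(v)
        new_pbest.append(x if fitness(x) < fitness(pbest[i]) else pbest[i])

    # GA step: children of the enhanced elites fill the remaining slots.
    while len(new_pop) < n:
        parents = []
        for _ in range(2):
            # Binary tournament among the enhanced elites.
            pair = rng.sample(range(half), 2)
            parents.append(min(pair, key=lambda j: fitness(new_pop[j])))
        a, b = parents
        child = [rng.choice(genes) for genes in zip(new_pop[a], new_pop[b])]
        child = [g + rng.gauss(0.0, mut_scale) if rng.random() < mut_rate else g
                 for g in child]
        new_pop.append(child)
        new_vel.append([0.0] * dim)
        new_pbest.append(list(child))
    return new_pop, new_vel, new_pbest

rng = random.Random(0)
n, dim = 10, 3
pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n)]
vel = [[0.0] * dim for _ in range(n)]
pbest = [list(p) for p in pop]
initial_best = min(sphere(p) for p in pop)
best_so_far = initial_best
for _ in range(50):
    pop, vel, pbest = hybrid_generation(pop, vel, pbest, sphere, rng)
    best_so_far = min(best_so_far, min(sphere(p) for p in pbest))
print(initial_best, best_so_far)
```

For RBF-NN design, each individual would instead encode the number of neurons with their centers and radii, and `fitness` would be the forecasting error of the resulting network.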
Visual Question Answering (VQA) is the process of finding the information in an image that is relevant to a question in order to answer the question correctly. It can be widely used in visual assistance, automated security surveillance, and intelligent interaction between robots and humans. However, the accuracy of VQA has not been ideal. The main difficulty is that image features do not represent scene and object information well, and text information is not fully represented. To improve accuracy, this paper applies multi-scale feature extraction and fusion to both the image feature representation and the text representation stages of the VQA system. First, to address image feature representation, a multi-scale feature extraction and fusion method was adopted: the image features output by different network layers of a pre-trained deep neural network were extracted, and the best feature fusion scheme was found through experiments. Second, for sentence representation, a multi-scale feature method was introduced to characterize and fuse the word-level, phrase-level, and sentence-level features of sentences. Finally, the VQA model was improved using the multi-scale feature extraction and fusion method. The results show that adding multi-scale feature extraction and fusion improves the accuracy of the VQA model.
Conversion of glial cells into functional neurons represents a potential therapeutic approach for replenishing neuronal loss associated with neurodegenerative diseases and brain injury. Previous attempts in this area using expression of transcription factors were hindered by the low conversion efficiency and failure of generating desired neuronal types in vivo. Here, we report that downregulation of a single RNA-binding protein, polypyrimidine tract-binding protein 1 (Ptbp1), using in vivo viral delivery of a recently developed RNA-targeting CRISPR system CasRx, resulted in the conversion of Müller glia into retinal ganglion cells (RGCs) with a high efficiency, leading to the alleviation of disease symptoms associated with RGC loss. Furthermore, this approach also induced neurons with dopaminergic features in the striatum and alleviated motor defects in a Parkinson’s disease mouse model. Thus, glia-to-neuron conversion by CasRx-mediated Ptbp1 knockdown represents a promising in vivo genetic approach for treating a variety of disorders due to neuronal loss.
•Knockdown of Ptbp1 converts Müller glia into retinal ganglion cells in mature retinas
•Central projections of converted retinal ganglion cells restore visual responses
•Induction of neurons with dopaminergic features in PD model mice
•Induced neurons alleviated motor dysfunctions in PD mice
In vivo CasRx-mediated downregulation of a single RNA-binding protein, Ptbp1, locally converts glia to neurons and shows promise for treating disorders due to neuronal loss in mice.
•Confirms the influence of the Three Gorges Dam on the Yangtze River basin.
•Explores three datasets’ statistics, periodic patterns, and coherence.
•The reservoir changed the landscape and climate, causing precipitation change.
•The wavelet coherence analysis shows periodic signals other than seasonal change.
•There is coherence among dam operation, river discharge, and precipitation.
The Three Gorges Dam and Reservoir on the Yangtze River is one of the world's largest dams. After the dam's construction in 1997, the reservoir started filling up, expanding to a size of over 600 km². Therefore, its possible influence on maintaining the size and water level of this waterbody is significant and concerning. This research used wavelet coherence analysis to examine the temporal correlation and phase coherence among various datasets, including dam injection (1998–2018) and discharge (2003–2018) data, ground station precipitation data along the Yangtze River (1998–2020), and river discharge raster maps (1998–2018). The analysis revealed a strong coherence between dam operation and river discharge rates, as well as a minor seasonal coherence between dam operation and precipitation. The periodic properties of the datasets indicate that, in addition to the general seasonal changes observed in the wavelet coherence analysis, other periodic signals in the datasets are also coherent over time. This coherence may be attributed to the simultaneous impacts of dam operation on precipitation and river discharge. The reasons for this coherence are still unknown, and further studies incorporating information on soil moisture, groundwater levels, air humidity, and the monsoon are required to understand how the dam affects them.
This research studies semi-supervised learning algorithms for data from different domains in machine olfaction, also known as sensor drift compensation algorithms. For this kind of problem, it is usually difficult to obtain good recognition results by applying a semi-supervised learning algorithm directly. For this reason, we propose a domain transformation semi-supervised weighted kernel extreme learning machine (DTSWKELM) algorithm, which transforms the data across domains and uses the SWKELM algorithm for classification, turning the semi-supervised classification problem on data from different domains into a semi-supervised classification problem on data from the same domain.
•A random number generator, the chaotic restricted Boltzmann machine (CRBM), is designed.
•A color image encryption algorithm using the Hénon-zigzag map and CRBM is proposed.
•An asymmetric image encryption system with tamper detection capability is designed.
A color image encryption algorithm using the Hénon-zigzag map and a chaotic restricted Boltzmann machine (CRBM) is proposed in this paper. The proposed pseudo-random number generator, the CRBM, can simultaneously generate three pseudo-random number sequences. The algorithm consists of a permutation phase and a diffusion phase. In the Hénon-zigzag map-based permutation phase, the zigzag map is used to modulate two pseudo-random number sequences generated by the Hénon map to obtain two new pseudo-random number sequences. Mixing these two chaotic maps significantly improves the security of the permutation phase. The two pseudo-random number sequences are then used for row permutation and column permutation, respectively. In the diffusion phase, through multiple iterations of a CRBM with a 3 × 3 architecture, three pseudo-random number sequences are generated from the state values of the three neurons in the visible layer. These three sequences are then combined by bitwise XOR with the R, G, and B components of the scrambled image, respectively. A series of numerical experiments and analyses on encrypted images indicates that the proposed algorithm is more secure than state-of-the-art algorithms. Furthermore, based on the combined use of blockchain and the proposed algorithm, a novel image encryption/decryption system is proposed. The system has two features: asymmetric encryption/decryption of images and authoritative verification of the integrity of encrypted images. It may provide a better understanding of the use of blockchain in digital image encryption.
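The permutation-then-diffusion structure described above can be sketched for a single colour channel. This is a simplified illustration under stated assumptions, not the proposed algorithm: it uses a plain Hénon map without zigzag modulation, and it derives the XOR keystream from the same Hénon sequence instead of a CRBM. The parameters a = 1.4 and b = 0.3 are the classical Hénon values, while the initial state, the burn-in length, the byte quantization, and all function names are hypothetical. It offers no real security.

```python
def henon_sequence(n, x=0.1, y=0.1, a=1.4, b=0.3, burn_in=100):
    """Iterate the Hénon map and return n x-values after discarding a transient."""
    out = []
    for i in range(burn_in + n):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= burn_in:
            out.append(x)
    return out

def permutation_from(seq):
    """Turn a real-valued sequence into a permutation by ranking its values."""
    return sorted(range(len(seq)), key=seq.__getitem__)

def key_material(rows, cols, key):
    """Derive row permutation, column permutation, and byte keystream (assumed scheme)."""
    seq = henon_sequence(rows + cols + rows * cols, *key)
    row_perm = permutation_from(seq[:rows])
    col_perm = permutation_from(seq[rows:rows + cols])
    keystream = [int(abs(v) * 1e6) % 256 for v in seq[rows + cols:]]
    return row_perm, col_perm, keystream

def encrypt(channel, key=(0.1, 0.1)):
    """Permute rows and columns, then XOR every pixel with the keystream."""
    rows, cols = len(channel), len(channel[0])
    row_perm, col_perm, ks = key_material(rows, cols, key)
    scrambled = [[channel[r][c] for c in col_perm] for r in row_perm]
    return [[scrambled[i][j] ^ ks[i * cols + j] for j in range(cols)]
            for i in range(rows)]

def decrypt(cipher, key=(0.1, 0.1)):
    """Undo the XOR diffusion, then invert the row and column permutations."""
    rows, cols = len(cipher), len(cipher[0])
    row_perm, col_perm, ks = key_material(rows, cols, key)
    plain = [[0] * cols for _ in range(rows)]
    for i, r in enumerate(row_perm):
        for j, c in enumerate(col_perm):
            plain[r][c] = cipher[i][j] ^ ks[i * cols + j]
    return plain

# Hypothetical 3 x 4 single-channel image with 8-bit pixel values.
image = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 255]]
cipher = encrypt(image)
print(decrypt(cipher) == image)
```

A colour image would repeat the diffusion step on the R, G, and B channels with three different keystreams, which is the role the three CRBM output sequences play in the description above.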
At present, machine olfaction plays an important role and offers advantages in many scenarios. Its development depends on the support of corresponding data and algorithms. However, olfactory data collection is relatively cumbersome, and labeled data are even more difficult to collect. In many scenarios, supervised learning alone cannot train a well-performing classifier from a small amount of labeled data, whereas semi-supervised learning algorithms can better cope with a small amount of labeled data and a large amount of unlabeled data. This study combines the new weighted kernel with SKELM and proposes a semi-supervised extreme learning machine algorithm based on the weighted kernel, SELMWK. The experimental results show that the proposed SELMWK algorithm has good classification performance and, on the dataset used, can solve the semi-supervised gas classification task for data from the same domain well.
As a battery cycles between charging and discharging, the working conditions or improper operations such as overcharge and over-discharge aggravate the side reactions inside the battery, generate irreversible chemical substances, and reduce the amount of active substances involved in the electrochemical reaction, resulting in a decrease in battery capacity. Batteries that lose 20% of their capacity can be considered to have failed. In a failed battery, capacity and power decay faster, and the electrical characteristics, stability, and safety of the battery drop significantly. To improve the accuracy and generalization of machine learning models for remaining useful life (RUL) prediction of zinc-ion batteries, this paper mainly discusses the design of the encoder–decoder model structure and the application of optimization methods. Then, neural network hyperparameter optimization methods are studied. Finally, the validity of the research work is verified by a series of comparative experiments.
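The 20% capacity-loss criterion gives a concrete definition of the RUL target that such a model would be trained to predict. The sketch below labels a measured capacity curve with that criterion. The function names and the capacity values are hypothetical, and the encoder-decoder predictor itself is not shown.

```python
def end_of_life_cycle(capacities, nominal, fade_limit=0.20):
    """Return the first cycle (1-based) at which capacity has dropped by fade_limit."""
    threshold = nominal * (1.0 - fade_limit)
    for cycle, capacity in enumerate(capacities, start=1):
        if capacity <= threshold:
            return cycle
    return None  # the battery has not yet reached end of life in this record

def remaining_useful_life(capacities, nominal, current_cycle, fade_limit=0.20):
    """Cycles left between current_cycle and end of life, or None if unknown."""
    eol = end_of_life_cycle(capacities, nominal, fade_limit)
    if eol is None:
        return None
    return max(eol - current_cycle, 0)

# Hypothetical capacity record in Ah, one value per cycle, nominal capacity 1.0 Ah.
capacities = [1.00, 0.97, 0.93, 0.88, 0.84, 0.79, 0.75]
print(end_of_life_cycle(capacities, nominal=1.0))
print(remaining_useful_life(capacities, nominal=1.0, current_cycle=2))
```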