A winner-take-all Lotka–Volterra recurrent neural network with N × N neurons is proposed in this paper. Sufficient conditions for the existence of winner-take-all stable equilibrium points in the network are obtained. These conditions guarantee that there is one and only one winner in each row and each column at any stable equilibrium point. In addition, rigorous convergence analysis is carried out. It is proven that the proposed network model is convergent. The conditions for the winner-take-all behavior obtained in this paper provide design guidelines for network implementation and fabrication. Simulations are also presented to illustrate the theoretical findings.
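A minimal numerical sketch of such a network is given below. The abstract does not state the equations, so the code assumes the common Lotka–Volterra form in which each neuron's state grows with its input and is inhibited by its row and column competitors; the inhibition gain, step size, and inputs are illustrative placeholders.

```python
import numpy as np

# Sketch of an N x N Lotka-Volterra winner-take-all network. Assumed
# dynamics: dx_ij/dt = x_ij * (b_ij - x_ij - k * (row + column rivals)),
# i.e., each neuron is driven by its input b_ij and inhibited by the
# other neurons in its row and column.
def simulate_lv_wta(b, k=2.0, dt=1e-3, steps=50_000):
    x = np.full(b.shape, 0.1)                      # small positive start
    for _ in range(steps):
        row = x.sum(axis=1, keepdims=True) - x     # row competitors
        col = x.sum(axis=0, keepdims=True) - x     # column competitors
        x = np.clip(x + dt * x * (b - x - k * (row + col)), 0.0, None)
    return x

rng = np.random.default_rng(0)
b = rng.uniform(0.5, 1.5, size=(4, 4))             # external inputs
print((simulate_lv_wta(b) > 0.25).astype(int))     # ~one winner per row/column
```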
Benefiting from low storage cost and high retrieval efficiency, hash learning has become a widely used technology for approximate nearest-neighbor retrieval. Within it, cross-modal medical hashing has attracted increasing attention for facilitating efficient clinical decision-making. However, two main challenges remain: weak preservation of multi-manifold structure across multiple modalities and weak discriminability of hash codes. Specifically, existing cross-modal hashing methods focus on pairwise relations within two modalities and ignore underlying multi-manifold structures across more than two modalities. Moreover, discriminability, i.e., that any pair of hash codes should be different, receives little consideration. In this paper, we propose a novel hashing method named multi-manifold deep discriminative cross-modal hashing (MDDCH) for large-scale medical image retrieval. The key point is a multi-modal manifold similarity that integrates multiple sub-manifolds defined on heterogeneous data to preserve correlation among instances; it can be measured by three-step connection on the corresponding hetero-manifold. We further propose a discriminative term that makes each hash code produced by the hash functions distinct, which improves the discriminative performance of the hash codes. Besides, we introduce a Gaussian-binary Restricted Boltzmann Machine to directly output hash codes without using any continuous relaxation. Experiments on three benchmark datasets (AIBL, Brain and SPLP) show that our proposed MDDCH achieves performance comparable to recent state-of-the-art hashing methods. Additionally, diagnostic evaluation by professional physicians shows that the retrieved medical images describe the same object and illness as the queried image.
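To make the storage and efficiency claim concrete, the sketch below shows the retrieval step that any such binary hashing method enables: ranking database items by Hamming distance to a query code. The codes here are random stand-ins for what a trained MDDCH model would produce.

```python
import numpy as np

# Retrieval step that binary hashing enables: rank database items by
# Hamming distance to the query code. The codes here are random stand-ins
# for the codes a trained MDDCH model would produce.
rng = np.random.default_rng(42)
n_bits, n_db = 64, 10_000
db_codes = rng.integers(0, 2, size=(n_db, n_bits), dtype=np.uint8)
query = rng.integers(0, 2, size=n_bits, dtype=np.uint8)

dist = np.count_nonzero(db_codes != query, axis=1)  # differing bits
top_k = np.argsort(dist)[:5]                        # 5 nearest neighbors
print("top-5 ids:", top_k, "distances:", dist[top_k])
```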
Universal domain adaptive object detection (UniDAOD) is a more challenging and realistic problem than traditional domain adaptive object detection (DAOD), aiming to transfer knowledge from a well-labeled source domain to an unlabeled target domain without any prior knowledge of the label sets. Intuitively, the main challenge of UniDAOD is to eliminate the domain shift and suppress the interference caused by the category shift induced by private classes (i.e., classes that exist in only one domain). In this study, we propose a simple but effective framework, Confused and Disentangled Extraction (CODE), to alleviate this issue. Specifically, we propose a virtual adversarial adaptation module that incorporates virtual domain labels within the domain classifier for unaligned samples. This confuses the domain classifier, effectively addressing the issue of converging to local optima caused by equilibrium challenges and consequently narrowing the domain shift. Simultaneously, we introduce an entropy margin separation module that uses the distinctiveness of category predictions as a disentangling factor. This enables the automatic discovery of private classes in each domain, suppressing interference during the adaptation process. Experiments on four universal scenarios (i.e., closed-set, partial-set, open-partial-set, and open-set) show that CODE obtains a significant performance gain over the original DAOD detectors.
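One plausible reading of the virtual-label mechanism is sketched below: the binary domain classifier normally sees source = 1 and target = 0, but unaligned samples receive a virtual soft label (0.5 here, an assumed choice, not necessarily the paper's scheme) that maximally confuses the classifier.

```python
import torch
import torch.nn as nn

# One reading of the virtual-label idea: a binary domain classifier sees
# source=1 / target=0, but samples judged unaligned get a virtual label
# (0.5 here, an assumed choice) that maximally confuses the classifier.
domain_clf = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def domain_loss(feats, is_source, is_aligned):
    labels = is_source.float()
    labels[~is_aligned] = 0.5                  # virtual label: "no side"
    return bce(domain_clf(feats).squeeze(1), labels)

feats = torch.randn(8, 256)                    # dummy detector features
loss = domain_loss(feats, torch.rand(8) > 0.5, torch.rand(8) > 0.3)
loss.backward()
print(float(loss))
```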
In the realm of object detection, traditional domain adaptive object detection (DAOD) methods assume that source and target data share one identical class space, which is often difficult to satisfy in many real-world applications. To address this limitation, this paper introduces universal domain adaptive object detection (UniDAOD), a learning paradigm that relaxes the identical class space assumption to a different but overlapping class space. Intuitively, the main challenge of UniDAOD is to reduce the negative transfer of private classes (i.e., classes that exist in only one domain) and reinforce the positive transfer of common classes (i.e., classes shared across domains). In this paper, we provide a rigorous theoretical analysis and derive a new generalization bound on the expected target error under the UniDAOD setting. On the basis of this theoretical insight, we propose weighted adaptation (W-adapt) to suppress the interference of private classes and reinforce the positive effects of common classes. In particular, we propose a pseudo category margin (PCM) that quantifies class importance based on dynamic pseudo-target label prediction to recognize common classes. Furthermore, to alleviate the impact of inaccurate pseudo-target labels, we propose a temporary memory-based filter (TMF) to dynamically store and update the PCM during progressive training. On the basis of the learned TMF, we design a weighted class-wise domain alignment loss to adapt the two domains across common classes. Experiments on four universal scenarios (i.e., partial-set, open-partial-set, open-set, and closed-set) show that W-adapt outperforms several domain adaptation methods.
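The weighted class-wise alignment idea can be illustrated as below: each class's alignment term is scaled by a weight (in the paper, derived from the PCM held in the TMF; here just a placeholder tensor), so private classes with near-zero weight contribute almost nothing. The per-class mean-feature matching used as the alignment term is an assumption for illustration.

```python
import torch

# Sketch of a weighted class-wise alignment term: each class contributes
# in proportion to a weight (in the paper derived from the PCM held in the
# TMF; here a placeholder tensor), so private classes with near-zero
# weight are effectively excluded from alignment.
def weighted_alignment_loss(src_f, tgt_f, src_y, tgt_pseudo_y, class_w):
    loss = src_f.new_zeros(())
    for c, w in enumerate(class_w):
        s, t = src_f[src_y == c], tgt_f[tgt_pseudo_y == c]
        if len(s) and len(t):
            # simple per-class mean-feature matching as the alignment term
            loss = loss + w * (s.mean(0) - t.mean(0)).pow(2).sum()
    return loss

src, tgt = torch.randn(16, 128), torch.randn(16, 128)
loss = weighted_alignment_loss(src, tgt,
                               torch.randint(0, 5, (16,)),
                               torch.randint(0, 5, (16,)),
                               torch.tensor([1.0, 0.8, 0.0, 0.9, 0.1]))
print(float(loss))
```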
Bird strikes in low-altitude areas can cause severe economic losses and endanger the lives of airline passengers. It is therefore necessary to drive away the corresponding birds, which requires adequate and accurate identification of birds. In this paper, we propose an effective bird identification algorithm using a vision transformer (ViT) with hyper-head attention and a Mel frequency cepstral coefficient (MFCC) flow framework. The original sound signal is preprocessed by pre-emphasis, framing, and windowing. Then, the designed MFCC flow, which includes discrete Fourier transform, Mel frequency filtering, and discrete cosine transform operations, extracts sound features, which are then normalized into a recognizable visual dataset that can be identified by subsequent visual feature networks. Next, the ViT with hyper-head attention is designed to encode the visual features and accurately identify birds. Extensive experiments on two public datasets show that the proposed method performs satisfactorily; compared with five recent state-of-the-art approaches, the proposed Transound method achieves average increments of 10.64%, 5.65%, 1.15%, 1.78%, and 1.51%, respectively.
•We propose a vision Transformer with hyper-head attention to achieve visual encoding and accurate bird sound recognition.•We propose MFCC flow to describe the dynamic transformation relationships among patches.•We incorporate a hyper-head attention mechanism into the vision Transformer to measure vision and region similarity.•Our method achieves better performance than other state-of-the-art sound recognition approaches.
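The preprocessing and MFCC steps named in the abstract above follow the standard pipeline (pre-emphasis, framing, windowing, DFT, Mel filtering, log, DCT); a self-contained sketch with common default parameters, not necessarily the paper's settings, is:

```python
import numpy as np
from scipy.fft import dct

# Minimal MFCC pipeline mirroring the named steps:
# pre-emphasis -> framing -> windowing -> DFT -> Mel filtering -> log -> DCT.
def mfcc(signal, sr=16000, frame_len=400, hop=160, n_mels=26, n_ceps=13):
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)                    # framing + window
    power = np.abs(np.fft.rfft(frames, n=512)) ** 2 / 512           # DFT power spectrum

    # Triangular Mel filterbank between 0 Hz and the Nyquist frequency.
    mel = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz = 700 * (10 ** (mel / 2595) - 1)
    bins = np.floor((512 + 1) * hz / sr).astype(int)
    fbank = np.zeros((n_mels, 257))
    for m in range(1, n_mels + 1):
        fbank[m - 1, bins[m - 1]:bins[m]] = np.linspace(
            0, 1, bins[m] - bins[m - 1], endpoint=False)            # rising edge
        fbank[m - 1, bins[m]:bins[m + 1]] = np.linspace(
            1, 0, bins[m + 1] - bins[m], endpoint=False)            # falling edge

    log_mel = np.log(power @ fbank.T + 1e-10)                       # Mel filter + log
    return dct(log_mel, type=2, axis=1, norm='ortho')[:, :n_ceps]   # DCT -> cepstra

feats = mfcc(np.random.randn(16000))   # 1 s of noise as a stand-in signal
print(feats.shape)                     # (frames, 13) cepstral coefficients
```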
The giant panda (Ailuropoda melanoleuca) is an iconic conservation species. However, long-term monitoring of wild giant pandas has been a challenge, largely due to the lack of an appropriate method for identifying target panda individuals. Although there are some traditional methods, such as the distance–bamboo stem fragment method, molecular biological methods, and manual visual identification, they all have limitations that restrict their application. It is therefore urgent to explore a reliable and efficient approach to identifying giant panda individuals. Here, we applied deep learning and developed a novel face-identification model based on a convolutional neural network to identify giant panda individuals. The model identified 95% of giant panda individuals in the validation dataset. In all simulated field situations where the quality of photo data was degraded, the model still accurately identified more than 90% of panda individuals. The identification accuracy of our model is robust to brightness, small rotation, and cleanness of photos, although a large rotation angle (>20°) has a significant influence on the identification accuracy of the model (P < 0.01). Our model can be applied in future giant panda studies such as long-term monitoring and big-data behavioral analysis, and can be adapted for individual identification of other wildlife species.
•We developed an identification model for giant pandas, and this model can be extended to other animals.•This model can help researchers carry out longitudinal studies and analyze big behavioral data.•The model is based on deep learning; we made it more robust by improving the neural network and collecting about 65 thousand images.•We considered different factors to adapt to field situations. The model achieved a high recognition rate (about 95% on the normal training dataset, in which no images were treated experimentally, and >90% in situations where images were treated experimentally according to potential factors that may lower image quality in the field).
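The abstract above describes a CNN-based, closed-set face-identification model; the skeleton below shows one plausible formulation, where the ResNet-18 backbone and all hyperparameters are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

# Plausible closed-set formulation of a CNN face-identification model:
# a standard backbone (ResNet-18, an assumption) with a classification
# head over the known individuals, trained with cross-entropy.
n_pandas = 20                                  # known individuals
net = models.resnet18(weights=None)            # pretrained weights optional
net.fc = nn.Linear(net.fc.in_features, n_pandas)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)           # stand-ins for face crops
labels = torch.randint(0, n_pandas, (8,))
loss = criterion(net(images), labels)          # one dummy training step
optimizer.zero_grad(); loss.backward(); optimizer.step()
print(float(loss))
```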
Image captioning, also called report generation in the medical field, aims to describe the visual content of images in human language, which requires modeling the semantic relationship between visual and textual elements and generating descriptions that conform to human language cognition. Image captioning is significant for promoting human–computer interaction in all fields and, particularly, for computer-aided diagnosis in the medical field. Currently, with the rapid development of deep learning technologies, image captioning has attracted increasing attention from researchers in artificial intelligence-related fields. To this end, this study attempts to provide readers with a systematic and comprehensive review of deep image captioning methods in the natural and medical fields. We first introduce the workflow of image captioning from the perspective of simulating the human process of describing images, including seeing, focusing, and telling, which correspond respectively to feature representation, visual encoding, and language generation. Within this workflow, we present commonly used feature representation, visual encoding, and language generation models. Then, we review datasets, evaluation metrics, and basic losses used in image captioning, and summarize typical captioning methods, which are generally divided into those with and without reinforcement learning. Besides, we describe the advantages and disadvantages of existing methods, and conclusions and challenges are finally presented.
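The see/focus/tell workflow maps naturally onto an encoder–decoder skeleton; the toy model below is illustrative only, with all module choices and sizes assumed.

```python
import torch
import torch.nn as nn

# Toy "see -> focus -> tell" captioner: a convolutional feature extractor
# (seeing), a Transformer encoder over regions (focusing), and a causal
# Transformer decoder producing tokens (telling). All sizes illustrative.
class TinyCaptioner(nn.Module):
    def __init__(self, vocab=1000, d=256):
        super().__init__()
        self.see = nn.Sequential(nn.Conv2d(3, d, 7, stride=4), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(7))
        self.focus = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.embed = nn.Embedding(vocab, d)
        self.tell = nn.TransformerDecoderLayer(d, nhead=4, batch_first=True)
        self.out = nn.Linear(d, vocab)

    def forward(self, image, tokens):
        f = self.see(image).flatten(2).transpose(1, 2)  # (B, 49, d) regions
        v = self.focus(f)                               # visual encoding
        L = tokens.size(1)                              # causal mask below
        mask = torch.triu(torch.full((L, L), float('-inf')), diagonal=1)
        w = self.tell(self.embed(tokens), v, tgt_mask=mask)
        return self.out(w)                              # next-token logits

model = TinyCaptioner()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 1000, (2, 12)))
print(logits.shape)                                     # (2, 12, 1000)
```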
Medical report generation, a cross-modal automatic text generation task, is highly significant in both research and clinical fields. Its core is to generate diagnosis reports in clinical language from medical images. However, several limitations persist, including a lack of global information, inadequate cross-modal fusion capabilities, and high computational demands. To address these issues, we propose the cross-modal global feature fusion Transformer (CGFTrans) to extract global information while reducing computational strain. Firstly, we introduce a mesh recurrent network to capture inter-layer information at different levels to address the absence of global features. Then, we design a feature fusion decoder and define a 'mid-fusion' strategy to separately fuse visual and global features with medical report embeddings, which enhances cross-modal joint learning. Finally, we integrate shifted window attention into the Transformer encoder to alleviate computational pressure and capture pathological information at multiple scales. Extensive experiments conducted on three datasets demonstrate that the proposed method achieves average increments of 2.9%, 1.5%, and 0.7% in terms of the BLEU-1, METEOR, and ROUGE-L metrics, respectively. Besides, it reduces training time by 22.4% and increases image throughput by 17.3% on average.
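One interpretation of the 'mid-fusion' strategy is sketched below: report token embeddings cross-attend to visual features and to global features separately, and the two streams are merged inside the decoder block. This is a reading of the abstract, not the exact CGFTrans layer.

```python
import torch
import torch.nn as nn

# One interpretation of 'mid-fusion': report embeddings cross-attend to
# visual features and to global features separately, and the two streams
# are merged inside the decoder block. Not the exact CGFTrans layer.
class MidFusionBlock(nn.Module):
    def __init__(self, d=256, heads=4):
        super().__init__()
        self.attn_vis = nn.MultiheadAttention(d, heads, batch_first=True)
        self.attn_glob = nn.MultiheadAttention(d, heads, batch_first=True)
        self.merge = nn.Linear(2 * d, d)

    def forward(self, text, visual, global_feat):
        a, _ = self.attn_vis(text, visual, visual)              # text -> visual
        g, _ = self.attn_glob(text, global_feat, global_feat)   # text -> global
        return self.merge(torch.cat([a, g], dim=-1))            # fuse streams

block = MidFusionBlock()
out = block(torch.randn(2, 20, 256),   # report token embeddings
            torch.randn(2, 49, 256),   # visual features
            torch.randn(2, 4, 256))    # global (inter-layer) features
print(out.shape)                       # (2, 20, 256)
```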
Individual recognition of animals via infrared camera-trapping surveys is an important method for protecting and monitoring animals in the wild. However, several factors limit current survey methods used for individual animal recognition, such as the lack of accuracy and the extensive time required to process data. Recently, new technologies and methods for individual recognition from animal images have been developed for rare wildlife species (e.g., giant pandas and lemurs). These new technologies require adequate, high-quality sampled images; however, it can be challenging for researchers to obtain an adequate sample size of wildlife images from the field. To overcome this problem, we proposed and tested a new small-sample individual recognition method adapted from FaceNet, called PandaFaceNet, using data from a self-built giant panda (Ailuropoda melanoleuca) facial image database. We tested the proposed method on unknown captive and wild giant panda datasets. The results showed that it achieves 95.3% recognition accuracy in distinguishing between two captive giant panda facial images and 91% accuracy in distinguishing between two wild giant pandas. Notably, PandaFaceNet achieves individual recognition by comparing two images and is an open-set identification method. Therefore, PandaFaceNet provides a novel approach for giant panda research by opening up opportunities for analyzing small samples of panda imagery, while also providing new directions for research on rare wildlife more broadly.
•We modified the FaceNet network structure to make it more suitable for extracting the facial image features of giant pandas.•We have obtained and tested an appropriate similarity threshold to distinguish individual giant pandas.•Our method is able to recognize giant panda individuals that the PandaFaceNet network has not learned before.•Our method can be used to help researchers investigate and study wild giant pandas.•Our facial image dataset of giant pandas in captivity is useful for studying giant panda individual recognition methods.
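FaceNet-style open-set recognition as described above reduces to embedding two face images and thresholding their distance; below is a minimal sketch in which the tiny backbone and the threshold value are placeholders, not PandaFaceNet's actual components.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# FaceNet-style open-set verification: embed two face images and call them
# the same individual if the embedding distance falls below a tuned
# threshold. The tiny backbone and threshold are placeholders only.
embedder = nn.Sequential(nn.Conv2d(3, 32, 7, stride=4), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(32, 128))

def same_individual(img_a, img_b, threshold=1.1):
    za = F.normalize(embedder(img_a), dim=1)   # unit-norm embeddings
    zb = F.normalize(embedder(img_b), dim=1)
    dist = (za - zb).norm(dim=1)               # Euclidean distance in [0, 2]
    return dist < threshold, dist

match, dist = same_individual(torch.randn(1, 3, 160, 160),
                              torch.randn(1, 3, 160, 160))
print(bool(match), float(dist))
```

Because the decision is a pairwise comparison rather than a softmax over a fixed class list, individuals never seen during training can still be matched, which is what makes the method open-set.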