To detect illegal copies of copyrighted images, recent copy detection methods mostly rely on the bag-of-visual-words (BOW) model, in which local features are quantized into visual words for image matching. However, both the limited discriminability of local features and the BOW quantization errors lead to many false local matches, which make it hard to distinguish similar images from copies. Geometric consistency verification is a popular technique for reducing false matches, but it neglects the global context information of local features and thus cannot solve this problem well. To address this problem, this paper proposes a global context verification scheme to filter false matches for copy detection. More specifically, after obtaining initial scale-invariant feature transform (SIFT) matches between images based on BOW quantization, the overlapping-region-based global context descriptor (OR-GCD) is proposed to verify these matches and filter out the false ones. The OR-GCD not only encodes relatively rich global context information of SIFT features but also has good robustness and efficiency, which allows effective and efficient verification. Furthermore, a fast image similarity measurement based on random verification is proposed to implement copy detection efficiently. In addition, we extend the proposed method to partial-duplicate image detection. Extensive experiments demonstrate that our method achieves higher accuracy than state-of-the-art methods and has efficiency comparable to the baseline method based on BOW quantization.
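The initial matching step described above can be sketched in a few lines: descriptors are quantized to their nearest visual word, and two features are tentatively matched when they fall into the same word. This is a toy illustration of BOW quantization only (random data, Euclidean nearest-word assignment), not the paper's OR-GCD verification itself; it shows why many false matches survive this stage and need filtering.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word (BOW quantization)."""
    # Euclidean distance from every descriptor to every visual word
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    return d.argmin(axis=1)

def initial_matches(words_a, words_b):
    """Two features are tentatively matched when they quantize to the same word.
    Such matches include many false ones; the verification step filters them."""
    return [(i, j) for i, wa in enumerate(words_a)
                   for j, wb in enumerate(words_b) if wa == wb]

rng = np.random.default_rng(0)
vocab = rng.normal(size=(5, 8))   # toy vocabulary of 5 visual words
da = rng.normal(size=(4, 8))      # descriptors of image A (stand-ins for SIFT)
db = rng.normal(size=(6, 8))      # descriptors of image B
wa, wb = quantize(da, vocab), quantize(db, vocab)
matches = initial_matches(wa, wb)
```

Because a small vocabulary maps many unrelated descriptors to the same word, `matches` typically contains spurious pairs, which is exactly the failure mode the global context verification targets.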
Recent works on video salient object detection have demonstrated that directly transferring the generalization ability of image-based models to video data without modeling spatiotemporal information remains nontrivial and challenging. Considering both intraframe accuracy and interframe consistency of saliency detection, this article presents a novel cross-attention-based encoder-decoder model under the Siamese framework (CASNet) for video salient object detection. A baseline encoder-decoder model trained with the Lovász softmax loss function is adopted as the backbone network to guarantee the accuracy of intraframe salient object detection. Self- and cross-attention modules are incorporated into our model to preserve the saliency correlation and improve interframe salient object detection consistency. Extensive experimental results obtained by ablation analysis and cross-data-set validation demonstrate the effectiveness of the proposed method. Quantitative results indicate that our CASNet model outperforms 19 state-of-the-art image- and video-based methods on six benchmark data sets.
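The cross-attention mechanism mentioned above can be sketched generically: features of one frame act as queries that attend over the features of a paired frame, so that correlated positions exchange information. This is a plain scaled-dot-product sketch with toy 2-D features, not CASNet's actual learned module.

```python
import numpy as np

def cross_attention(q_feats, kv_feats):
    """Scaled dot-product cross-attention: each query-frame feature aggregates
    the paired frame's features, weighted by similarity (softmax over scores)."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # rows sum to 1
    return weights @ kv_feats

# toy features: 2 positions in the query frame, 3 in the paired frame
q = np.array([[1.0, 0.0], [0.0, 1.0]])
kv = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
out = cross_attention(q, kv)
```

Each output row is a convex combination of the paired frame's features, which is what lets the two branches of a Siamese model share saliency cues across frames.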
In this paper, a new image indexing and retrieval algorithm using local mesh patterns is proposed for biomedical image retrieval. The standard local binary pattern (LBP) encodes the relationship between a referenced pixel and its surrounding neighbors, whereas the proposed method encodes the relationships among the surrounding neighbors of a given referenced pixel. The possible relationships among the surrounding neighbors depend on the number of neighbors, P. In addition, the effectiveness of our algorithm is confirmed by combining it with the Gabor transform. To prove the effectiveness of our algorithm, three experiments were carried out on three different biomedical image databases: two for computed tomography (CT) and one for magnetic resonance (MR) image retrieval. The databases considered are the OASIS-MRI database, the NEMA-CT database, and the VIA/I-ELCAP database, which includes region-of-interest CT images. The results show a significant improvement in the evaluation measures compared to LBP, LBP with the Gabor transform, and other spatial- and transform-domain methods.
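The center-versus-neighbor contrast described above can be made concrete with a minimal sketch. `lbp_code` is the standard LBP of a 3x3 patch; `mesh_code` illustrates the neighbor-versus-neighbor idea by comparing each neighbor against another neighbor on the ring. The pairing rule in `mesh_code` is our own toy choice for illustration; the paper's local mesh pattern defines its own rule parameterized by P.

```python
import numpy as np

def ring(patch):
    """The 8 neighbours of a 3x3 patch, clockwise from top-left."""
    return [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
            patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]

def lbp_code(patch):
    """Standard LBP: threshold each of the 8 neighbours against the centre pixel."""
    c = patch[1, 1]
    return sum(1 << i for i, v in enumerate(ring(patch)) if v >= c)

def mesh_code(patch):
    """Toy mesh-style code: compare each neighbour with the neighbour two
    positions further round the ring, ignoring the centre entirely."""
    n = ring(patch)
    return sum(1 << i for i in range(8) if n[(i + 2) % 8] >= n[i])

patch = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
```

On this patch `lbp_code` sets the bits for the four neighbors that are at least the center value 5, while `mesh_code` depends only on how the neighbors relate to one another.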
Recently, generative steganography, which transforms secret information into a generated image, has emerged as a promising technique for resisting steganalysis detection. However, due to the inefficiency and irreversibility of the secret-to-image transformation, it is hard to find a good trade-off between hiding capacity and extraction accuracy. To address this issue, we propose a secret-to-image reversible transformation (S2IRT) scheme for generative steganography. The proposed S2IRT scheme is based on a generative model, the Glow model, which provides a bijective mapping between a latent space with a multivariate Gaussian distribution and an image space with a complex distribution. In the S2I transformation, guided by a given secret message, we construct a latent vector and then map it to a generated image with the Glow model, so that the secret message is finally transformed into the generated image. Owing to the good efficiency and reversibility of the S2IRT scheme, the proposed steganographic approach achieves both high hiding capacity and accurate extraction of the secret message from the generated image. Furthermore, a separate-encoding-based S2IRT (SE-S2IRT) scheme is also proposed to improve robustness to common image attacks. The experiments demonstrate that the proposed steganographic approaches achieve high hiding capacity (up to 4 bpp) and accurate information extraction (almost 100% accuracy) simultaneously, while maintaining desirable anti-detectability and imperceptibility.
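The key requirement above is a reversible map from secret bits to a latent vector that still looks Gaussian, so that the Glow model can decode it to a plausible image. A minimal sketch of that bits-to-latent step, under our own simplifying assumptions (k bits per latent dimension, each group selecting an equal-probability Gaussian quantile bin), is shown below; the paper's actual S2IRT construction differs in detail, and the Glow mapping itself is omitted.

```python
import numpy as np
from statistics import NormalDist

def bits_to_latent(bits, k=2):
    """Reversibly encode secret bits into a latent vector: each k-bit group picks
    one of 2^k equal-probability bins of N(0,1) and we emit the bin midpoint."""
    nd = NormalDist()
    z = []
    for i in range(0, len(bits), k):
        idx = int("".join(map(str, bits[i:i + k])), 2)
        z.append(nd.inv_cdf((idx + 0.5) / 2**k))   # midpoint of bin idx
    return np.array(z)

def latent_to_bits(z, k=2):
    """Inverse mapping: recover each bin index from the latent coordinate."""
    nd = NormalDist()
    bits = []
    for v in z:
        idx = min(int(nd.cdf(v) * 2**k), 2**k - 1)
        bits.extend(int(b) for b in format(idx, "0%db" % k))
    return bits

secret = [1, 0, 1, 1, 0, 0]
z = bits_to_latent(secret)
recovered = latent_to_bits(z)
```

Because encoding and decoding invert each other exactly, no information is lost in the latent construction; in the full scheme, capacity and robustness are then governed by how finely the bins partition each dimension.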
A Review of Generalized Zero-Shot Learning Methods
Pourpanah, Farhad; Abdar, Moloud; Luo, Yuxuan, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 04/2023, Volume 45, Issue 4
Journal Article, Peer-reviewed, Open Access
Generalized zero-shot learning (GZSL) aims to train a model to classify data samples under the condition that some output classes are unknown during supervised learning. To address this challenging task, GZSL leverages semantic information of the seen (source) and unseen (target) classes to bridge the gap between them. Since its introduction, many GZSL models have been formulated. In this review paper, we present a comprehensive review of GZSL. First, we provide an overview of GZSL, including its problems and challenges. Then, we introduce a hierarchical categorization of GZSL methods and discuss the representative methods in each category. In addition, we discuss the available benchmark data sets and applications of GZSL, along with the research gaps and directions for future investigation.
The extreme learning machine (ELM), originally proposed for "generalized" single-hidden-layer feedforward neural networks, provides efficient unified learning solutions for clustering, regression, and classification, presenting competitive accuracy with superb efficiency in many applications. However, the ELM with a subnetwork-node architecture has not attracted much research attention. Recently, many methods have been proposed for supervised/unsupervised dimension reduction or representation learning, but these methods normally work for only one type of problem. This paper studies the general architecture of the multilayer ELM (ML-ELM) with subnetwork nodes, showing that: 1) the proposed method provides a representation learning platform with unsupervised/supervised and compressed/sparse representation learning; and 2) experimental results on ten image datasets and 16 classification datasets show that the proposed ML-ELM with subnetwork nodes performs competitively with, or much better than, other conventional feature learning methods.
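The efficiency claim above comes from the ELM's closed-form training: input weights are random and fixed, and only the output weights are solved, via the Moore-Penrose pseudoinverse. Below is a minimal sketch of a basic single-hidden-layer ELM (not the subnetwork-node ML-ELM architecture itself), fitted on the XOR problem as a toy example.

```python
import numpy as np

def elm_train(X, Y, n_hidden=50, seed=0):
    """Basic ELM: random input weights, sigmoid hidden layer, and output
    weights obtained in closed form with the Moore-Penrose pseudoinverse."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, never trained
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # hidden activations
    beta = np.linalg.pinv(H) @ Y                  # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# toy example: XOR, which a linear model cannot fit but the random
# nonlinear hidden layer separates easily
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])
W, b, beta = elm_train(X, Y)
pred = elm_predict(X, W, b, beta)
```

There is no iterative gradient descent anywhere: the single pseudoinverse solve is what gives the ELM family its "superb efficiency".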
In underwater scenes, degraded images caused by wavelength-dependent light absorption and scattering present huge challenges to vision tasks. Underwater image enhancement has attracted much attention due to the significance of vision-based applications in marine engineering and underwater robotics, and numerous enhancement algorithms have been proposed in the last few years. However, almost all existing approaches focus only on enhancing images independently. Considering that images photographed in the same underwater scene usually share similar degradation, related images can provide rich complementary information for each other's enhancement. In this paper, we propose an Underwater Image Co-enhancement Network (UICoE-Net) based on an encoder-decoder Siamese architecture. For joint learning, we introduce correlation feature matching units into multiple layers of the Siamese encoder-decoder structure to communicate the mutual correlation between the two branches. Extensive experiments on the Underwater Image Enhancement Benchmark (UIEB), the Underwater Image Co-enhancement Dataset (UICoD), collected from an underwater video dataset with ground-truth references, and the Stereo Quantitative Underwater Image Dataset (SQUID) demonstrate the effectiveness of our method.
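The idea of communicating mutual correlation between two branches can be sketched simply: normalize both branches' feature maps, compute pairwise cosine similarity, and let each position in one branch aggregate the most correlated positions of the other. This is an illustrative stand-in only; UICoE-Net's correlation feature matching unit is a learned module, and the shapes here are toy (positions x channels).

```python
import numpy as np

def correlation_matching(feat_a, feat_b):
    """Toy correlation matching between two Siamese branches.
    feat_a, feat_b: (positions, channels) feature maps of two related images."""
    def unit(f):
        return f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-8)
    a, b = unit(feat_a), unit(feat_b)
    corr = a @ b.T                                        # cosine similarity
    w = np.exp(corr) / np.exp(corr).sum(axis=1, keepdims=True)
    # each position of A is refined by the B positions it correlates with
    return feat_a + w @ feat_b

fa = np.array([[1.0, 0.0], [0.0, 1.0]])   # branch A features
fb = np.array([[2.0, 0.0], [0.0, 2.0]])   # branch B features (same scene)
out = correlation_matching(fa, fb)
```

Since the two images share similar degradation, correlated positions carry complementary evidence, which is why the refined features help both branches' enhancement.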
Fully connected representation learning (FCRL) is one of the widely used network structures in multimodal image classification frameworks. However, most FCRL-based structures, for instance the stacked autoencoder, encode features and find the final cognition with separate building blocks, resulting in loosely connected feature representations. This article achieves a robust representation by considering a low-dimensional feature and the classifier model simultaneously. To this end, a new hierarchical subnetwork-based neural network (HSNN) is proposed. The novelties of this framework are as follows: 1) it is an iterative learning process, rather than a stack of separate blocks, that obtains the discriminative encoding and the final classification results together, so that optimal global features are generated; and 2) it applies a Moore-Penrose (MP) inverse-based batch-by-batch learning strategy to handle large-scale data sets, so that large data sets, such as Places365 with 1.8 million images, can be processed effectively. Experimental results on multiple domains, with training set sizes varying from ~1K to ~2M samples, show that the proposed feature reinforcement framework achieves better generalization performance than most state-of-the-art FCRL methods.
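The batch-by-batch MP-inverse strategy can be sketched as follows: rather than forming the full hidden-layer matrix H for millions of samples, accumulate the sufficient statistics H^T H and H^T Y one batch at a time and solve the (ridge-regularized) normal equations once at the end. This is a sketch of the batch-learning idea only, not HSNN's full iterative framework; the hidden layer here is a simple random-weight tanh layer of our own choosing.

```python
import numpy as np

def batch_mp_solve(batches, n_hidden, lam=1e-3, seed=0):
    """Solve output weights over a stream of (X, Y) batches by accumulating
    H^T H and H^T Y, so memory is independent of the data set size."""
    rng = np.random.default_rng(seed)
    W = b = HtH = HtY = None
    for X, Y in batches:
        if W is None:  # initialize on the first batch
            W = rng.normal(size=(X.shape[1], n_hidden))
            b = rng.normal(size=n_hidden)
            HtH = np.zeros((n_hidden, n_hidden))
            HtY = np.zeros((n_hidden, Y.shape[1]))
        H = np.tanh(X @ W + b)          # hidden activations for this batch only
        HtH += H.T @ H
        HtY += H.T @ Y
    # ridge-regularized normal equations (a stable MP-inverse-style solve)
    beta = np.linalg.solve(HtH + lam * np.eye(n_hidden), HtY)
    return W, b, beta
```

Because H^T H and H^T Y are exact sums over batches, the streamed solution matches the one computed from all the data at once, which is what makes the strategy safe for data sets the size of Places365.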
Wildfires have devastating consequences for ecological systems and human lives, so accurate and fast wildfire detection is crucial to reducing damage. Existing smoke detection algorithms using convolutional neural networks are mostly based on the classification of smoke images or patches, whereas traditional smoke detection algorithms often need to extract and integrate multiple features; with both kinds of methods, false positives remain an insurmountable problem in wildfire smoke detection. Moreover, there are few studies on wildfire smoke detection specifically. Thus, to detect wildfire smoke more intelligently, a 3D parallel fully convolutional network is proposed to segment the smoke regions in video sequences; wildfire smoke detection is treated as a segmentation problem in this paper. More than 90 videos covering various scenes are used for training and testing. Experiments demonstrate that our architecture can segment smoke regions accurately and eliminate interference from natural scenes, detecting smoke targets in multiple scenes accurately and quickly.
Due to the lack of pre-judgment of fingerprints, fingerprint authentication systems are frequently vulnerable to artificial replicas. Anonymous people can impersonate authorized users to complete various authentication operations, thereby disrupting the order of life and causing tremendous economic losses to society. Therefore, to ensure that authorized users' fingerprint information is not used illegally, one possible anti-spoofing technique, called fingerprint liveness detection (FLD), has been exploited. Compared with hand-crafted feature methods, a deep convolutional neural network (DCNN) can automatically learn high-level semantic detail via a supervised learning algorithm without any professional background knowledge. However, one disadvantage of most CNN models is that fixed-scale images (e.g., 227x227) are required in the input layer. Although the scale problem can be handled by cropping or scaling any image to a fixed scale, these operations can easily cause the loss of key texture information and degrade image resolution, which weakens the generalization performance of the classifier model. In this paper, a novel FLD method, an improved DCNN with image scale equalization, is proposed to preserve texture information and maintain image resolution. In addition, an adaptive learning rate method is used. In the performance evaluation, the confusion matrix is applied to FLD for the first time as a performance indicator. Extensive experimental results based on the LivDet 2011 and LivDet 2013 data sets verify that the detection performance of our method is superior to that of other methods.
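The confusion matrix mentioned above is a simple but informative indicator for a two-class liveness task, since it separates the two error types (a spoof accepted as live versus a live finger rejected). Below is a minimal sketch; the class labelling (0 = live, 1 = spoof) and the toy predictions are our own illustration, not the paper's data.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=2):
    """Confusion matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# toy labels: 0 = live finger, 1 = spoof (labelling chosen here for illustration)
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
cm = confusion_matrix(y_true, y_pred)
acc = np.trace(cm) / cm.sum()   # overall accuracy = correct / total
```

Off-diagonal entries expose the asymmetry that a single accuracy number hides: `cm[1, 0]` counts spoofs misclassified as live, which is the security-critical error in FLD.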