Clustering is a long-standing and important research problem; however, it remains challenging when handling large-scale image data from diverse sources. In this paper, we present a novel Binary Multi-View Clustering (BMVC) framework, which can dexterously manipulate multi-view image data and easily scale to large data. To achieve this goal, we formulate BMVC with two key components, compact collaborative discrete representation learning and binary clustering structure learning, in a joint learning framework. Specifically, BMVC collaboratively encodes the multi-view image descriptors into a compact common binary code space by considering their complementary information; the collaborative binary representations are meanwhile clustered by a binary matrix factorization model, such that the cluster structures are optimized in the Hamming space by pure, extremely fast bit operations. For efficiency, code balance constraints are imposed on both the binary data representations and the cluster centroids. Finally, the resulting optimization problem is solved by an alternating optimization scheme with guaranteed fast convergence. Extensive experiments on four large-scale multi-view image datasets demonstrate that the proposed method enjoys a significant reduction in both computation and memory footprint, while achieving superior (in most cases) or very competitive performance in comparison with state-of-the-art clustering methods.
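The efficiency claim above rests on the fact that distances between binary codes in the Hamming space reduce to XOR-and-popcount bit operations. A minimal illustrative sketch of this idea (our own helper functions for bit-packed codes, not the authors' BMVC implementation):

```python
import numpy as np

def hamming_distance(codes, centroids):
    """Pairwise Hamming distances between bit-packed binary codes.

    codes:     (n, w) uint64 array, each row a bit-packed binary code
    centroids: (k, w) uint64 array of bit-packed cluster centroids
    Returns an (n, k) integer distance matrix.
    """
    # XOR marks the disagreeing bits; unpacking and summing counts them.
    xor = codes[:, None, :] ^ centroids[None, :, :]
    return np.unpackbits(xor.view(np.uint8), axis=-1).sum(axis=-1)

def assign_clusters(codes, centroids):
    """Assign each binary code to its nearest centroid in Hamming space."""
    return hamming_distance(codes, centroids).argmin(axis=1)
```

Because the distance computation never touches floating point, cluster assignment over millions of codes stays cheap in both time and memory.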
A Survey on Learning to Hash Wang, Jingdong; Zhang, Ting; Song, Jingkuan ...
IEEE transactions on pattern analysis and machine intelligence,
04/2018, Volume: 40, Issue: 4
Journal Article
Peer reviewed
Open access
Nearest neighbor search is the problem of finding the data points in a database whose distances to a query point are smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of learning-to-hash algorithms, categorize them according to how they preserve similarities, into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discuss their relations. We treat quantization separately from pairwise similarity preserving because its objective function is very different, even though quantization, as we show, can be derived from preserving pairwise similarities. In addition, we present the evaluation protocols and a general performance analysis, and point out that the quantization algorithms perform best in terms of search accuracy, search time cost, and space cost. Finally, we introduce a few emerging topics.
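To illustrate why hashing accelerates nearest neighbor search, here is a minimal random-hyperplane sketch in the spirit of the classic locality-sensitive hashing baseline such surveys cover; all names are our own, and this is not any specific surveyed algorithm:

```python
import numpy as np

def make_hash(dim, n_bits, rng):
    """Random-hyperplane hash: the sign of each random projection
    gives one bit, so nearby vectors tend to share most bits."""
    planes = rng.standard_normal((dim, n_bits))
    return lambda x: (x @ planes > 0)

def hamming_search(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    return (db_codes != query_code).sum(axis=1).argsort()
```

Ranking by Hamming distance over short binary codes replaces expensive float distance computations over the original vectors, which is the core speed/space win the survey analyzes.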
Hashing methods for efficient image retrieval aim at learning hash functions that map similar images to semantically correlated binary codes in the Hamming space with similarity well preserved. Traditional hashing methods usually represent image content by hand-crafted features. Deep hashing methods based on deep neural network (DNN) architectures can generate more effective image features and obtain better retrieval performance. However, the underlying data structure is hardly captured by existing DNN models. Moreover, the similarity (either visual or semantic) between pairwise images is ambiguous, even uncertain, to measure in existing deep hashing methods. In this article, we propose a novel hashing method termed deep fuzzy hashing network (DFHN) to overcome the shortcomings of existing deep hashing approaches. Our DFHN method combines the fuzzy logic technique and the DNN to learn more effective binary codes, which can leverage fuzzy rules to model the uncertainties underlying the data. Derived from fuzzy logic theory, a generalized Hamming distance is devised in the convolutional layers and fully connected layers of our DFHN to model their outputs, which come from an efficient XOR operation on the given inputs and weights. Extensive experiments show that our DFHN method obtains competitive retrieval accuracy with highly efficient training speed compared with several state-of-the-art deep hashing approaches on two large-scale image datasets: CIFAR-10 and NUS-WIDE.
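One common fuzzy-logic generalization of the Hamming distance takes g(a, b) = a + b - 2ab, which reduces to the ordinary XOR when the inputs are binary; whether DFHN uses exactly this formulation is our assumption, so the sketch below is illustrative only, with names of our own choosing:

```python
import numpy as np

def generalized_hamming(x, w):
    """Elementwise fuzzy XOR g(a, b) = a + b - 2ab.
    Coincides with ordinary XOR when x and w are 0/1-valued."""
    return x + w - 2.0 * x * w

def ghd_layer(x, W):
    """Linear-layer analogue: aggregate fuzzy-XOR scores between an
    input vector x and each row of a weight matrix W."""
    return generalized_hamming(x[None, :], W).sum(axis=1)
```

On binary inputs, summing the elementwise fuzzy XOR recovers the integer Hamming distance, which is how a distance-based layer can stay compatible with bitwise codes.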
Zero-shot video object segmentation (ZS-VOS) aims to segment foreground objects in a video sequence without prior knowledge of these objects. However, existing ZS-VOS methods often struggle to distinguish between foreground and background, or to keep track of the foreground, in complex scenarios. The common practice of introducing motion information, such as optical flow, can lead to overreliance on optical flow estimation. To address these challenges, we propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects. Specifically, our model is built upon multiple collaborative evolutions of the parallel co-attention module (PCM) and the cross co-attention module (CCM). PCM captures common foreground regions among adjacent appearance and motion features, while CCM further exploits and fuses cross-modal motion features returned by PCM. Our method is progressively trained to achieve hierarchical spatio-temporal feature propagation across the entire video. Experimental results demonstrate that our HCPN outperforms all previous methods on public benchmarks, showcasing its effectiveness for ZS-VOS. Code and pre-trained models are available at https://github.com/NUST-Machine-Intelligence-Laboratory/HCPN.
Domain adaptation aims to transfer knowledge from a well-labeled source domain to a poorly labeled target domain. A majority of existing works transfer knowledge at either the feature level or the sample level. Recent studies reveal that both paradigms are essentially important, and that optimizing one of them can reinforce the other. Inspired by this, we propose a novel approach to jointly exploit feature adaptation with distribution matching and sample adaptation with landmark selection. During the knowledge transfer, we also take the local consistency between samples into consideration, so that the manifold structures of the samples are preserved. Finally, we deploy label propagation to predict the categories of new instances. Notably, our approach is suitable for both homogeneous- and heterogeneous-domain adaptation by learning domain-specific projections. Extensive experiments on five open benchmarks, which consist of both standard and large-scale datasets, verify that our approach can significantly outperform not only conventional approaches but also end-to-end deep models. The experiments also demonstrate that we can leverage handcrafted features to improve the accuracy of deep features via heterogeneous adaptation.
In real-world transfer learning tasks, especially in cross-modal applications, the source domain and the target domain often have different features and distributions, a setting well known as the heterogeneous domain adaptation (HDA) problem. Yet, owing to the challenges of HDA, existing methods focus on either alleviating the feature discrepancy or mitigating the distribution divergence. In fact, optimizing one of them can reinforce the other. In this paper, we propose a novel HDA method that can optimize both the feature discrepancy and the distribution divergence in a unified objective function. Specifically, we present progressive alignment, which first learns a new transferable feature space by dictionary-sharing coding, and then aligns the distribution gaps in the new space. Different from previous HDA methods that are limited to specific scenarios, our approach can handle diverse features with arbitrary dimensions. Extensive experiments on various transfer learning tasks, such as image classification, text categorization, and text-to-image recognition, verify the superiority of our method against several state-of-the-art approaches.
Unsupervised hashing can desirably support scalable content-based image retrieval thanks to its appealing advantages of semantic label independence, memory efficiency, and search efficiency. However, the learned hash codes are embedded with limited discriminative semantics due to the intrinsic limitation of image representation. To address the problem, in this paper we propose a novel hashing approach, dubbed discrete semantic transfer hashing (DSTH). The key idea is to directly augment the semantics of discrete image hash codes by exploring auxiliary contextual modalities. To this end, a unified hashing framework is formulated to simultaneously preserve the visual similarities of images and perform semantic transfer from contextual modalities. Furthermore, to guarantee direct semantic transfer and avoid information loss, we explicitly impose the discrete constraint, a bit-uncorrelation constraint, and a bit-balance constraint on the hash codes. A novel and effective discrete optimization method based on the augmented Lagrange multiplier is developed to iteratively solve the optimization problem. The whole learning process has linear computational complexity and desirable scalability. Experiments on three benchmark datasets demonstrate the superiority of DSTH compared with several state-of-the-art approaches.
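The bit-balance and bit-uncorrelation constraints mentioned above have standard matrix forms for a ±1 code matrix B of n samples by r bits: each bit column sums to zero, and B'B = nI. A small numerical check (our own helper, not part of DSTH's optimization):

```python
import numpy as np

def constraint_residuals(B):
    """Residuals of the two discrete-code constraints for a ±1 code
    matrix B of shape (n samples, r bits).  Both are 0 when the
    constraints hold exactly."""
    n, r = B.shape
    balance = np.abs(B.sum(axis=0)).max()            # bit-balance: columns sum to 0
    uncorr = np.abs(B.T @ B - n * np.eye(r)).max()   # bit-uncorrelation: B'B = nI
    return balance, uncorr
```

Balanced bits maximize the entropy of each bit, and uncorrelated bits avoid redundant code dimensions, which together make short codes as informative as possible.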
Recently, remote sensing images have become increasingly popular in a number of tasks, such as environmental monitoring. However, the images observed by satellite sensors often suffer from low resolution (LR), making it difficult to meet the requirements for further analysis. Super-resolution (SR) aims to increase the image resolution while providing finer spatial details, which remedies this weakness of satellite images. Therefore, in this article, we propose an innovative mixed high-order attention network (MHAN) for remote sensing SR. It comprises two components: a feature extraction network, and a feature refinement network with a high-order attention (HOA) mechanism for detail restoration. In the feature extraction network, we replace elementwise addition with weighted channelwise concatenation in all skip connections, which greatly facilitates the information flow. In the feature refinement network, rather than exploring first-order statistics (spatial or channel attention), we introduce the HOA module to restore the missing details. Finally, to fully exploit hierarchical features, we introduce a frequency-aware connection to bridge the feature extraction and feature refinement networks. Experiments on two widely used remote sensing image datasets demonstrate that our MHAN not only obtains better accuracy than state-of-the-art methods but also shows superiority in running time and GPU cost. Code is available at https://github.com/ZhangDY827/MHAN.
Currently, unsupervised heterogeneous domain adaptation in a generalized setting, which is the most common scenario in real-world applications, is insufficiently explored. Existing approaches either are limited to special cases or require labeled target samples for training. This paper aims to overcome these limitations by proposing a generalized framework, named transfer independently together (TIT). Specifically, we learn multiple transformations, one for each domain (independently), to map data onto a shared latent space where the domains are well aligned. The multiple transformations are jointly optimized in a unified framework (together) by an effective formulation. In addition, to learn robust transformations, we further propose a novel landmark selection algorithm that reweights samples, i.e., increases the weight of pivot samples and decreases the weight of outliers. Our landmark selection is based on graph optimization: it focuses on the geometric relationships between samples rather than on sample features. As a result, by abstracting feature vectors to graph vertices, our algorithm involves only simple and fast integer arithmetic, instead of the floating-point matrix operations of existing approaches. Finally, we effectively optimize our objective via a dimensionality reduction procedure. TIT is applicable to arbitrary sample dimensionality and does not need labeled target samples for training. Extensive evaluations on several standard benchmarks and large-scale datasets for image classification, text categorization, and text-to-image recognition verify the superiority of our approach.
Progressive Meta-Learning With Curriculum Zhang, Ji; Song, Jingkuan; Gao, Lianli ...
IEEE transactions on circuits and systems for video technology,
09/2022, Volume: 32, Issue: 9
Journal Article
Peer reviewed
Meta-learning offers an effective solution for learning new concepts under scarce supervision through an episodic-training scheme: a series of target-like tasks sampled from base classes are sequentially fed into a meta-learner to extract cross-task knowledge, which facilitates the quick acquisition of task-specific knowledge of the target task from few samples. Despite its noticeable improvements, the episodic-training strategy samples tasks randomly and uniformly, without considering their hardness and quality, which may not progressively improve the meta-learner's generalization. In this paper, we propose progressive meta-learning using tasks ordered from easy to hard. First, based on a predefined curriculum, we develop a Curriculum-Based Meta-learning (CubMeta) method. CubMeta proceeds in a stepwise manner: in each step, we design a BrotherNet module to establish harder tasks and an effective learning scheme for obtaining an ensemble of stronger meta-learners. Then we move a step further and propose an end-to-end Self-Paced Meta-learning (SepMeta) method. The curriculum in SepMeta is effectively integrated as a regularization term into the objective, so that the meta-learner can measure the hardness of tasks adaptively, according to what the model has already learned. Extensive experiments on benchmark datasets demonstrate the effectiveness of the proposed methods. Our code is available at https://github.com/nobody-777.
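The self-paced idea in SepMeta, admitting tasks by hardness relative to what the model has already learned, can be illustrated with the classic hard self-paced weighting rule from self-paced learning; this is a generic sketch, not SepMeta's actual regularizer:

```python
def self_paced_weights(task_losses, lam):
    """Hard self-paced weighting: a task whose current loss is below
    the pace parameter lam counts as 'easy enough' and is kept
    (weight 1.0); harder tasks are deferred (weight 0.0).  Raising
    lam over the course of training admits progressively harder
    tasks, realizing an easy-to-hard curriculum."""
    return [1.0 if loss < lam else 0.0 for loss in task_losses]
```

Because the weights depend on the model's own current losses, the curriculum adapts automatically as the meta-learner improves, rather than being fixed in advance.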