The use of multiple features has been shown to be an effective strategy for visual tracking because of their complementary contributions to appearance modeling. The key problem is how to learn a ...fused representation from multiple features for appearance modeling. Different features extracted from the same object should share some commonalities in their representations while each feature should also have some feature-specific representation patterns which reflect its complementarity in appearance modeling. Different from existing multi-feature sparse trackers which only consider the commonalities among the sparsity patterns of multiple features, this paper proposes a novel multiple sparse representation framework for visual tracking which jointly exploits the shared and feature-specific properties of different features by decomposing multiple sparsity patterns. Moreover, we introduce a novel online multiple metric learning to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple features are more representative. Experimental results on tracking benchmark videos and other challenging videos demonstrate the effectiveness of the proposed tracker.
Recently, sparse coding has been successfully applied in visual tracking. The goal of this paper is to review the state-of-the-art tracking methods based on sparse coding. We first analyze the ...benefits of using sparse coding in visual tracking and then categorize these methods into appearance modeling based on sparse coding (AMSC) and target searching based on sparse representation (TSSR) as well as their combination. For each categorization, we introduce the basic framework and subsequent improvements with emphasis on their advantages and disadvantages. Finally, we conduct extensive experiments to compare the representative methods on a total of 20 test sequences. The experimental results indicate that: (1) AMSC methods significantly outperform TSSR methods. (2) For AMSC methods, both discriminative dictionary and spatial order reserved pooling operators are important for achieving high tracking accuracy. (3) For TSSR methods, the widely used identity pixel basis will degrade the performance when the target or candidate images are not aligned well or severe occlusion occurs. (4) For TSSR methods, ℓ1 norm minimization is not necessary. In contrast, ℓ2 norm minimization can obtain comparable performance but with lower computational cost. The open questions and future research topics are also discussed.
► A comprehensive review of visual tracking based on sparse coding. ► Extensive experimental comparison between 15 state of the art tracking methods on a total of 20 challenging sequences. ► Analyze the benefits of using sparse coding in visual tracking. ► Point out the future research topics.
Recovering the 3D representation of an object from single-view or multi-view RGB images by deep neural networks has attracted increasing attention in the past few years. Several mainstream works ...(e.g., 3D-R2N2) use recurrent neural networks (RNNs) to fuse multiple feature maps extracted from input images sequentially. However, when given the same set of input images with different orders, RNN-based approaches are unable to produce consistent reconstruction results. Moreover, due to long-term memory loss, RNNs cannot fully exploit input images to refine reconstruction results. To solve these problems, we propose a novel framework for single-view and multi-view 3D reconstruction, named Pix2Vox. By using a well-designed encoder-decoder, it generates a coarse 3D volume from each input image. Then, a context-aware fusion module is introduced to adaptively select high-quality reconstructions for each part (e.g., table legs) from different coarse 3D volumes to obtain a fused 3D volume. Finally, a refiner further refines the fused 3D volume to generate the final output. Experimental results on the ShapeNet and Pix3D benchmarks indicate that the proposed Pix2Vox outperforms state-of-the-arts by a large margin. Furthermore, the proposed method is 24 times faster than 3D-R2N2 in terms of backward inference time. The experiments on ShapeNet unseen 3D categories have shown the superior generalization abilities of our method.
Sparse representation-based visual tracking approaches have attracted increasing interests in the community in recent years. The main idea is to linearly represent each target candidate using a set ...of target and trivial templates, while imposing a sparsity constraint onto the representation coefficients. After we obtain the coefficients using ℓ 1 -norm minimization methods, the candidate with the lowest error, when it is reconstructed using only the target templates and the associated coefficients, is considered as the tracking result. In spite of promising system performance widely reported, it is unclear if the performance of these trackers can be maximized. In addition, computational complexity caused by the dimensionality of the feature space limits these algorithms in real-time applications. In this paper, we propose a real-time visual tracking method based on structurally random projection (RP) and weighted least squares (WLS) techniques. In particular, to enhance the discriminative capability of the tracker, we introduce background templates to the linear representation framework. To handle appearance variations over time, we relax the sparsity constraint using a WLS method to obtain the representation coefficients. To further reduce the computational complexity, structurally RP is used to reduce the dimensionality of the feature space, while preserving the pairwise distances between the data points in the feature space. Experimental results show that the proposed approach outperforms several state-of-the-art tracking methods.
Recently, several Space-Time Memory based networks have shown that the object cues (e.g. video frames as well as the segmented object masks) from the past frames are useful for segmenting objects in ...the current frame. However, these methods exploit the information from the memory by global-to-global matching between the current and past frames, which lead to mismatching to similar objects and high computational complexity. To address these problems, we propose a novel local-to-local matching solution for semi-supervised VOS, namely Regional Memory Network (RMNet). In RMNet, the precise regional memory is constructed by memorizing local regions where the target objects appear in the past frames. For the current query frame, the query regions are tracked and predicted based on the optical flow estimated from the previous frame. The proposed local-to-local matching effectively alleviates the ambiguity of similar objects in both memory and query frames, which allows the information to be passed from the regional memory to the query region efficiently and effectively. Experimental results indicate that the proposed RM-Net performs favorably against state-of-the-art methods on the DAVIS and YouTube-VOS datasets.
Robust Visual Tracking via Basis Matching Shengping Zhang; Xiangyuan Lan; Yuankai Qi ...
IEEE transactions on circuits and systems for video technology,
03/2017, Letnik:
27, Številka:
3
Journal Article
Recenzirano
Most existing tracking approaches are based on either the tracking by detection framework or the tracking by matching framework. The former needs to learn a discriminative classifier using positive ...and negative samples, which will cause tracking drift due to unreliable samples. The latter usually performs tracking by matching local interest points between a target candidate and the tracked target, which is not robust to target appearance changes over time. In this paper, we propose a novel tracking by matching framework for robust tracking based on basis matching rather than point matching. In particular, we learn the target model from target images using a set of Gabor basis functions, which have large responses on the corresponding spatial positions after a max pooling. During tracking, a target candidate is evaluated by computing the responses of the Gabor basis functions on their corresponding spatial positions. The experimental results on a set of challenging sequences validate that the performance of the proposed tracking method outperforms those of several state-of-the-art methods.
In this paper, we propose a biologically inspired appearance model for robust visual tracking. Motivated in part by the success of the hierarchical organization of the primary visual cortex (area ...V1), we establish an architecture consisting of five layers: whitening, rectification, normalization, coding, and pooling. The first three layers stem from the models developed for object recognition. In this paper, our attention focuses on the coding and pooling layers. In particular, we use a discriminative sparse coding method in the coding layer along with spatial pyramid representation in the pooling layer, which makes it easier to distinguish the target to be tracked from its background in the presence of appearance variations. An extensive experimental study shows that the proposed method has higher tracking accuracy than several state-of-the-art trackers.
Hedged Deep Tracking Yuankai Qi; Shengping Zhang; Lei Qin ...
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR),
2016-June
Conference Proceeding
In recent years, several methods have been developed to utilize hierarchical features learned from a deep convolutional neural network (CNN) for visual tracking. However, as features from a certain ...CNN layer characterize an object of interest from only one aspect or one level, the performance of such trackers trained with features from one layer (usually the second to last layer) can be further improved. In this paper, we propose a novel CNN based tracking framework, which takes full advantage of features from different CNN layers and uses an adaptive Hedge method to hedge several CNN based trackers into a single stronger one. Extensive experiments on a benchmark dataset of 100 challenging image sequences demonstrate the effectiveness of the proposed algorithm compared to several state-of-theart trackers.
Cucurbitacins are triterpenoids that confer a bitter taste in cucurbits such as cucumber, melon, watermelon, squash, and pumpkin. These compounds discourage most pests on the plant and have also been ...shown to have antitumor properties. With genomics and biochemistry, we identified nine cucumber genes in the pathway for biosynthesis of cucurbitacin C and elucidated four catalytic steps. We discovered transcription factors Bl (Bitter leaf) and Bt (Bitter fruit) that regulate this pathway in leaves and fruits, respectively. Traces in genomic signatures indicated that selection imposed on Bt during domestication led to derivation of nonbitter cucurbits from their bitter ancestors.
Computer vision cracks the leaf code Wilf, Peter; Zhang, Shengping; Chikkerur, Sharat ...
Proceedings of the National Academy of Sciences - PNAS,
03/2016, Letnik:
113, Številka:
12
Journal Article
Recenzirano
Odprti dostop
Understanding the extremely variable, complex shape and venation characters of angiosperm leaves is one of the most challenging problems in botany. Machine learning offers opportunities to analyze ...large numbers of specimens, to discover novel leaf features of angiosperm clades that may have phylogenetic significance, and to use those characters to classify unknowns. Previous computer vision approaches have primarily focused on leaf identification at the species level. It remains an open question whether learning and classification are possible among major evolutionary groups such as families and orders, which usually contain hundreds to thousands of species each and exhibit many times the foliar variation of individual species. Here, we tested whether a computer vision algorithm could use a database of 7,597 leaf images from 2,001 genera to learn features of botanical families and orders, then classify novel images. The images are of cleared leaves, specimens that are chemically bleached, then stained to reveal venation. Machine learning was used to learn a codebook of visual elements representing leaf shape and venation patterns. The resulting automated system learned to classify images into families and orders with a success rate many times greater than chance. Of direct botanical interest, the responses of diagnostic features can be visualized on leaf images as heat maps, which are likely to prompt recognition and evolutionary interpretation of a wealth of novel morphological characters. With assistance from computer vision, leaves are poised to make numerous new contributions to systematic and paleobotanical studies.