The challenges of developing neuromorphic vision systems inspired by the human eye come not only from how to recreate the flexibility, sophistication, and adaptability of animal systems, but also from how to do so with computational efficiency and elegance. Similar to biological systems, these neuromorphic circuits integrate the functions of image sensing, memory, and processing into a single device, and process continuous analog brightness signals in real time. High integration, flexibility, and ultra-sensitivity are essential for practical artificial vision systems that attempt to emulate biological processing. Here, we present a flexible optoelectronic sensor array of 1024 pixels using a combination of carbon nanotubes and perovskite quantum dots as active materials for an efficient neuromorphic vision system. The device has an extraordinary sensitivity to light, with a responsivity of 5.1 × 10⁷ A/W and a specific detectivity of 2 × 10¹⁶ Jones, and demonstrates neuromorphic reinforcement learning by training the sensor array with a weak light pulse of 1 μW/cm².
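For orientation, the two figures of merit quoted above follow from standard photodetector definitions rather than anything specific to this device: responsivity is photocurrent per unit optical power, and specific detectivity in the shot-noise-limited case is D* = R·√A / √(2qI_dark). A minimal sketch, with purely illustrative numbers that are not the device parameters from the paper:

```python
import math

Q_E = 1.602176634e-19  # elementary charge (C)

def responsivity(photocurrent_a: float, optical_power_w: float) -> float:
    """Responsivity R = I_ph / P_opt, in A/W."""
    return photocurrent_a / optical_power_w

def specific_detectivity(resp: float, area_cm2: float, dark_current_a: float) -> float:
    """Shot-noise-limited specific detectivity D* = R*sqrt(A)/sqrt(2*q*I_dark),
    in Jones (cm Hz^1/2 / W)."""
    return resp * math.sqrt(area_cm2) / math.sqrt(2.0 * Q_E * dark_current_a)

# Illustrative values only, NOT the measured device parameters:
r = responsivity(photocurrent_a=1e-6, optical_power_w=1e-13)
d = specific_detectivity(r, area_cm2=1e-4, dark_current_a=1e-12)
print(f"R = {r:.3g} A/W, D* = {d:.3g} Jones")
```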
Recently, significant progress has been made on semantic object segmentation due to the development of deep convolutional neural networks (DCNNs). Training such a DCNN usually relies on a large number of images with pixel-level segmentation masks, and annotating these images is very costly in terms of both finance and human effort. In this paper, we propose a simple-to-complex (STC) framework in which only image-level annotations are utilized to learn DCNNs for semantic segmentation. Specifically, we first train an initial segmentation network, called Initial-DCNN, with the saliency maps of simple images (i.e., those with a single category of major object(s) and a clean background). These saliency maps can be automatically obtained by existing bottom-up salient object detection techniques, where no supervision information is needed. Then, a better network, called Enhanced-DCNN, is learned with supervision from the predicted segmentation masks of simple images based on the Initial-DCNN, as well as the image-level annotations. Finally, more pixel-level segmentation masks of complex images (two or more categories of objects with cluttered backgrounds), which are inferred by using the Enhanced-DCNN and image-level annotations, are utilized as the supervision to learn the Powerful-DCNN for semantic segmentation. Our method utilizes 40K simple images from Flickr.com and 10K complex images from PASCAL VOC for step-wise boosting of the segmentation network. Extensive experimental results on the PASCAL VOC 2012 segmentation benchmark demonstrate the superiority of the proposed STC framework over other state-of-the-art methods.
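The three-stage curriculum lends itself to a compact sketch. Everything below is hypothetical scaffolding, not the authors' code: `saliency` stands in for a bottom-up salient object detector, `train` for a DCNN training routine, and `predict` for mask inference restricted to an image's known labels.

```python
from typing import Callable, Dict, List, Set

# Hypothetical placeholders (not the authors' API):
#   saliency(image)            -> unsupervised saliency mask
#   train(pairs)               -> segmentation net from (image, mask) pairs
#   predict(net, image, tags)  -> mask inferred by `net`, restricted to `tags`
def stc_pipeline(
    simple_images: List[str],
    complex_images: List[str],
    labels: Dict[str, Set[str]],   # image-level annotations only
    saliency: Callable,
    train: Callable,
    predict: Callable,
):
    # Stage 1: Initial-DCNN from saliency maps of simple images
    # (single object category, clean background; no supervision needed).
    initial = train([(im, saliency(im)) for im in simple_images])
    # Stage 2: Enhanced-DCNN from Initial-DCNN's label-filtered masks.
    enhanced = train([(im, predict(initial, im, labels[im])) for im in simple_images])
    # Stage 3: Powerful-DCNN from Enhanced-DCNN's masks on complex images
    # (multiple categories, cluttered background).
    return train([(im, predict(enhanced, im, labels[im])) for im in complex_images])
```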
Automatic assessment of sentiment from visual content has gained considerable attention with the increasing tendency to express opinions via images and videos online. This paper investigates the problem of visual sentiment analysis, which involves a high level of abstraction in the recognition process. While most current methods focus on improving holistic representations, we aim to utilize local information, inspired by the observation that both the whole image and local regions convey significant sentiment information. We propose a framework to leverage affective regions: we first use an off-the-shelf objectness tool to generate the candidates, and employ a candidate selection method to remove redundant and noisy proposals. Then, a convolutional neural network (CNN) is connected to each candidate to compute the sentiment scores, and the affective regions are automatically discovered, taking both the objectness score and the sentiment score into consideration. Finally, the CNN outputs from local regions are aggregated with those from the whole image to produce the final predictions. Our framework only requires image-level labels, thereby significantly reducing the annotation burden otherwise required for training. This is especially important for sentiment analysis, since sentiment can be abstract, and labeling affective regions is too subjective and labor-intensive. Extensive experiments show that the proposed algorithm outperforms state-of-the-art approaches on eight popular benchmark datasets.
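Read as a pipeline, the framework has three steps that can be sketched directly. The scoring rule (objectness times sentiment confidence) and the 50/50 fusion below are plausible readings of the abstract rather than the authors' exact formulation; `proposals` and `cnn` are hypothetical stand-ins for the off-the-shelf objectness tool and the sentiment CNN.

```python
import numpy as np

def crop(image: np.ndarray, box) -> np.ndarray:
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]

def predict_sentiment(image, proposals, cnn, keep=20, top_k=3):
    # 1. Candidate generation via the objectness tool, pruning redundant
    #    and noisy proposals (here: simply keep the top-`keep` by objectness).
    cands = sorted(proposals(image), key=lambda c: -c[1])[:keep]  # [(box, obj)]
    # 2. Discover affective regions: combine objectness and sentiment scores.
    scored = []
    for box, obj in cands:
        probs = cnn(crop(image, box))   # class-probability vector for the crop
        scored.append((obj * probs.max(), probs))
    affective = sorted(scored, key=lambda s: -s[0])[:top_k]
    # 3. Aggregate local predictions with the whole-image prediction.
    local = np.mean([probs for _, probs in affective], axis=0)
    return 0.5 * local + 0.5 * cnn(image)
```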
Foreground map evaluation is crucial for gauging the progress of object segmentation algorithms, in particular in the field of salient object detection, where the purpose is to accurately detect and segment the most salient object in a scene. Several measures (e.g., area-under-the-curve, F1-measure, average precision, etc.) have been used to evaluate the similarity between a foreground map and a ground-truth map. The existing measures are based on pixel-wise errors and often ignore structural similarities. Behavioral vision studies, however, have shown that the human visual system is highly sensitive to structures in scenes. Here, we propose a novel, efficient (0.005 s per image), and easy-to-calculate measure, known as S-measure (structural measure), to evaluate foreground maps. Our new measure simultaneously evaluates region-aware and object-aware structural similarity between a foreground map and a ground-truth map. We demonstrate the superiority of our measure over existing ones using 4 meta-measures on 5 widely used benchmark datasets. Furthermore, we conduct a behavioral judgment study over a new database. Data from 45 subjects show that, on average, they preferred the saliency maps chosen by our measure over those chosen by the state-of-the-art measures. Our experimental results offer new insights into foreground map evaluation, where current measures fail to truly examine the strengths and weaknesses of models. Code: https://github.com/DengPingFan/S-measure.
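The top-level form of the measure is a convex combination of the two terms, S = α·S_o + (1 − α)·S_r with α = 0.5 by default. The sketch below keeps that structure but substitutes deliberately simplified stand-ins for the object-aware and region-aware terms; the linked repository has the reference implementation.

```python
import numpy as np

ALPHA = 0.5  # default balance between the object-aware and region-aware terms

def object_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    # Simplified stand-in for S_o: agreement of the mean prediction inside
    # the ground-truth foreground and background, weighted by foreground area.
    fg_mask = gt > 0.5
    mu = float(fg_mask.mean())
    fg = float(pred[fg_mask].mean()) if fg_mask.any() else 0.0
    bg = 1.0 - float(pred[~fg_mask].mean()) if (~fg_mask).any() else 0.0
    return mu * fg + (1.0 - mu) * bg

def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    # Simplified stand-in for S_r: one SSIM-style statistic over the whole
    # map instead of the paper's four centroid-aligned blocks.
    x, y = pred.ravel(), gt.ravel()
    cov = float(((x - x.mean()) * (y - y.mean())).mean())
    num = 4.0 * x.mean() * y.mean() * cov
    den = (x.mean() ** 2 + y.mean() ** 2) * (x.var() + y.var()) + 1e-8
    return num / den

def s_measure(pred: np.ndarray, gt: np.ndarray) -> float:
    """Top-level form from the paper: S = alpha*S_o + (1 - alpha)*S_r."""
    return ALPHA * object_similarity(pred, gt) + (1 - ALPHA) * region_similarity(pred, gt)
```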
Struck: Structured Output Tracking with Kernels. Sam Hare, Stuart Golodetz, Amir Saffari, et al. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 10, October 2016 (journal article, peer reviewed, open access).
Adaptive tracking-by-detection methods are widely used in computer vision for tracking arbitrary objects. Current approaches treat the tracking problem as a classification task and use online learning techniques to update the object model. However, for these updates to happen one needs to convert the estimated object position into a set of labelled training examples, and it is not clear how best to perform this intermediate step. Furthermore, the objective for the classifier (label prediction) is not explicitly coupled to the objective for the tracker (estimation of object position). In this paper, we present a framework for adaptive visual object tracking based on structured output prediction. By explicitly allowing the output space to express the needs of the tracker, we avoid the need for an intermediate classification step. Our method uses a kernelised structured output support vector machine (SVM), which is learned online to provide adaptive tracking. To allow our tracker to run at high frame rates, we (a) introduce a budgeting mechanism that prevents the unbounded growth in the number of support vectors that would otherwise occur during tracking, and (b) show how to implement tracking on the GPU. Experimentally, we show that our algorithm is able to outperform state-of-the-art trackers on various benchmark videos. Additionally, we show that we can easily incorporate additional features and kernels into our framework, which results in increased tracking performance.
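The core tracking loop is easy to sketch: candidate translations form the structured output space, each is scored by a kernelised function F(x, y), and the tracker moves to the argmax. The Gaussian kernel, grid search, and support-set representation below are illustrative stand-ins for Struck's learned structured SVM, not the authors' implementation; the online SMO-style update and budget maintenance are elided.

```python
import numpy as np

def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 0.2) -> float:
    return float(np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2)))

def score(feat: np.ndarray, support) -> float:
    # F(x, y) = sum_i beta_i * k(x_i, feat), over (feature, beta) support pairs
    return sum(beta * gaussian_kernel(sv, feat) for sv, beta in support)

def track_step(frame, box, extract, support, radius=30, step=5):
    # box = (x, y, w, h); the output space is a grid of 2-D translations.
    best, best_score = box, -np.inf
    for dx in range(-radius, radius + 1, step):
        for dy in range(-radius, radius + 1, step):
            cand = (box[0] + dx, box[1] + dy, box[2], box[3])
            s = score(extract(frame, cand), support)
            if s > best_score:
                best, best_score = cand, s
    return best  # the online SVM update and budget maintenance would follow here
```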
Supervised deep networks have achieved promising performance on image denoising by learning image priors and noise statistics from plenty of pairs of noisy and clean images. Unsupervised denoising networks are trained with only noisy images. However, for an unseen corrupted image, both supervised and unsupervised networks ignore either its particular image prior, its noise statistics, or both. That is, networks learned from external images inherently suffer from a domain gap problem: the image priors and noise statistics are very different between the training and test images. This problem becomes more apparent when dealing with signal-dependent realistic noise. To circumvent this problem, in this work we propose a novel "Noisy-As-Clean" (NAC) strategy for training self-supervised denoising networks. Specifically, the corrupted test image is directly taken as the "clean" target, while the inputs are synthetic images consisting of this corrupted image and a second, similar corruption. A simple but useful observation about NAC is that, as long as the noise is weak, it is feasible to learn a self-supervised network only with the corrupted image, approximating the optimal parameters of a supervised network learned with pairs of noisy and clean images. Experiments on synthetic and realistic noise removal demonstrate that the DnCNN and ResNet networks trained with our self-supervised NAC strategy achieve comparable or better performance than the original ones and previous supervised/unsupervised/self-supervised networks. The code is publicly available at https://github.com/csjunxu/Noisy-As-Clean .
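The pair construction at the heart of NAC is nearly a one-liner under an additive noise assumption: add a second, similarly distributed noise to the observed image and regress back to the observation. The Gaussian model and fixed sigma below are simplifying assumptions; estimating the actual noise statistics of the test image is the harder part in practice.

```python
import numpy as np

def nac_pair(noisy: np.ndarray, sigma: float, rng=None):
    """Build one Noisy-As-Clean training pair from a single observed image.

    Assumes additive Gaussian noise with (estimated) std `sigma`; the observed
    noisy image serves as the target, the doubly corrupted image as the input.
    """
    rng = rng or np.random.default_rng()
    second_noise = rng.normal(0.0, sigma, size=noisy.shape)
    inp = (noisy + second_noise).astype(np.float32)   # noisy + simulated noise
    target = noisy.astype(np.float32)                 # observation as "clean"
    return inp, target

# A denoiser (e.g., DnCNN) is then trained on many such pairs drawn from the
# single test image and finally applied to the original noisy image itself.
```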
Class activation maps are generated from the final convolutional layer of a CNN. They can highlight discriminative object regions for the class of interest. These discovered object regions have been widely used for weakly-supervised tasks. However, due to the small spatial resolution of the final convolutional layer, such class activation maps often locate only coarse regions of the target objects, limiting the performance of weakly-supervised tasks that need pixel-accurate object locations. Thus, we aim to generate more fine-grained object localization information from the class activation maps to locate the target objects more accurately. In this paper, by rethinking the relationships between the feature maps and their corresponding gradients, we propose a simple yet effective method, called LayerCAM. It can produce reliable class activation maps for different layers of a CNN. This property enables us to collect object localization information from coarse (rough spatial localization) to fine (precise fine-grained details) levels. We further integrate them into a high-quality class activation map, where the object-related pixels are better highlighted. To evaluate the quality of the class activation maps produced by LayerCAM, we apply them to weakly-supervised object localization and semantic segmentation. Experiments demonstrate that the class activation maps generated by our method are more effective and reliable than those produced by existing attention methods. The code will be made publicly available.
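A minimal sketch of the per-layer weighting rule with forward/backward hooks in PyTorch, assuming the standard LayerCAM formulation: each activation value is weighted by the positive part of its own gradient, summed over channels, and passed through a ReLU. Here `model` and `layer` are placeholders for any CNN and one of its convolutional layers.

```python
import torch
import torch.nn.functional as F

def layercam(model, layer, image, class_idx):
    """Class activation map for `layer`: ReLU(sum_k ReLU(dL/dA_k) * A_k)."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

    logits = model(image)                    # image: (1, C, H, W)
    model.zero_grad()
    logits[0, class_idx].backward()          # gradient of the target class score
    h1.remove(); h2.remove()

    weights = grads["g"].clamp(min=0)        # keep only positive gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1))  # weight, sum channels, ReLU
    cam = cam / (cam.max() + 1e-8)           # normalise to [0, 1]
    return F.interpolate(cam.unsqueeze(1), size=image.shape[-2:],
                         mode="bilinear", align_corners=False)[0, 0]
```

Maps computed at several layers can then be rescaled to a common size and fused (e.g., by element-wise maximum or averaging) to combine coarse and fine localization, in the spirit of the integration step described above.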
We extensively compare, qualitatively and quantitatively, 41 state-of-the-art models (29 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over seven challenging data sets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows a consistent rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted three years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, which in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity in model performance, which, along with the hard cases for the state-of-the-art models, provide useful hints toward constructing more challenging large-scale data sets and better saliency models. Finally, we propose probable solutions for tackling several open problems, such as evaluation scores and data set bias, which also suggest future research directions in the rapidly growing field of salient object detection.
The visual saliency detection model simulates the human visual system to perceive the scene and has been widely used in many vision tasks. With the development of acquisition technology, more comprehensive information, such as depth cues, inter-image correspondence, or temporal relationships, is available to extend image saliency detection to RGBD saliency detection, co-saliency detection, or video saliency detection. The RGBD saliency detection model focuses on extracting salient regions from RGBD images by combining the depth information. The co-saliency detection model introduces the inter-image correspondence constraint to discover the common salient object in an image group. The goal of the video saliency detection model is to locate the motion-related salient object in video sequences, considering the motion cue and spatiotemporal constraints jointly. In this paper, we review the different types of saliency detection algorithms, summarize the important issues of existing methods, and discuss existing problems and future work. Moreover, the evaluation datasets and quantitative measurements are briefly introduced, and an experimental analysis and discussion are conducted to provide a holistic overview of the different saliency detection methods.
Several studies have compared the molecular responses between e14a2 and e13a2 BCR::ABL1 transcripts in chronic myeloid leukemia (CML) patients treated with front-line imatinib, but studies on nilotinib- or dasatinib-treated patients are very limited. We retrospectively analyzed the molecular responses in 1124 CML patients with the e14a2 or e13a2 transcript receiving front-line imatinib, nilotinib, or dasatinib treatment. Patients with the e14a2 transcript had higher optimal response rates than those with the e13a2 transcript at 12 months in the imatinib-treated group, and at 6 and 12 months in the nilotinib-treated group. The optimal response rates were not significantly different between the two transcripts in the dasatinib-treated group at the landmark molecular responses. With a median follow-up of 48.4 months, higher cumulative incidences of BCR::ABL1 International Scale ≤1% and major molecular response were observed in patients with the e14a2 rather than the e13a2 transcript receiving front-line imatinib or nilotinib treatment, but not in dasatinib-treated patients. Progression-free survival and overall survival did not differ between the two transcripts in any of the three treatment groups. In view of the speed and depth of molecular responses, BCR::ABL1 transcript subtype might provide helpful information when selecting a front-line tyrosine kinase inhibitor for individual young patients with potential future treatment-free remission.