Do We Need More Training Data? Zhu, Xiangxin; Vondrick, Carl; Fowlkes, Charless C. ...
International Journal of Computer Vision, 08/2016, Volume 119, Issue 1
Journal Article · Peer reviewed · Open access
Datasets for training object recognition systems are steadily increasing in size. This paper investigates the question of whether existing detectors will continue to improve as data grows, or saturate in performance due to limited model complexity and the Bayes risk associated with the feature spaces in which they operate. We focus on the popular paradigm of discriminatively trained templates defined on oriented gradient features. We investigate the performance of mixtures of templates as the number of mixture components and the amount of training data grows. Surprisingly, even with proper treatment of regularization and “outliers”, the performance of classic mixture models appears to saturate quickly (∼10 templates and ∼100 positive training examples per template). This is not a limitation of the feature space, as compositional mixtures that share template parameters via parts, and that can synthesize new templates not encountered during training, yield significantly better performance. Based on our analysis, we conjecture that the greatest gains in detection performance will continue to derive from improved representations and learning algorithms that can make efficient use of large datasets.
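The detectors discussed above score an image window by taking the best linear response over a small set of discriminatively trained templates on oriented-gradient (HOG-style) features. A minimal illustrative sketch of that scoring rule follows; the feature dimensions, 31-dim HOG cells, and random template weights are assumptions, not the authors' actual models:

```python
import numpy as np

def score_window(feat, templates, biases):
    """Score one window of HOG-like features under a mixture of templates.

    feat      : (H, W, D) array of oriented-gradient features for the window
    templates : list of (H, W, D) weight arrays, one per mixture component
    biases    : per-component bias terms

    The mixture score is the max over components of the linear response,
    mirroring the max-over-mixtures rule used by classic template detectors.
    """
    responses = [np.sum(w * feat) + b for w, b in zip(templates, biases)]
    k = int(np.argmax(responses))          # best-matching component
    return responses[k], k

# Toy usage with random stand-ins for learned templates (~10 components,
# the regime where the paper reports performance saturating).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 31))     # hypothetical 31-dim HOG cells
templates = [rng.standard_normal((8, 8, 31)) for _ in range(10)]
biases = np.zeros(10)
score, comp = score_window(feat, templates, biases)
print(f"mixture score {score:.2f} from component {comp}")
```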
In this paper, we propose a highly accurate inpainting algorithm which reconstructs an image from a fraction of its pixels. Our algorithm is inspired by recent progress in non-local image processing techniques following the idea of ‘grouping and collaborative filtering.’ In our framework, we first match and group similar patches in the input image, then convert the problem of estimating missing values for the stack of matched patches into a low-rank matrix completion problem, and finally obtain the result by synthesizing all the restored patches. The key points of our algorithm are how to perform the patch matching accurately and how to solve the low-rank matrix completion problem. For the first problem, we propose a robust patch matching approach; for the second, we employ the alternating direction method of multipliers. Experiments show that our algorithm outperforms existing inpainting techniques. Moreover, it extends easily to practical applications including rendering acceleration, photo restoration, and object removal.
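The core computational step described above, filling in a stack of matched patches by low-rank matrix completion, can be sketched with iterative singular-value soft-thresholding, the proximal operation that also appears inside ADMM solvers for nuclear-norm problems. This is a minimal stand-in, not the authors' exact formulation; the patch matching stage is omitted and the threshold value is an assumption:

```python
import numpy as np

def complete_low_rank(M, mask, tau=5.0, n_iters=200):
    """Fill missing entries of a patch-stack matrix by iterative
    singular-value soft-thresholding with a data-consistency projection.

    M    : (m, n) matrix of stacked similar patches, zeros where unknown
    mask : boolean (m, n), True where the pixel value is observed
    tau  : threshold on singular values (hypothetical setting)
    """
    X = M.copy()
    for _ in range(n_iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink toward low rank
        X[mask] = M[mask]                          # keep observed pixels
    return X

# Toy usage: a rank-2 "patch stack" with roughly 60% of pixels missing.
rng = np.random.default_rng(1)
truth = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 32))
mask = rng.random(truth.shape) < 0.4
recovered = complete_low_rank(np.where(mask, truth, 0.0), mask, tau=1.0)
print("relative error:", np.linalg.norm(recovered - truth) / np.linalg.norm(truth))
```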
Animal eyes have evolved to process behaviorally important visual information, but how retinas deal with statistical asymmetries in visual space remains poorly understood. Using hyperspectral imaging in the field, in vivo 2-photon imaging of retinal neurons, and anatomy, here we show that larval zebrafish use a highly anisotropic retina to asymmetrically survey their natural visual world. First, different neurons dominate different parts of the eye and are linked to a systematic shift in inner retinal function: above the animal, there is little color in nature, and retinal circuits are largely achromatic. Conversely, the lower visual field and horizon are color rich and are predominantly surveyed by chromatic and color-opponent circuits that are spectrally matched to the dominant chromatic axes in nature. Second, in the horizontal and lower visual field, bipolar cell terminals encoding achromatic and color-opponent visual features are systematically arranged into distinct layers of the inner retina. Third, above the frontal horizon, a high-gain UV system piggybacks onto retinal circuits, likely to support prey capture.
• The larval zebrafish retina is anatomically and functionally asymmetric
• The upper-frontal visual field is dominated by UV-sensitive prey-capture circuits
• Circuits for tetrachromatic color vision survey the horizon and lower visual field
• This organization matches natural chromatic statistics and behavioral demands
With half of the larval zebrafish's brain located inside its eyes, every neuron in the retina counts. Using 2-photon and hyperspectral natural imaging, Zimmermann et al. show how the animal's near-360° visual field is functionally divided into tetrachromatic, achromatic, and UV prey-capture regions to match the visual information available in nature.
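The claim that circuits are spectrally matched to the dominant chromatic axes of natural scenes presupposes estimating those axes from hyperspectral data. One standard way to do so is PCA over pixel spectra; the sketch below is illustrative only, and the array shapes and preprocessing are assumptions rather than the authors' pipeline:

```python
import numpy as np

def dominant_chromatic_axes(cube, n_axes=3):
    """Estimate dominant chromatic axes of a hyperspectral scene by PCA.

    cube : (H, W, B) hyperspectral image with B spectral bands.
    Returns the top principal axes (B-dim spectral loadings) and the
    fraction of variance each explains.
    """
    spectra = cube.reshape(-1, cube.shape[-1]).astype(float)
    spectra -= spectra.mean(axis=0)            # remove the mean spectrum
    cov = np.cov(spectra, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1][:n_axes]   # largest eigenvalues first
    return evecs[:, order].T, evals[order] / evals.sum()

# Toy usage with a random stand-in for a field-recorded scene.
rng = np.random.default_rng(2)
cube = rng.random((32, 32, 31))               # 31 bands, hypothetical
axes, var = dominant_chromatic_axes(cube)
print("variance explained by top axes:", np.round(var, 3))
```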
Unlike ballistic arm movements such as reaching, the contribution of depth information to the performance of manual tracking movements is unclear. Thus, to understand how the brain handles depth information, we investigated how a required movement along the depth axis affects tracking performance, postulating that performance would depend on the amount of depth movement. We designed a visually guided planar tracking task that requires movement on three planes with different depths: a fronto-parallel plane, ROT(0); a sagittal plane, ROT(90); and a plane rotated by 45° with respect to the sagittal plane, ROT(45). Fifteen participants performed a circular manual tracking task under binocular and monocular vision in a three-dimensional (3D) virtual reality space. Under binocular vision, ROT(90), which required the largest depth movement among the tasks, showed the greatest 3D error. Similarly, the errors (deviation from the target path) on the depth axis differed significantly among the tasks. Under monocular vision, significant differences in errors were observed only on the lateral axis. Moreover, under binocular vision, the errors on the lateral and depth axes were proportional to the required movement on those axes, and the required depth movement determined the depth error independently of the other axes. This finding implies that the brain may process binocular information on each axis independently. Meanwhile, under monocular vision, the required depth movement was unrelated to performance along the depth axis, suggesting that depth tracking becomes intractable without binocular cues. Our findings highlight the importance of handling depth movement, especially when generating virtual reality environments that involve tracking tasks.
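Since the study's central measure is tracking error decomposed onto lateral and depth axes, the following sketch shows one plausible version of that decomposition; the variable names, sampling scheme, and noise model are assumptions, not the study's analysis code:

```python
import numpy as np

def per_axis_tracking_error(cursor, target):
    """Decompose tracking error into per-axis and 3D components.

    cursor, target : (T, 3) trajectories sampled at the same instants,
                     columns ordered (lateral x, vertical y, depth z).
    Returns mean absolute error per axis plus mean 3D Euclidean error.
    """
    dev = cursor - target
    per_axis = np.abs(dev).mean(axis=0)        # lateral, vertical, depth
    err_3d = np.linalg.norm(dev, axis=1).mean()
    return {"lateral": per_axis[0], "vertical": per_axis[1],
            "depth": per_axis[2], "3d": err_3d}

# Toy usage: a circular target on a plane rotated 45 deg about the
# vertical axis (akin to the ROT(45) condition), with noisy tracking.
rng = np.random.default_rng(3)
t = np.linspace(0, 2 * np.pi, 500)
circle = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
a = np.deg2rad(45)
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
target = circle @ R.T
cursor = target + rng.normal(scale=0.05, size=target.shape)
print(per_axis_tracking_error(cursor, target))
```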
Aim
To describe the development and initial validation of a questionnaire measuring functional vision in dogs.
Methods
A 17-item survey was designed to quantify functional vision in dogs. The Vision Impairment Score (VIS) was determined by summing responses to each question. Questions were assigned to one of five subcategories: overall vision, daily activities, peripheral vision, near vision, and distance vision. Content validity was established during the development phases; construct validity was established by comparing results of known groups (blind vs sighted; normal vs impaired vision; surgery to improve vision vs nonrestorative surgery) and through factor analysis. Concurrent criterion validity was determined using a validated health-related quality-of-life (HRQL) assessment tool. Reliability and responsiveness were assessed using the intraclass correlation coefficient (ICC) and effect size (ES), respectively.
Results
Responses (221) from 201 dog owners were included. Compared to sighted dogs (n = 153), blind dogs (n = 48) had a higher VIS and greater impairment in all subcategories. Among sighted dogs, a higher VIS was obtained in dogs with low vision compared to those with normal vision (P < 0.001). A higher VIS was associated with poorer HRQL (P < 0.001). Perfect reliability was obtained for 6/17 questions and excellent reliability for 11/17 questions (intraclass correlation 1.0 and >0.9, respectively). The VIS was highly responsive to therapeutic intervention (effect size 1.46).
Conclusion
Results suggest the VIS may be clinically useful in assessing and obtaining a quantifiable measure of functional vision in dogs. Ongoing validation of the tool for clinical use is needed.
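The Methods describe the VIS as a sum of 17 item responses grouped into five subcategories, with responsiveness reported as an effect size. A minimal sketch of both computations follows; the item scales, subcategory assignments, and the exact effect-size formula are assumptions, since the published instrument may differ:

```python
import numpy as np

# Hypothetical mapping of the 17 items to the five subcategories.
SUBCATEGORIES = {
    "overall": [0, 1, 2], "daily": [3, 4, 5, 6], "peripheral": [7, 8, 9],
    "near": [10, 11, 12], "distance": [13, 14, 15, 16],
}

def vis_score(responses):
    """Total VIS = sum of the 17 item responses; also return subscores."""
    responses = np.asarray(responses, dtype=float)
    subs = {name: responses[idx].sum() for name, idx in SUBCATEGORIES.items()}
    return responses.sum(), subs

def effect_size(pre, post):
    """Standardized response: mean change over SD of baseline scores
    (one common ES definition; the study's exact formula may differ)."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    return (pre - post).mean() / pre.std(ddof=1)

# Toy usage: VIS totals before and after vision-restoring surgery.
rng = np.random.default_rng(4)
total, subs = vis_score(rng.integers(0, 5, size=17))
print("VIS:", total, "subscores:", subs)
pre = rng.integers(2, 5, size=(30, 17)).sum(axis=1)   # impaired baseline
post = rng.integers(0, 3, size=(30, 17)).sum(axis=1)  # improved follow-up
print("effect size:", round(effect_size(pre, post), 2))
```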
This paper investigates the capability of plain Vision Transformers (ViTs) for semantic segmentation using the encoder–decoder framework and introduces SegViTv2. In this study, we introduce a novel Attention-to-Mask (ATM) module to design a lightweight decoder effective for plain ViT. The proposed ATM converts the global attention map into semantic masks for high-quality segmentation results. Our decoder outperforms the popular UPerNet decoder using various ViT backbones while consuming only about 5% of the computational cost. For the encoder, we address the concern of the relatively high computational cost of ViT-based encoders and propose a Shrunk++ structure that incorporates edge-aware query-based down-sampling (EQD) and query-based up-sampling (QU) modules. The Shrunk++ structure reduces the computational cost of the encoder by up to 50% while maintaining competitive performance. Furthermore, we adapt SegViT for continual semantic segmentation, demonstrating nearly zero forgetting of previously learned knowledge. Experiments show that our proposed SegViTv2 surpasses recent segmentation methods on three popular benchmarks, including the ADE20k, COCO-Stuff-10k, and PASCAL-Context datasets. The code is available through the following link: https://github.com/zbwxp/SegVit.
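As described, the ATM module reuses the attention map between learnable class queries and patch tokens as segmentation masks. The sketch below is a high-level PyTorch rendering of that idea, not the released SegViTv2 code; the projection layers, scaling, and classification head are assumptions:

```python
import torch
import torch.nn as nn

class ATMHead(nn.Module):
    """Attention-to-Mask: class queries cross-attend to patch tokens, and
    the pre-softmax similarity map is reused, via sigmoid, as masks."""

    def __init__(self, dim, num_classes):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_classes, dim))
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.cls_fc = nn.Linear(dim, 1)  # per-query "class present" logit

    def forward(self, tokens, hw):
        # tokens: (B, N, dim) patch tokens from a plain ViT; hw = (H, W), N = H*W
        B, N, dim = tokens.shape
        q = self.q_proj(self.queries).expand(B, -1, -1)      # (B, C, dim)
        k = self.k_proj(tokens)                              # (B, N, dim)
        sim = q @ k.transpose(1, 2) / dim ** 0.5             # (B, C, N)
        masks = torch.sigmoid(sim).view(B, -1, *hw)          # (B, C, H, W)
        attn = sim.softmax(dim=-1)                           # standard attention
        updated_q = attn @ tokens                            # (B, C, dim)
        cls_logits = self.cls_fc(updated_q).squeeze(-1)      # (B, C)
        return masks, cls_logits

# Toy usage on 16x16 patch tokens of a hypothetical ViT with dim 256.
tokens = torch.randn(2, 16 * 16, 256)
masks, cls_logits = ATMHead(dim=256, num_classes=150)(tokens, (16, 16))
print(masks.shape, cls_logits.shape)
```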
Developing precise artificial retinas is crucial because they hold the potential to restore vision, improve visual prosthetics, and enhance computer vision systems. Emulating the luminance and contrast adaptation features of the retina is essential to improve visual perception and efficiency and to provide the user with a realistic representation of the environment. In this article, we introduce an artificial retina model that leverages its potent adaptation to luminance and contrast to enhance vision sensing and information processing. The model achieves the realization of both tonic and phasic cells in the simplest manner. We have implemented the retina model using 0.18 μm process technology and validated the accuracy of the hardware implementation through circuit simulation that closely matches the software retina model. Additionally, we have characterized a single pixel fabricated using the same 0.18 μm process. This pixel demonstrates an 87.7% ratio of variance with the temporal software model and operates with a power consumption of 369 nW.
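Tonic (sustained) and phasic (transient) cell responses of the kind the model realizes can be illustrated by splitting a luminance signal into an adapted level and the deviation from it. The time constant and gains below are illustrative assumptions, not the circuit's parameters:

```python
import numpy as np

def tonic_phasic(luminance, dt=1e-3, tau=0.05, gain=1.0):
    """Toy tonic/phasic split of a luminance signal.

    A leaky integrator tracks the adapted (sustained) level; the tonic
    channel follows it, while the phasic channel responds to the
    difference between input and adapted level, i.e., to changes.
    """
    adapted = np.zeros_like(luminance)
    for i in range(1, len(luminance)):
        adapted[i] = adapted[i-1] + dt / tau * (luminance[i] - adapted[i-1])
    tonic = gain * adapted                      # sustained response
    phasic = gain * (luminance - adapted)       # transient response
    return tonic, phasic

# Toy usage: a step in luminance at t = 0.2 s.
t = np.arange(0, 0.6, 1e-3)
lum = np.where(t < 0.2, 0.2, 1.0)
tonic, phasic = tonic_phasic(lum)
print("peak phasic response:", phasic.max())    # spike at the step, then decay
```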
We have compared two explanations for poor peripheral binding. Binding is the ability to assign the correct features (e.g., color, direction of motion, orientation) to objects. Wu, Kanai, and Shimojo (Nature, 429(6989), 262, 2004) showed that subjects performed poorly on binding dot color with direction of motion in the periphery. Suzuki, Wolfe, Horowitz, and Noguchi (Vision Research, 82, 58–65, 2013) similarly showed that subjects had trouble binding color with line orientation in the periphery. These authors concluded that performance in the periphery was poor because binding is poor in the periphery. However, both studies used red and green stimuli. We tested an alternative hypothesis: that poor peripheral binding is in part due to poor peripheral red/green color vision. Eccentricity-dependent changes in visual processing cause peripheral red/green vision to be worse than foveal vision; in contrast, blue/yellow vision remains more stable toward the periphery. We tested 9 subjects in a replication and extension of Suzuki and colleagues' line orientation judgment, in red and green and in blue and yellow. There were three central conditions: (1) red (or blue) all horizontal, green (or yellow) all vertical; (2) red (or blue) all vertical, green (or yellow) all horizontal; or (3) random pairing of color and orientation. In both the red/green and the blue/yellow color schemes, peripheral performance was influenced by central line orientation, replicating Suzuki and colleagues. However, the effect with blue/yellow lines was smaller, indicating that poor peripheral “binding,” as hypothesized by both Wu and colleagues and Suzuki and colleagues, is due in part to their use of red and green stimuli.