In this paper, we propose the first framework (UCNet) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process. Existing RGB-D saliency detection methods treat the saliency detection task as a point estimation problem and produce a single saliency map following a deterministic learning pipeline. Inspired by the saliency data labeling process, we propose a probabilistic RGB-D saliency detection network via conditional variational autoencoders to model human annotation uncertainty and generate multiple saliency maps for each input image by sampling in the latent space. With the proposed saliency consensus process, we are able to generate an accurate saliency map based on these multiple predictions. Quantitative and qualitative evaluations on six challenging benchmark datasets against 18 competing algorithms demonstrate the effectiveness of our approach in learning the distribution of saliency maps, leading to a new state of the art in RGB-D saliency detection.
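As a rough illustration of the sample-then-consensus idea in the abstract above, the following Python sketch draws several latent codes, decodes one saliency map per code, and majority-votes the results. The decoder here is a hypothetical placeholder for a trained CVAE-style saliency decoder; it is not UCNet's actual interface, and the latent dimension and sample count are assumptions.

    import torch

    def consensus_saliency(decoder, feats, latent_dim=8, n_samples=10, thresh=0.5):
        """Decode several stochastic saliency maps and majority-vote them."""
        maps = []
        for _ in range(n_samples):
            z = torch.randn(feats.size(0), latent_dim)     # latent code from the prior
            maps.append(torch.sigmoid(decoder(feats, z)))  # one stochastic prediction
        stacked = torch.stack(maps)                        # (n_samples, B, 1, H, W)
        votes = (stacked > thresh).float().mean(dim=0)     # per-pixel agreement ratio
        return (votes > 0.5).float()                       # consensus saliency map

The spread of the sampled maps, e.g. stacked.var(dim=0), also yields a simple per-pixel uncertainty estimate as a by-product.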
Densely Residual Laplacian Super-Resolution Anwar, Saeed; Barnes, Nick
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1 March 2022, Volume 44, Issue 3
Journal Article
Peer reviewed
Open access
Super-resolution convolutional neural networks have recently demonstrated high-quality restoration for single images. However, existing algorithms often require very deep architectures and long training times. Furthermore, current convolutional neural networks for super-resolution are unable to exploit features at multiple scales, weighing them equally or only at a single static scale, which limits their learning capability. In this exposition, we present a compact and accurate super-resolution algorithm, namely, the densely residual Laplacian network (DRLN). The proposed network employs a cascading residual-on-the-residual structure to allow the flow of low-frequency information, so that the network can focus on learning high- and mid-level features. In addition, deep supervision is achieved via densely concatenated residual blocks, which also helps in learning from high-level complex features. Moreover, we propose Laplacian attention to model the crucial features and to learn the inter- and intra-level dependencies between the feature maps. Finally, comprehensive quantitative and qualitative evaluations on low-resolution, noisy low-resolution, and real historical image benchmark datasets illustrate that our DRLN algorithm performs favorably against state-of-the-art methods, both visually and in accuracy.
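To make the attention idea above concrete, here is a simplified multi-scale channel attention in the spirit of Laplacian attention, written in Python/PyTorch. The dilation rates, reduction ratio, and layer arrangement are assumptions for illustration, not DRLN's published configuration.

    import torch
    import torch.nn as nn

    class LaplacianAttention(nn.Module):
        """Multi-scale channel gating: a sketch, not the paper's exact design."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze spatial dims to 1x1
            # parallel dilated branches probe the descriptor at several scales
            self.branches = nn.ModuleList([
                nn.Conv2d(channels, channels // reduction, 3, padding=d, dilation=d)
                for d in (3, 5, 7)
            ])
            self.fuse = nn.Sequential(
                nn.Conv2d(3 * (channels // reduction), channels, 1),
                nn.Sigmoid(),  # per-channel gates in [0, 1]
            )

        def forward(self, x):
            s = self.pool(x)  # (B, C, 1, 1) global descriptor
            multi = torch.cat([torch.relu(b(s)) for b in self.branches], dim=1)
            return x * self.fuse(multi)  # reweight the input feature maps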
Why does this matter to science? Because turning raw data into published research papers often requires a little programming, which means that most scientists write software. Yet scientists generally think the code they write is poor. Software in all trades, however, is written to be good enough for the job intended. See News Feature p. 775.
Deep convolutional neural networks perform well on images containing spatially invariant (synthetic) noise; however, their performance is limited on real noisy photographs, typically requiring multi-stage network modeling. To advance the practicability of denoising algorithms, this paper proposes a novel single-stage blind real image denoising network (RIDNet) employing a modular architecture. We use a residual-on-the-residual structure to ease the flow of low-frequency information and apply feature attention to exploit channel dependencies. Furthermore, evaluations in terms of quantitative metrics and visual quality on three synthetic and four real noisy datasets against 19 state-of-the-art algorithms demonstrate the superiority of our RIDNet.
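The feature attention mentioned above can be sketched as a squeeze-and-excitation style channel gate. The Python/PyTorch snippet below shows the general mechanism; the reduction ratio is assumed rather than taken from RIDNet's published setting.

    import torch.nn as nn

    class FeatureAttention(nn.Module):
        """Channel attention sketch: gate each channel by a global descriptor."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),                       # global average pool
                nn.Conv2d(channels, channels // reduction, 1), # squeeze
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1), # excite
                nn.Sigmoid(),                                  # per-channel weights
            )

        def forward(self, x):
            return x * self.gate(x)  # rescale channels by learned importance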
Previous studies and community information about everyday difficulties in age-related macular degeneration (AMD) have focussed on domains such as reading and driving. Here, we provide the first in-depth examination of how impaired face perception impacts social interactions and quality of life in AMD. We also develop a Faces and Social Life in AMD brochure and information sheet, plus an accompanying conversation starter, aimed at AMD patients and those who interact with them (family, friends, nursing home staff).
Semi-structured face-to-face interviews were conducted with 21 AMD patients, covering the full range from mild vision loss to legal blindness. Thematic analysis was used to explore the range of patient experiences.
Patients reported faces appeared blurred and/or distorted. They described recurrent failures to recognise others' identity, facial expressions and emotional states, plus failures of alternative non-face strategies (e.g., hairstyle, voice). They reported failures to follow social nuances (e.g., to pick up that someone was joking), and feelings of missing out ('I can't join in'). Concern about offending others (e.g., by unintentionally ignoring them) was common, as were concerns of appearing fraudulent ('Other people don't understand'). Many reported social disengagement. Many reported specifically face-perception-related reductions in social life, confidence, and quality of life. All effects were observed even with only mild vision loss. Patients endorsed the value of our Faces and Social Life in AMD Information Sheet, developed from the interview results, and supported future technological assistance (digital image enhancement).
Poor face perception in AMD is an important domain contributing to impaired social interactions and quality of life. This domain should be directly assessed in quantitative quality of life measures, and in resources designed to improve community understanding. The identity-related social difficulties mirror those in prosopagnosia, of cortical rather than retinal origin, implying findings may generalise to all low-vision disorders.
Previous behavioural studies demonstrate that face caricaturing can provide an effective image enhancement method for improving poor face identity perception in low-vision simulations (e.g., age-related macular degeneration, bionic eye). To translate caricaturing usefully to patients, assignment of the multiple face landmark points needed to produce the caricatures needs to be fully automatised. Recent developments in computer science allow automatic detection of 68 face landmark points in real time and across multiple viewpoints. However, previous demonstrations of the behavioural effectiveness of caricaturing have used higher-precision caricatures with 147 landmark points per face, assigned by hand. Here, we test the effectiveness of the auto-assigned 68-point caricatures and compare them to the hand-assigned 147-point caricatures.
We assessed human perception of how different in identity pairs of faces appear when veridical (uncaricatured), caricatured with 68 points, and caricatured with 147 points. Across two experiments, we tested two types of low-vision images: a simulation of blur, as experienced in macular degeneration (testing two blur levels), and a simulation of the phosphenised images seen in prosthetic vision (at three resolutions).
The 68-point caricatures produced significant improvements in identity discrimination relative to veridical. They were approximately 50% as effective as the 147-point caricatures.
Realistic translation to patients (e.g., via real-time caricaturing with the enhanced signal sent to smart glasses or a visual prosthetic) is approaching feasibility. For maximum effectiveness, software needs to be able to assign landmark points tracing out all details of feature and face shape, to produce high-precision caricatures.
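For readers curious how the automatic pipeline might look, the Python sketch below combines dlib's standard 68-point landmark predictor with a simple landmark-exaggeration step. The model file name is dlib's published one, but avg_shape (a 68x2 mean-face landmark array) and the image-warping step that would follow are assumed and not shown; this is an illustration, not the authors' implementation.

    import numpy as np
    import dlib

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def caricature_landmarks(image, avg_shape, strength=0.6):
        """Exaggerate a face's 68 landmarks away from an average face."""
        face = detector(image, 1)[0]  # first detected face (assumes one is found)
        shape = predictor(image, face)
        pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=float)
        # push each landmark away from the mean face by `strength`
        return avg_shape + (1.0 + strength) * (pts - avg_shape)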
A Deep Journey into Super-resolution Anwar, Saeed; Khan, Salman; Barnes, Nick
ACM Computing Surveys, June 2020, Volume 53, Issue 3
Journal Article
Peer reviewed
Deep convolutional network–based super-resolution is a fast-growing field with numerous practical applications. In this exposition, we extensively compare more than 30 state-of-the-art super-resolution convolutional neural networks (CNNs) over three classical and three recently introduced challenging datasets to benchmark single-image super-resolution. We introduce a taxonomy for deep learning–based super-resolution networks that groups existing methods into nine categories, including linear, residual, multi-branch, recursive, progressive, attention-based, and adversarial designs. We also provide comparisons between the models in terms of network complexity, memory footprint, model input and output, learning details, the type of network losses, and important architectural differences (e.g., depth, skip connections, filters). The extensive evaluation shows consistent and rapid growth in accuracy over the past few years, along with a corresponding boost in model complexity and the availability of large-scale datasets. It is also observed that the pioneering methods identified as benchmarks have been significantly outperformed by the current contenders. Despite the progress of recent years, we identify several shortcomings of existing techniques and provide future research directions towards the solution of these open problems. Datasets and codes for evaluation are publicly available at https://github.com/saeed-anwar/SRsurvey.
Retinal prostheses provide vision to blind patients by eliciting phosphenes through electrical stimulation. This study explored whether character identification and image localization could be achieved through direct multiple-electrode stimulation with a suprachoroidal retinal prosthesis.
Two of three retinitis pigmentosa patients implanted with a suprachoroidal electrode array were tested on three psychophysical tasks. Electrode patterns were stimulated to elicit perception of simple characters, following which percept localization was tested using either static or dynamic images. Eye tracking was used to assess the association between accuracy and eye movements.
In the character identification task, accuracy ranged from 2.7% to 93.3%, depending on the patient and character. In the static image localization task, accuracy decreased from near perfect to <20% with decreasing contrast (patient 1). Patient 2 scored up to 70% at 100% contrast. In the dynamic image localization task, patient 1 recognized the trajectory of the image up to speeds of 64 deg/s, whereas patient 2 scored just above chance. The degree of eye movement in both patients was related to accuracy and, to some extent, stimulus direction.
The ability to identify characters and localize percepts demonstrates the capacity of the suprachoroidal device to provide meaningful information to blind patients. The variation in scores across all tasks highlights the importance of using spatial cues from phosphenes, which becomes more difficult at low contrast. The use of spatial information from multiple electrodes and eye-movement compensation is expected to improve performance outcomes during real-world prosthesis use in a camera-based system. (ClinicalTrials.gov number, NCT01603576).
We evaluated a novel visual representation for current and near-term prosthetic vision. Augmented depth emphasizes ground obstacles and floor-wall boundaries in a depth-based visual representation. This is achieved by artificially increasing contrast between obstacles and the ground surface via a novel ground plane extraction algorithm specifically designed to preserve low-contrast ground-surface boundaries.
The effectiveness of augmented depth was examined in human mobility trials compared against standard intensity-based (Intensity), depth-based (Depth) and random (Random) visual representations. Eight participants with normal vision used simulated prosthetic vision with 20 phosphenes and eight perceivable brightness levels to traverse a course with randomly placed small and low-contrast obstacles on the ground.
The number of collisions was significantly reduced using augmented depth compared with the intensity, depth, and random representations (48%, 44%, and 72% fewer collisions, respectively).
These results indicate that augmented depth may enable safe mobility in the presence of low-contrast obstacles with current and near-term implants. This is the first demonstration that an augmentation of the scene ensuring key objects are visible may provide better outcomes for prosthetic vision.
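As a generic stand-in for the augmented-depth idea, the Python sketch below fits the dominant ground plane with a plain RANSAC and then forces maximal contrast between ground and everything above it. This is not the paper's boundary-preserving ground-plane extraction algorithm; the tolerance and iteration count are arbitrary assumptions.

    import numpy as np

    def augment_depth(points, iters=200, tol=0.03, rng=np.random.default_rng(0)):
        """points: (N, 3) array from a depth camera. Returns per-point brightness."""
        best_inliers = np.zeros(len(points), dtype=bool)
        for _ in range(iters):
            p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(p1 - p0, p2 - p0)
            if np.linalg.norm(n) < 1e-9:
                continue  # degenerate (collinear) sample
            n = n / np.linalg.norm(n)
            inliers = np.abs((points - p0) @ n) < tol  # point-to-plane distance test
            if inliers.sum() > best_inliers.sum():
                best_inliers = inliers
        # darken the ground plane, brighten obstacles: maximal phosphene contrast
        return np.where(best_inliers, 0.0, 1.0)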
As the basic task of point cloud analysis, classification is fundamental yet challenging. To address some unsolved problems of existing methods, we propose a network that captures geometric features of point clouds for better representations. To achieve this, on the one hand, we enrich the geometric information of points in the low-level 3D space explicitly; on the other hand, we apply CNN-based structures in high-level feature spaces to learn local geometric context implicitly. Specifically, we leverage an error-correcting feedback structure to capture the local features of point clouds comprehensively. Furthermore, an attention module based on channel affinity helps the feature map avoid possible redundancy by emphasizing its distinct channels. Performance on both synthetic and real-world point cloud datasets demonstrates the superiority and applicability of our network; compared with other state-of-the-art methods, our approach balances accuracy and efficiency.
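A simplified sketch of a channel-affinity attention module for point features is given below in Python/PyTorch. The "max minus affinity" normalisation is one common way to down-weight redundant channels; the paper's exact formulation may differ, so treat this as an assumption-laden illustration rather than the authors' module.

    import torch
    import torch.nn as nn

    class ChannelAffinityAttention(nn.Module):
        """Reweight point-feature channels by their pairwise affinity."""
        def __init__(self):
            super().__init__()
            self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

        def forward(self, x):                           # x: (B, C, N) point features
            affinity = torch.bmm(x, x.transpose(1, 2))  # (B, C, C) channel similarity
            # emphasise distinct channels: the most self-similar (redundant)
            # entries receive the smallest attention weights
            attn = torch.softmax(
                affinity.max(-1, keepdim=True).values - affinity, dim=-1)
            out = torch.bmm(attn, x)                    # mix channels by attention
            return self.gamma * out + x                 # residual connection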