Data-driven saliency has recently gained a lot of attention thanks to the use of convolutional neural networks for predicting gaze fixations. In this paper, we go beyond standard approaches to saliency prediction, in which gaze maps are computed with a feed-forward network, and present a novel model which can predict accurate saliency maps by incorporating neural attentive mechanisms. The core of our solution is a convolutional long short-term memory that focuses on the most salient regions of the input image to iteratively refine the predicted saliency map. In addition, to tackle the center bias typical of human eye fixations, our model can learn a set of prior maps generated with Gaussian functions. We show, through an extensive evaluation, that the proposed architecture outperforms the current state of the art on public saliency prediction datasets. We further study the contribution of each key component to demonstrate their robustness in different scenarios.
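The Gaussian prior maps mentioned above can be illustrated with a minimal sketch. It assumes an axis-aligned 2D Gaussian on normalized image coordinates; the function name `gaussian_prior` and the default mean and standard deviation are hypothetical, and in the described model these parameters would be learned rather than fixed.

```python
import math

def gaussian_prior(height, width, mu_x=0.5, mu_y=0.5, sigma_x=0.25, sigma_y=0.25):
    """Return a height x width map of an axis-aligned 2D Gaussian.

    Coordinates are normalized to [0, 1]. The mean and standard deviation
    are fixed here for illustration (assumption); a trainable model would
    treat them as parameters.
    """
    prior = []
    for i in range(height):
        y = i / (height - 1) if height > 1 else 0.5
        row = []
        for j in range(width):
            x = j / (width - 1) if width > 1 else 0.5
            exponent = ((x - mu_x) ** 2) / (2 * sigma_x ** 2) \
                     + ((y - mu_y) ** 2) / (2 * sigma_y ** 2)
            row.append(math.exp(-exponent))
        prior.append(row)
    return prior

# A centered prior peaks in the middle of the map, mimicking center bias.
prior = gaussian_prior(5, 5)
```

Several such maps with different means and variances could be stacked as extra channels, which is one plausible reading of "a set of prior maps".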
Deep Visual Attention Prediction
Wang, Wenguan; Shen, Jianbing
IEEE Transactions on Image Processing, 05/2018, Volume 27, Issue 5
Journal Article
Peer reviewed
Open access
In this paper, we aim to predict human eye fixation with view-free scenes based on an end-to-end deep learning architecture. Although convolutional neural networks (CNNs) have made substantial improvements in human attention prediction, CNN-based attention models still need to leverage multi-scale features more efficiently. Our visual attention network is proposed to capture hierarchical saliency information, from deep, coarse layers with global saliency information to shallow, fine layers with local saliency response. Our model is based on a skip-layer network structure, which predicts human attention from multiple convolutional layers with various receptive fields. The final saliency prediction is achieved via the cooperation of those global and local predictions. Our model is learned in a deep supervision manner, where supervision is directly fed into multi-level layers, in contrast to previous approaches that provide supervision only at the output layer and propagate it back to earlier layers. Our model thus incorporates multi-level saliency predictions within a single network, which significantly decreases the redundancy of previous approaches that learn multiple network streams with different input scales. Extensive experimental analysis on various challenging benchmark data sets demonstrates that our method yields state-of-the-art performance with competitive inference time.
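The deep supervision idea described above can be sketched as a loss that attaches a separate term to every side output as well as to the fused prediction. This is a minimal illustration under stated assumptions: binary cross-entropy as the per-level loss, side outputs already resized to the target resolution, and the hypothetical names `bce` and `deep_supervision_loss`. The abstract does not specify the loss or the fusion rule.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two flat lists of values in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)

def deep_supervision_loss(side_outputs, fused_output, target):
    """Sum one loss per side output plus one for the fused prediction.

    Assumption: every side output has the same size as the target.
    Returns the total loss and the list of per-level losses.
    """
    side_losses = [bce(s, target) for s in side_outputs]
    total = sum(side_losses) + bce(fused_output, target)
    return total, side_losses

target = [1.0, 0.0, 1.0, 0.0]
fine_level = [0.9, 0.1, 0.9, 0.1]    # sharper local prediction
coarse_level = [0.5, 0.5, 0.5, 0.5]  # uninformative global prediction
fused = [0.8, 0.2, 0.8, 0.2]
total, side_losses = deep_supervision_loss([fine_level, coarse_level], fused, target)
```

Because each level receives its own gradient signal, an uninformative level is penalized directly rather than only through the final output.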
A PCE of 15.83% is achieved for the optimized opaque PSCs with a 100 nm PM6:Y6 active layer, resulting from well-balanced photon harvesting and charge collection. The PCE and AVT of semitransparent PSCs can be simultaneously optimized by adjusting the thickness of the Ag layer. A PCE of 12.37% and an AVT of 18.6% are achieved for semitransparent PSCs with 10 nm thick Ag and 1 nm thick Au layers as the electrode.
A series of opaque and semitransparent polymer solar cells (PSCs) were fabricated with PM6:Y6 as active layers, and 100 nm Al or 1 nm Au/(20, 15, 10 nm) Ag layer as electrode, respectively. The power conversion efficiency (PCE) of the opaque PSCs reaches 15.83% with the optimized 100 nm thick active layer, resulting from well-balanced photon harvesting and charge collection. Meanwhile, the 100 nm PM6:Y6 blend film exhibits a 50.5% average visible transmittance (AVT), which shows great potential for preparing efficient semitransparent PSCs. The semitransparent electrodes were fabricated with 1 nm Au and Ag layers of different thicknesses, exhibiting relatively high transmittance in the visible range and relatively low transmittance in the near-infrared range. The PCE and AVT of the semitransparent PSCs can be adjusted from 14.20% to 12.37% and from 8.9% to 18.6%, respectively, as the Ag layer thickness decreases from 20 to 10 nm, which are impressive values among the reported semitransparent PSCs.
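The trade-off between PCE and AVT reported above can be made concrete with a short calculation. The sketch below uses the light utilization efficiency (LUE = PCE × AVT), a figure of merit from the transparent photovoltaics literature that is not mentioned in the abstract and is brought in here as an assumption. The two data points are the values quoted above.

```python
# Ag thickness in nm -> (PCE in %, AVT in %), values quoted in the abstract.
devices = {20: (14.20, 8.9), 10: (12.37, 18.6)}

def light_utilization_efficiency(pce_percent, avt_percent):
    """LUE in percent, defined as PCE x AVT (figure of merit assumed, not from the source)."""
    return pce_percent * avt_percent / 100.0

lue = {ag_nm: light_utilization_efficiency(pce, avt)
       for ag_nm, (pce, avt) in devices.items()}

# Fraction of the opaque-device PCE (15.83%) retained by the 10 nm Ag device.
retained = devices[10][0] / 15.83
```

Under this metric the thinner Ag layer comes out ahead, since the gain in transmittance outweighs the loss in efficiency.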
Metalenses can potentially reduce the size and complexity of existing cameras, displays, and other optical devices, owing to their capability of flexibly manipulating the polarization, amplitude, and phase of light. However, a high meta-atom aspect ratio is still a drawback, as it makes metalens fabrication difficult. In this paper, we present the first demonstration of a human-eye inspired metalens with a much lower and constant meta-atom aspect ratio while maintaining the polarization under an arbitrarily polarized excitation in the near-infrared waveband.
• A human eye mimetic metalens has been designed that uses a meta-atom made of concentric cylinders of alternating materials of different refractive index.
• The designed metalens demonstrates a much lower and constant meta-atom aspect ratio under an arbitrarily polarized excitation in the near-infrared waveband.
• The designed metalens exhibits a focusing efficiency of 52% with a high numerical aperture of 0.43.
360° media allows observers to explore the scene in all directions. Consequently, human visual attention is guided not only by the perceived area in the viewport but also by the overall content in 360°. In this paper, we propose a method to estimate the 360° saliency map which extracts salient features from the entire 360° image in each viewport in three different fields of view (FoVs). Our model is first pretrained with a large-scale 2D image dataset to enable the interpretation of semantic contents, then fine-tuned with a relatively small 360° image dataset. A novel weighting loss function combined with stretch weighted maps is introduced to adaptively weight the losses of three evaluation metrics and attenuate the impact of stretched regions in equirectangular projection during the training process. Experimental results demonstrate that our model achieves better performance with the integration of three FoVs and its diverse viewport images. Results also show that the adaptive weighting losses and stretch weighted maps effectively enhance the evaluation scores compared to fixed weighting loss solutions. Compared with other state-of-the-art models, our method surpasses them on three different datasets and ranks at the top on 5 performance evaluation metrics on the Salient360! benchmark set.
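The role of a stretch weighted map can be sketched as follows. In an equirectangular image, rows near the poles are stretched horizontally, so a common choice (assumed here, since the abstract does not give the formula) is to weight each row by the cosine of its latitude. The function names `latitude_weights` and `weighted_mse` are hypothetical, and mean squared error stands in for the three evaluation metrics used in the paper.

```python
import math

def latitude_weights(height):
    """Per-row weights cos(latitude) for an equirectangular image (assumed form)."""
    weights = []
    for i in range(height):
        # Row centers run from just under +90 degrees (top) to just over -90 degrees (bottom).
        lat = (0.5 - (i + 0.5) / height) * math.pi
        weights.append(math.cos(lat))
    return weights

def weighted_mse(pred, target):
    """Mean squared error where each row is weighted by its latitude weight."""
    w = latitude_weights(len(pred))
    num, den = 0.0, 0.0
    for wi, prow, trow in zip(w, pred, target):
        for p, t in zip(prow, trow):
            num += wi * (p - t) ** 2
            den += wi
    return num / den

zeros = [[0.0], [0.0], [0.0], [0.0]]
error_near_pole = weighted_mse([[1.0], [0.0], [0.0], [0.0]], zeros)
error_near_equator = weighted_mse([[0.0], [1.0], [0.0], [0.0]], zeros)
```

The same prediction error costs less near the pole than near the equator, which is the attenuation effect described in the abstract.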
Existing methods for simulating human visual attention primarily focus on 2D displays, and limited research has been conducted on predicting visual attention in three-dimensional (3D) light field content. 3D light field displays provide a heightened sense of stereoscopic realism to viewers. To ensure that the content of a 3D light field display is more consistent with human visual characteristics, we propose a novel method for predicting human eye fixation in 3D light field display images. First, we collected real eye movement data and used it to create an eye movement dataset based on 3D light field display images. This addresses the lack of datasets for human gaze on 3D light field images. Then, we propose a convolutional neural network model with multiple inputs and outputs that integrates attention modules. This model was trained on the constructed eye movement dataset and used to predict eye fixation. A correlation exists between the predicted human gaze for multiple distinct views of the same light field image. Finally, we predicted the human gaze area of light field multi-view images with our model. Experimental results demonstrate that our model accurately predicts human gaze regions across different views of a 3D light field image. The human gaze predicted by the model on each view is largely consistent and relatively accurate. With the proposed method, we can effectively anticipate where viewers will focus their attention on a 3D light field display, which is beneficial for targeted improvement of 3D light field display content.
• A convolutional network model suitable for predicting eye fixation on 3D light field images is proposed.
• The model has a multi-input multi-output structure and a fused attention module, which improves prediction accuracy.
• An eye movement dataset based on a 3D light field device is constructed.
• A data annotation method based on light field distribution is designed to ensure the accuracy of the dataset.
Visual saliency has been an increasingly active research area in the last ten years, with dozens of saliency models recently published. Nowadays, one of the big challenges in the field is to find a way to fairly evaluate all of these models. In this paper, we compare the ranking of 12 state-of-the-art saliency models on human eye fixations using 12 similarity metrics. The comparison is done on Jian Li's database, which contains several hundred natural images. Based on Kendall's coefficient of concordance, it is shown that some of the metrics are strongly correlated, leading to redundancy in the performance metrics reported in the available benchmarks. On the other hand, other metrics provide a more diverse picture of models' overall performance. As a recommendation, three similarity metrics should be used to obtain a complete view of saliency model performance.
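Kendall's coefficient of concordance, used above to measure agreement between metrics, can be computed as W = 12S / (m²(n³ − n)), where m is the number of raters (here, metrics), n the number of ranked items (here, saliency models), and S the sum of squared deviations of the rank sums from their mean. The sketch below assumes complete rankings without ties; the function name and the toy rankings are illustrative and are not data from the paper.

```python
def kendall_w(rankings):
    """Kendall's coefficient of concordance for m rankings of n items.

    rankings: list of m lists, each a permutation of 1..n (no ties assumed).
    Returns a value in [0, 1]; 1 means all rankings agree.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((x - mean_rank_sum) ** 2 for x in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Two metrics that rank three models identically, then in opposite order.
full_agreement = kendall_w([[1, 2, 3], [1, 2, 3]])
full_disagreement = kendall_w([[1, 2, 3], [3, 2, 1]])
```

A value of W close to 1 for a group of metrics indicates that they rank the models almost identically, which is the redundancy the abstract refers to.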
In the lens, αβγ-crystallins, which account for ∼90% of ocular proteins at concentrations >400 mg/ml, need to remain soluble for the whole life-span, and their aggregation can lead to cataract. Mysteriously, despite being a metabolically quiescent organ, the lens maintains ATP concentrations of 3–7 mM. Very recently, ATP was proposed to hydrotropically prevent aggregation of crystallins, but the mechanism remains unexplored. Here, by NMR, DLS and DSF, we characterized the association, thermal stability and conformation of the 178-residue human γS-crystallin at concentrations from 2 to 100 mg/ml in the absence and in the presence of ATP. Together, the results reveal for the first time that ATP does antagonize the crowding-induced destabilization, although it shows no significant binding to γS-crystallin and does not alter its conformation. Therefore, ATP prevents aggregation in the lens by a novel mechanism, thus rationalizing the fact that the decline in ATP concentrations with aging is related to age-related cataractogenesis. Restoring normal ATP concentrations in the lens may represent a promising therapeutic strategy to treat aggregation-causing eye diseases.
• The eye lens is extremely crowded with crystallins at concentrations of ∼400 mg/ml.
• Mysteriously, the metabolically quiescent lens maintains ATP concentrations of 3–7 mM.
• ATP was very recently proposed to hydrotropically prevent aggregation of crystallins.
• We reveal that ATP antagonizes crowding-induced destabilization by a novel mechanism.
• Therefore, a decrease in ATP concentrations may underlie age-related cataractogenesis.
We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visual explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous works on classification activation maps (CAM), ODAM generates instance-specific explanations rather than class-specific ones. We show that ODAM is applicable to one-stage, two-stage, and transformer-based detectors with different types of detector backbones and heads, and produces higher-quality visual explanations than the state of the art in terms of both effectiveness and efficiency. We discuss two explanation tasks for object detection: 1) object specification: what is the important region for the prediction? 2) object discrimination: which object is detected? Aiming at these two aspects, we present a detailed analysis of the visual explanations of detectors and carry out extensive experiments to validate the effectiveness of the proposed ODAM. Furthermore, we investigate user trust in the explanation maps, how well the visual explanations of object detectors agree with human explanations, as measured through human eye gaze, and whether this agreement is related to user trust. Finally, we also propose two applications, ODAM-KD and ODAM-NMS, based on these two abilities of ODAM. ODAM-KD utilizes the object specification of ODAM to generate top-down attention for key predictions and instruct the knowledge distillation of object detection. ODAM-NMS considers the location of the model's explanation for each prediction to distinguish duplicate detected objects. A training scheme, ODAM-Train, is proposed to improve the quality of object discrimination and help with ODAM-NMS. The code of ODAM is available at https://github.com/Cyang-Zhao/ODAM.
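The general idea of a gradient-weighted heat map can be sketched in a few lines. This is a generic illustration, not the ODAM implementation: it assumes the heat map is the ReLU of the channel-wise sum of element-wise products between feature maps and the gradients of one detector output, and it omits any smoothing or normalization the authors may apply. The function name and toy tensors are hypothetical, and the linked repository remains the authoritative reference.

```python
def gradient_weighted_heatmap(features, gradients):
    """Combine feature maps and gradients into one heat map.

    features, gradients: nested lists shaped [channels][height][width],
    where gradients holds d(target)/d(feature) for a single predicted
    attribute of a single detection (assumed to be precomputed).
    """
    channels = len(features)
    height = len(features[0])
    width = len(features[0][0])
    heat = [[0.0] * width for _ in range(height)]
    for k in range(channels):
        for i in range(height):
            for j in range(width):
                heat[i][j] += gradients[k][i][j] * features[k][i][j]
    # ReLU keeps only locations with a positive influence on the target.
    return [[max(v, 0.0) for v in row] for row in heat]

# Toy example: 2 channels, 1 x 2 spatial grid.
features = [[[1.0, 2.0]], [[3.0, 1.0]]]
gradients = [[[0.5, -1.0]], [[0.5, 0.5]]]
heat = gradient_weighted_heatmap(features, gradients)
```

Because the gradients are taken with respect to one detection's output, each detected instance gets its own map, which matches the instance-specific behavior described above.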
Background: Monitoring of treatment parameters is of utmost importance in current medicine. Monitoring of intraocular media and irrigation fluid temperature during vitreoretinal surgery is, however, not common in current clinical practice. Purpose: To investigate intraocular temperature changes at various time points of vitreoretinal surgery. Materials and Methods: Twenty patients (20 eyes) who underwent vitrectomy with room-temperature (24.2±0.52 °C) irrigation solutions at an ambient temperature of 24.4±0.51 °C were observed. Temperatures in the anterior, mid- and posterior vitreous were recorded before and immediately after vitrectomy and after additional surgical manipulations. Results: The presence of a transvitreous temperature gradient from the anterior toward the posterior vitreous of the human eye was confirmed. At baseline, the highest temperature (34.17±0.36 °C) was recorded in the posterior vitreous. There were significant decreases in temperature in the vitreous compartments immediately after vitrectomy, with the lowest temperature of 30.1±0.45 °C recorded in the anterior vitreous, and the greatest temperature decrease compared to baseline (3.8±0.59 °C) in the preretinal posterior vitreous. Temperatures in the vitreous compartments increased after additional surgical manipulations and, as the duration of these manipulations increased, the temperature in the vitreous rose at an average rate of 0.18 °C per minute. Conclusion: Vitreoretinal surgery is commonly performed under conditions of artificially induced local hypothermia, warranting intraoperative monitoring of intraocular and irrigation fluid temperatures.