Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository, https://github.com/MenghaoGuo/Awesome-Vision-Attentions, is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved great success in natural language processing and displays great potential in image processing. It is inherently permutation invariant for processing a sequence of points, making it well-suited for point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation, semantic segmentation, and normal estimation tasks.
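The two sampling primitives named in this abstract can be sketched as follows. This is a minimal pure-Python illustration of generic farthest point sampling and nearest neighbor search, not the PCT implementation; the function names, the `start` parameter, and the brute-force search are assumptions made for readability.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k point indices so that each new point is the one
    farthest from all points selected so far (hypothetical helper)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


def nearest_neighbors(points, center_idx, k):
    """Brute-force k nearest neighbors of one point (the point itself included)."""
    center = points[center_idx]
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], center))
    return order[:k]


cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
centers = farthest_point_sampling(cloud, 3)
groups = [nearest_neighbors(cloud, c, 2) for c in centers]
```

Each sampled center together with its neighbor group gives a local patch from which a local feature could be aggregated; how PCT actually aggregates these patches is not specified in the abstract.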
Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture the long-range dependency within a single sample. However, self-attention has quadratic complexity and ignores potential correlation between different samples. This article proposes a novel attention mechanism which we call external attention, based on two external, small, learnable, shared memories, which can be implemented easily by simply using two cascaded linear layers and two normalization layers; it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all data samples. We further incorporate the multi-head mechanism into external attention to provide an all-MLP architecture, external attention MLP (EAMLP), for image classification. Extensive experiments on image classification, object detection, semantic segmentation, instance segmentation, image generation, and point cloud analysis reveal that our method provides results comparable or superior to the self-attention mechanism and some of its variants, with much lower computational and memory costs.
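The "two cascaded linear layers and two normalization layers" can be illustrated with a small sketch. The version below is a pure-Python approximation under stated assumptions: the two memories are plain matrices `mem_k` and `mem_v` (hypothetical names), the first normalization is a softmax over positions, and the second is an l1 normalization over memory slots. The abstract does not state which normalizations are used, so this choice is an assumption.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def external_attention(features, mem_k, mem_v):
    """features: N x d, mem_k: S x d, mem_v: S x d (S memory slots).
    Returns an N x d matrix. The cost grows linearly with N."""
    # First linear layer: affinity of every position to every memory slot (N x S).
    attn = matmul(features, [list(col) for col in zip(*mem_k)])
    n, s = len(attn), len(attn[0])
    # Normalization 1 (assumed): softmax over the N positions, per memory slot.
    for j in range(s):
        col_max = max(attn[i][j] for i in range(n))
        exps = [math.exp(attn[i][j] - col_max) for i in range(n)]
        total = sum(exps)
        for i in range(n):
            attn[i][j] = exps[i] / total
    # Normalization 2 (assumed): l1-normalize each position over the S slots.
    attn = [[v / (sum(row) + 1e-9) for v in row] for row in attn]
    # Second linear layer: read the output back from the value memory (N x d).
    return matmul(attn, mem_v)


feats = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]    # N = 3 positions, d = 2
memory_k = [[0.5, 0.1], [-0.2, 0.4]]             # S = 2 slots
memory_v = [[1.0, 0.0], [0.0, 1.0]]
out = external_attention(feats, memory_k, memory_v)
```

Because the memories have a fixed size S that does not depend on the input, the affinity matrix is N x S rather than N x N, which is where the linear complexity comes from. The memories are also shared across all samples, unlike the keys and values of self-attention.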
While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision: (1) treating images as 1D sequences neglects their 2D structures; (2) the quadratic complexity is too expensive for high-resolution images; (3) it only captures spatial adaptability but ignores channel adaptability. In this paper, we propose a novel linear attention named large kernel attention (LKA) to enable self-adaptive and long-range correlations in self-attention while avoiding its shortcomings. Furthermore, we present a neural network based on LKA, namely Visual Attention Network (VAN). While extremely simple, VAN achieves results comparable to those of similarly sized convolutional neural networks (CNNs) and vision transformers (ViTs) in various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, pose estimation, etc. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark and sets new state-of-the-art performance (58.2 PQ) for panoptic segmentation. Besides, VAN-B2 surpasses Swin-T by 4 mIoU (50.1 vs. 46.1) for semantic segmentation on the ADE20K benchmark and by 2.6 AP (48.8 vs. 46.2) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. The code is available at https://github.com/Visual-Attention-Network.
This paper discusses a novel conceptual formulation of the fractional-order variational framework for retinex, which is a fractional-order partial differential equation (FPDE) formulation of retinex for multi-scale nonlocal contrast enhancement with texture preservation. The well-known shortcomings of traditional integer-order computation-based contrast-enhancement algorithms, such as ringing artefacts and staircase effects, still need dedicated research attention. Fractional calculus has gained prominence in signal processing and image processing mainly because of its strengths, such as long-term memory, nonlocality, and weak singularity, and because of the ability of a fractional differential to enhance the complex textural details of an image in a nonlinear manner. Therefore, to address the aforementioned problems of traditional integer-order computation-based contrast-enhancement algorithms, we study here, as an interesting theoretical problem, whether it is possible to combine the edge- and texture-preserving capabilities of fractional calculus with multi-scale nonlocal contrast enhancement of texture images. Motivated by this need, in this paper, we introduce a novel conceptual formulation of the fractional-order variational framework for retinex. First, we implement the FPDE by means of the fractional-order steepest descent method. Second, we discuss the implementation of the restrictive fractional-order optimization algorithm and the fractional-order Courant-Friedrichs-Lewy condition. Third, we perform experiments to analyze the capability of the FPDE to preserve edges and textural details while enhancing the contrast. The capability of the FPDE to preserve edges and textural details is a fundamentally important advantage, which makes our proposed algorithm superior to traditional integer-order computation-based contrast-enhancement algorithms, especially for images rich in textural details.
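To make the "long-term memory" and "nonlocality" of a fractional differential concrete, the sketch below discretizes a fractional derivative of a 1D signal with the standard Grünwald-Letnikov weights. The abstract does not say which discretization the FPDE uses, so the Grünwald-Letnikov choice, the function names, and the 1D setting are all assumptions made for illustration.

```python
def gl_weights(alpha, n_terms):
    """Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k),
    computed with the recurrence w_k = w_{k-1} * (k - 1 - alpha) / k."""
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (k - 1 - alpha) / k)
    return weights


def gl_fractional_derivative(signal, alpha, h=1.0):
    """Approximate the order-alpha derivative of a uniformly sampled signal.
    Every output sample depends on all earlier samples (nonlocal memory)."""
    weights = gl_weights(alpha, len(signal))
    result = []
    for n in range(len(signal)):
        acc = sum(weights[k] * signal[n - k] for k in range(n + 1))
        result.append(acc / h ** alpha)
    return result


ramp = [0.0, 1.0, 2.0, 3.0]
first_order = gl_fractional_derivative(ramp, 1.0)   # reduces to a backward difference
half_order = gl_fractional_derivative(ramp, 0.5)    # uses the whole signal history
```

For an integer order such as `alpha = 1.0` all weights beyond the second vanish, so the operator is local; for a non-integer order the weights decay slowly and never vanish, which is the memory effect referred to above.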
The emerging development of connected and automated vehicles imposes a significant challenge on current vehicle control and transportation systems. This paper proposes a novel unified approach, Parallel Driving, a cloud-based cyber-physical-social systems (CPSS) framework aiming at synergizing connected automated driving. This study first introduces the CPSS and ACP-based intelligent machine systems. Then parallel driving is proposed in the cyber-physical-social space, considering interactions among vehicles, human drivers, and information. Within the framework, parallel testing, parallel learning and parallel reinforcement learning are developed and concisely reviewed. Developments on the intelligent horizon (iHorizon) and its applications are also presented towards the parallel horizon. The proposed parallel driving offers an ample solution for achieving smooth, safe and efficient cooperation among connected automated vehicles with different levels of automation in future road transportation systems.
Bioresorbable electronic stimulators are of rapidly growing interest as unusual therapeutic platforms, i.e., bioelectronic medicines, for treating disease states, accelerating wound healing processes and eliminating infections. Here, we present advanced materials that support operation in these systems over clinically relevant timeframes, ultimately bioresorbing harmlessly to benign products without residues, to eliminate the need for surgical extraction. Our findings overcome key challenges of bioresorbable electronic devices by realizing lifetimes that match clinical needs. The devices exploit a bioresorbable dynamic covalent polymer that facilitates tight bonding to itself and other surfaces, as a soft, elastic substrate and encapsulation coating for wireless electronic components. We describe the underlying features and chemical design considerations for this polymer, and the biocompatibility of its constituent materials. In devices with optimized, wireless designs, these polymers enable stable, long-lived operation as distal stimulators in a rat model of peripheral nerve injuries, thereby demonstrating the potential of programmable long-term electrical stimulation for maintaining muscle receptivity and enhancing functional recovery.
Excitons, bound pairs of electrons and holes, could act as an intermediary between electronic signal processing and optical transmission, thus speeding up the interconnection of photoelectric communication. However, to date, exciton‐based logic devices such as switches that work at room temperature are still lacking. This work presents a prototype of a room‐temperature optoelectronic switch based on excitons in a WSe2 monolayer. The emission intensity of WSe2 stacked on Au and SiO2 substrates exhibits completely opposite behaviors upon applying gate voltages. This observation can be ascribed to different doping behaviors of WSe2 caused by charge‐transfer and chemical‐doping effects at the WSe2/Au and WSe2/SiO2 interfaces, respectively, together with the charge‐drift effect. These features can be utilized for optoelectronic switching, as confirmed by a cyclic PL switching test lasting more than 4000 s. This study offers a universal and reliable approach for the fabrication of exciton‐based optoelectronic switches, which would be essential in integrated nanophotonics.
This work presents a room‐temperature exciton‐based optoelectronic switch in a WSe2 monolayer. The emission intensity of WSe2/Au and WSe2/SiO2 exhibits opposite behaviors upon voltage biasing, because of different doping behaviors of WSe2 caused by charge‐transfer and chemical‐doping effects at the WSe2/Au and WSe2/SiO2 interfaces, respectively. This study offers a universal and reliable approach for constructing exciton‐based optoelectronic switches for integrated nanophotonics.
We present a learning-based approach to reconstructing high-resolution three-dimensional (3D) shapes with detailed geometry and high-fidelity textures. Albeit extensively studied, algorithms for 3D reconstruction from multi-view depth-and-color (RGB-D) scans are still prone to measurement noise and occlusions; limited scanning or capturing angles also often lead to incomplete reconstructions. Propelled by recent advances in 3D deep learning techniques, in this paper, we introduce a novel computation- and memory-efficient cascaded 3D convolutional network architecture, which learns to reconstruct implicit surface representations as well as the corresponding color information from noisy and imperfect RGB-D maps. The proposed 3D neural network performs reconstruction in a progressive and coarse-to-fine manner, achieving unprecedented output resolution and fidelity. Meanwhile, an algorithm for end-to-end training of the proposed cascaded structure is developed. We further introduce Human10, a newly created dataset containing both detailed and textured full-body reconstructions as well as corresponding raw RGB-D scans of 10 subjects. Qualitative and quantitative experimental results on both synthetic and real-world datasets demonstrate that the presented approach outperforms existing state-of-the-art work regarding visual quality and accuracy of reconstructed models.
Phthalates are widely used in consumer products. People are frequently exposed to phthalates due to their applications in daily life. In this study, 14 phthalate metabolites were analyzed in 108 urine samples collected from Chinese young adults using high-performance liquid chromatography–tandem mass spectrometry. The total concentrations of the 14 phthalate metabolites ranged from 71.3 to 2670 ng/mL, with a geometric mean concentration of 306 ng/mL. mBP and miBP were the two most abundant compounds, accounting for 48% of the total concentrations. Principal component analysis suggested two major sources of phthalates: one dominated by the DEHP metabolites and one by the group of mCPP, mBP and miBP metabolites. The estimated daily intakes of DMP, DEP, DBP, DiBP and DEHP were 1.68, 2.14, 4.12, 3.52 and 1.26–2.98 μg/kg-bw/day, respectively. In a sensitivity analysis, urinary concentration and body weight were the most influential variables for human exposure estimation. Furthermore, the cumulative risk was evaluated using the hazard quotient (HQ) and hazard index (HI). Nearly half of the Chinese young adults had HI values exceeding the safe threshold. This is the first study on the occurrence of urinary phthalate metabolites and the associated human exposure in Chinese young adults.
• 14 phthalate metabolites in urine were analyzed for Chinese young adults.
• Unique profile of urinary phthalate metabolites was found.
• Principal component analysis (PCA) suggested two major sources of phthalates.
• Half of Chinese young adults had hazard index (HI) values exceeding the threshold.
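The cumulative-risk arithmetic behind HQ and HI can be sketched as follows, using the common convention that a hazard quotient is an estimated daily intake divided by a reference dose and that the hazard index is the sum of the quotients. The abstract does not give the reference doses or say which compounds enter the index, so the reference values below are placeholders chosen only to make the example run; they are not from the study. The two intake values are taken from the abstract.

```python
def hazard_quotient(daily_intake, reference_dose):
    """HQ = estimated daily intake / reference dose (same units for both)."""
    if reference_dose <= 0:
        raise ValueError("reference dose must be positive")
    return daily_intake / reference_dose


def hazard_index(daily_intakes, reference_doses):
    """HI = sum of HQ over all compounds present in daily_intakes."""
    return sum(
        hazard_quotient(intake, reference_doses[name])
        for name, intake in daily_intakes.items()
    )


# Estimated daily intakes in ug/kg-bw/day, as reported in the abstract.
intakes = {"DBP": 4.12, "DiBP": 3.52}
# PLACEHOLDER reference doses in ug/kg-bw/day: hypothetical, not from the study.
placeholder_reference = {"DBP": 10.0, "DiBP": 10.0}

hi = hazard_index(intakes, placeholder_reference)
exceeds_threshold = hi > 1.0   # HI above 1 is conventionally read as a potential concern
```

With real reference doses substituted for the placeholders, the same two functions reproduce the per-person screening step: compute HI for each individual and count how many exceed the threshold.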