In this letter, we propose a road structure refined convolutional neural network (RSRCNN) approach for road extraction in aerial images. To obtain structured output for road extraction, both deconvolutional and fusion layers are designed in the RSRCNN architecture. For training RSRCNN, a new loss function is proposed that incorporates the geometric information of road structure into the cross-entropy loss; we thus call it the road-structure-based loss function. Experimental results demonstrate that the trained RSRCNN model advances the state of the art in road extraction from aerial images, in terms of precision, recall, F-score, and accuracy.
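To make the loss concrete, the sketch below shows one way a geometric road-structure term could be folded into cross-entropy, as the abstract describes. It is a minimal PyTorch illustration, not the authors' implementation: the function name and the assumption that the geometry enters as a per-pixel weight map are ours.

```python
import torch.nn.functional as F

def road_structure_loss(logits, labels, geo_weight):
    """Cross-entropy weighted per pixel by a road-geometry term (a sketch).

    logits:     (N, 2, H, W) raw per-class scores from the network
    labels:     (N, H, W) long tensor; 0 = background, 1 = road
    geo_weight: (N, H, W) per-pixel weights encoding road structure,
                e.g. derived from distance to the road centerline
                (how these weights are computed is an assumption here)
    """
    per_pixel_ce = F.cross_entropy(logits, labels, reduction="none")  # (N, H, W)
    return (geo_weight * per_pixel_ce).mean()
```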
High efficiency video coding (HEVC) significantly reduces bit-rates over the preceding H.264 standard, but at the expense of extremely high encoding complexity. In HEVC, the quad-tree partition of the coding unit (CU) consumes a large proportion of the encoding complexity, due to the brute-force search for rate-distortion optimization (RDO). Therefore, this paper proposes a deep learning approach, based on a convolutional neural network (CNN) and a long short-term memory (LSTM) network, to predict the CU partition and thereby reduce HEVC complexity at both intra- and inter-modes. First, we establish a large-scale database of CU partition data for the HEVC intra- and inter-modes, which enables deep learning on the CU partition. Second, we represent the CU partition of an entire coding tree unit (CTU) in the form of a hierarchical CU partition map (HCPM). Then, we propose an early-terminated hierarchical CNN (ETH-CNN) to learn to predict the HCPM. Consequently, the encoding complexity of intra-mode HEVC can be drastically reduced by replacing the brute-force search with ETH-CNN to decide the CU partition. Third, an early-terminated hierarchical LSTM (ETH-LSTM) is proposed to learn the temporal correlation of the CU partition. We then combine ETH-LSTM and ETH-CNN to predict the CU partition for reducing HEVC complexity at inter-mode. Finally, experimental results show that our approach outperforms other state-of-the-art approaches in reducing HEVC complexity at both intra- and inter-modes.
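The early-termination idea is easy to see in code. Below is a hedged Python sketch of filling an HCPM top-down over a 64x64 CTU, skipping all deeper levels once a CU is predicted not to split; `predict_split` stands in for the per-level ETH-CNN outputs, and everything beyond the standard 64/32/16/8 HEVC CU sizes is an illustrative assumption.

```python
def predict_hcpm(ctu, predict_split, x=0, y=0, size=64, hcpm=None):
    """Fill a hierarchical CU partition map, terminating early on 'no split'."""
    if hcpm is None:
        hcpm = {}
    split = bool(predict_split(ctu, x, y, size))  # binary decision per CU
    hcpm[(x, y, size)] = split
    if split and size > 8:                        # recurse only if a split is predicted
        half = size // 2
        for dx in (0, half):
            for dy in (0, half):
                predict_hcpm(ctu, predict_split, x + dx, y + dy, half, hcpm)
    return hcpm
```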
Versatile Video Coding (VVC), the latest standard, significantly improves coding efficiency over its predecessor, High Efficiency Video Coding (HEVC), but at the expense of sharply increased complexity. In VVC, the quad-tree plus multi-type tree (QTMT) structure of the coding unit (CU) partition accounts for over 97% of the encoding time, due to the brute-force search for recursive rate-distortion (RD) optimization. Instead of the brute-force QTMT search, this paper proposes a deep learning approach to predict the QTMT-based CU partition, drastically accelerating the encoding process of intra-mode VVC. First, we establish a large-scale database containing sufficient CU partition patterns with diverse video content, which facilitates data-driven VVC complexity reduction. Next, we propose a multi-stage exit CNN (MSE-CNN) model with an early-exit mechanism to determine the CU partition, in accordance with the flexible multi-stage QTMT structure. Then, we design an adaptive loss function for training the MSE-CNN model, which accounts for both the variable number of candidate split modes and the objective of minimizing RD cost. Finally, a multi-threshold decision scheme is developed to achieve a desirable trade-off between complexity and RD performance. Experimental results demonstrate that our approach reduces the encoding time of VVC by 44.65%–66.88% with a negligible Bjøntegaard delta bit-rate (BD-BR) increase of 1.322%–3.188%, significantly outperforming other state-of-the-art approaches.
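As a rough illustration of the multi-threshold decision scheme, the sketch below keeps only those QTMT split modes whose predicted probability clears a threshold, so the encoder RD-checks a shortlist rather than all modes. Mode names, the threshold semantics, and the fallback rule are assumptions for illustration, not the paper's exact scheme.

```python
QTMT_MODES = ["no_split", "quad", "bi_hor", "bi_ver", "tri_hor", "tri_ver"]

def candidate_modes(probs, threshold):
    """Return the split modes the encoder should still RD-check.

    probs:     dict mapping mode name -> predicted probability
    threshold: higher values keep fewer candidates (faster, but riskier)
    """
    keep = [m for m in QTMT_MODES if probs.get(m, 0.0) >= threshold]
    # Always keep the single most likely mode so the search never goes empty.
    if not keep:
        keep = [max(probs, key=probs.get)]
    return keep
```

Raising the threshold prunes more of the RD search at some risk to coding efficiency, which matches the complexity/RD trade-off the abstract describes.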
The past few years have witnessed great success in applying deep learning to enhance the quality of compressed images/video. Existing approaches mainly focus on enhancing the quality of a single frame, without considering the similarity between consecutive frames. Since, as investigated in this paper, quality fluctuates heavily across compressed video frames, frame similarity can be utilized to enhance low-quality frames given their neighboring high-quality frames. We call this task Multi-Frame Quality Enhancement (MFQE). Accordingly, this paper proposes an MFQE approach for compressed video, as the first attempt in this direction. In our approach, we first develop a Bidirectional Long Short-Term Memory (BiLSTM) based detector to locate Peak Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame Convolutional Neural Network (MF-CNN) is designed to enhance the quality of compressed video, taking a non-PQF and its two nearest PQFs as input. In MF-CNN, motion between the non-PQF and the PQFs is compensated by a motion compensation subnet. Subsequently, a quality enhancement subnet fuses the non-PQF with the compensated PQFs and reduces the compression artifacts of the non-PQF. PQF quality is enhanced in the same way. Finally, experiments validate the effectiveness and generalization ability of our MFQE approach in advancing the state of the art in quality enhancement of compressed video.
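One small but essential step in MFQE is pairing each non-PQF with its two nearest PQFs before motion compensation. Here is a minimal sketch, assuming the BiLSTM detector has already produced a sorted list of PQF frame indices; the boundary handling at the sequence ends is our assumption.

```python
import bisect

def nearest_pqfs(frame_idx, pqf_indices):
    """Return (previous PQF, next PQF) for a non-PQF frame index.

    pqf_indices must be sorted in ascending order.
    """
    i = bisect.bisect_left(pqf_indices, frame_idx)
    prev_pqf = pqf_indices[i - 1] if i > 0 else pqf_indices[0]
    next_pqf = pqf_indices[i] if i < len(pqf_indices) else pqf_indices[-1]
    return prev_pqf, next_pqf
```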
The in-loop filter has been extensively studied for the high efficiency video coding (HEVC) standard to reduce compression artifacts and thus improve coding efficiency. However, in existing approaches, the in-loop filter is always applied to each single frame, without exploiting the content correlation among multiple frames. In this paper, we propose a multi-frame in-loop filter (MIF) for HEVC, which enhances the visual quality of each encoded frame by leveraging its adjacent frames. Specifically, we first construct a large-scale database containing encoded frames and their corresponding raw frames with a variety of content, which can be used to learn the in-loop filter in HEVC. Furthermore, we find that there usually exist several reference frames of higher quality and similar content for an encoded frame. Accordingly, a reference frame selector (RFS) is designed to identify these frames. Then, a deep neural network for MIF (MIF-Net) is developed to enhance the quality of each encoded frame by utilizing the spatial information of this frame and the temporal information of its neighboring higher-quality frames. MIF-Net is built on the recently developed DenseNet, benefiting from its improved generalization capacity and computational efficiency. In addition, a novel block-adaptive convolutional layer is designed and applied in MIF-Net to handle artifacts influenced by the coding tree unit (CTU) structure in HEVC. Extensive experiments show that our MIF approach achieves an average Bjøntegaard delta bit-rate (BD-BR) saving of 11.621% on the standard test set, significantly outperforming the standard in-loop filter in HEVC and other state-of-the-art approaches.
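To illustrate the reference frame selector (RFS), here is a hedged sketch of the underlying idea: among previously encoded frames, keep a few that are both of higher quality and sufficiently similar in content to the current frame. The quality and similarity measures, threshold, and limit below are illustrative assumptions, not the paper's exact design.

```python
def select_reference_frames(current, candidates, quality, similarity,
                            sim_threshold=0.9, max_refs=2):
    """Pick up to `max_refs` higher-quality, similar frames for MIF.

    quality(f)       -> scalar quality score of frame index f (e.g. PSNR)
    similarity(a, b) -> content similarity in [0, 1] between frames a and b
    """
    better = [f for f in candidates
              if quality(f) > quality(current)
              and similarity(current, f) >= sim_threshold]
    better.sort(key=quality, reverse=True)   # prefer the highest-quality refs
    return better[:max_refs]
```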
In recent years, 360° video/images have become increasingly popular and drawn great attention. The spherical viewing range of 360° video/images entails huge volumes of data, which pose challenges to 360° video/image processing in terms of storage, transmission, etc. Accordingly, recent years have witnessed an explosion of work on 360° video/image processing. In this article, we review the state-of-the-art works on 360° video/image processing from the aspects of perception, assessment, and compression. First, this article reviews both datasets and visual attention modelling approaches for 360° video/images. Second, we survey related works on both subjective and objective visual quality assessment (VQA) of 360° video/images. Third, we overview compression approaches for 360° video/images, which utilize either spherical characteristics or visual attention models. Finally, we summarize this overview article and outline future research trends in 360° video/image processing.
Glaucoma is one of the leading causes of irreversible vision loss. Many approaches have recently been proposed for automatic glaucoma detection based on fundus images. However, none of the existing approaches can efficiently remove the high redundancy in fundus images for glaucoma detection, which may reduce the reliability and accuracy of detection. To avoid this disadvantage, this paper proposes an attention-based convolutional neural network (CNN) for glaucoma detection, called AG-CNN. Specifically, we first establish a large-scale attention-based glaucoma (LAG) database, which includes 11,760 fundus images labeled as either positive glaucoma (4,878) or negative glaucoma (6,882). Among these 11,760 fundus images, attention maps of 5,824 images are further obtained from ophthalmologists through a simulated eye-tracking experiment. Then, a new AG-CNN structure is designed, including an attention prediction subnet, a pathological area localization subnet, and a glaucoma classification subnet. The attention maps are predicted by the attention prediction subnet, trained in a weakly supervised manner, to highlight the salient regions for glaucoma detection. In contrast to other attention-based CNN methods, the features are also visualized as a localized pathological area, which is further incorporated into our AG-CNN structure to enhance glaucoma detection performance. Finally, experimental results from testing on our LAG database and another public glaucoma database show that the proposed AG-CNN approach significantly advances the state of the art in glaucoma detection.
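The core mechanics of an attention-based CNN of this kind can be sketched in a few lines: the predicted attention map is resized to the feature resolution and used to reweight the backbone features before classification. This is a generic PyTorch illustration of that pattern, not AG-CNN itself; the function name and tensor shapes are assumptions.

```python
import torch.nn.functional as F

def apply_attention(features, attention):
    """Weight feature maps by a predicted attention map (a generic sketch).

    features:  (N, C, H, W) backbone feature maps
    attention: (N, 1, h, w) predicted attention values in [0, 1]
    """
    att = F.interpolate(attention, size=features.shape[-2:],
                        mode="bilinear", align_corners=False)
    return features * att  # broadcast the single attention channel over C
```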
High efficiency video coding (HEVC) is the latest video coding standard, with the best performance among all existing standards. The HEVC main still picture profile (HEVC-MSP) also achieves top performance in image compression. In this paper, we propose a closed-form bit allocation approach to optimize the saliency-guided PSNR (viewed as perceptual distortion), so that the coding efficiency of HEVC-based image compression can be significantly improved from a subjective perspective. Specifically, a bit allocation formulation is established to minimize perceptual distortion under a constraint on bit-rates. This formulation is then solved by the proposed recursive Taylor expansion method, yielding a closed-form solution. On the basis of our solution, a bit allocation and re-allocation process is developed to minimize perceptual distortion while accurately controlling bit-rates. In addition, we provide both theoretical and numerical analyses of the computational complexity, verifying that our approach incurs little extra time cost. Experimental results demonstrate the superior performance of our approach over the state-of-the-art HEVC-MSP, with BD-rate savings of approximately 40% and 24% for face and generic images, respectively.
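The abstract's formulation is a constrained, saliency-weighted rate-distortion problem. As a sketch of the general shape of such a problem (our symbols, not necessarily the paper's):

```latex
% Minimize saliency-weighted distortion over per-region rates R_i, subject
% to a total bit budget; w_i is the saliency weight of region i.
\min_{\{R_i\}} \; \sum_i w_i \, D_i(R_i)
\quad \text{s.t.} \quad \sum_i R_i \le R_{\mathrm{total}}
% Introducing a Lagrange multiplier \lambda gives the unconstrained
% objective that closed-form methods (e.g. Taylor-expansion-based ones)
% can attack:
J = \sum_i w_i \, D_i(R_i) + \lambda \Bigl( \sum_i R_i - R_{\mathrm{total}} \Bigr)
```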
This work presents a facile and efficient one-pot, melting-assisted, and solvent-free method to prepare the low-cost N-rich polymer SFRH. Furthermore, the polymer-derived N-enriched porous carbon can be used for highly selective mixed-gas separation, CO2 storage, and pressure swing adsorption (PSA).
• Facile one-pot and solvent-free preparation method.
• N-doped porous carbons.
• Outstanding gas-mixture selectivity.
• Excellent pressure/vacuum swing adsorption (P/VSA) working capacity.
• 10-fold scaled-up production.
A facile one-pot, melting-assisted, and solvent-free method was successfully developed, for the first time, for preparing nitrogen-containing polymers. Subsequent activation at temperatures ranging from 600 to 800 °C led to the formation of N-rich microporous carbons possessing a narrow pore size distribution (ca. 0.5–3 nm), high specific surface area (ca. 1021.4–3657.0 m2 g−1), large pore volume (ca. 0.43–2.00 cm3 g−1), and high nitrogen content (up to ca. 5.11 wt%). In particular, the porous carbons exhibited outstanding CO2 adsorption capacities of 2.65 and 7.38 mmol g−1 at 273 K and 0.15 and 1 bar, respectively; they also exhibited an extremely large CO2 storage capacity of 22.06 mmol g−1 at 298 K and 20 bar. Moreover, outstanding CO2/N2, CO2/CH4, and CH4/N2 selectivities of up to 36.5, 6.9, and 5.1, respectively, were achieved at 298 K and 1 bar. The determining factors for CO2 capture at 0.15, 1, and 20 bar were carefully investigated. Furthermore, this method could be scaled up 10-fold to produce almost identical high-performance carbons. With a view to real-world applications, the pressure/vacuum swing adsorption (P/VSA) working capacity, transient breakthrough experiments with gas mixtures, and recycling feasibility were evaluated. Thus, these novel materials are promising candidates for CO2 capture from dilute gas mixtures.
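For readers outside adsorption science, two of the reported metrics have standard definitions, sketched below; the paper's exact evaluation conditions may differ. Mixed-gas selectivity compares molar loadings q against gas-phase mole fractions y, and the P/VSA working capacity is the uptake swing between the adsorption and desorption pressures:

```latex
S_{\mathrm{CO_2/N_2}}
  = \frac{q_{\mathrm{CO_2}} / q_{\mathrm{N_2}}}{y_{\mathrm{CO_2}} / y_{\mathrm{N_2}}},
\qquad
\Delta q_{\mathrm{working}} = q(p_{\mathrm{ads}}) - q(p_{\mathrm{des}})
```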
Learning to Detect Video Saliency With HEVC Features
Xu, Mai; Jiang, Lai; Sun, Xiaoyan; et al.
IEEE Transactions on Image Processing, vol. 26, no. 1, Jan. 2017. Journal article; peer reviewed.
Saliency detection has been widely studied to predict human fixations, with various applications in computer vision and image processing. For saliency detection, we argue in this paper that the state-of-the-art High Efficiency Video Coding (HEVC) standard can be used to generate useful features in the compressed domain. Therefore, this paper proposes to learn a video saliency model with regard to HEVC features. First, we establish an eye-tracking database for video saliency detection, which can be downloaded from https://github.com/remega/video_database. Through statistical analysis of our eye-tracking database, we find that human fixations tend to fall into regions with large-valued HEVC features of splitting depth, bit allocation, and motion vector (MV). In addition, three observations are obtained from further analysis of our eye-tracking database. Accordingly, several features in the HEVC domain are proposed on the basis of splitting depth, bit allocation, and MV. Next, a support vector machine is learned to integrate these HEVC features for video saliency detection. Since almost all video data are stored in compressed form, our method avoids both the computational cost of decoding and the storage cost of raw data. More importantly, experimental results show that the proposed method is superior to other state-of-the-art saliency detection methods, in either the compressed or the uncompressed domain.
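A hedged sketch of the feature-integration step: per-block HEVC-domain features (splitting depth, allocated bits, MV magnitude) are fed to an SVM trained against eye-tracking labels, and its fixation probabilities form the saliency map. Feature extraction is stubbed out, and scikit-learn's SVC is our stand-in learner, not necessarily the authors' exact SVM variant.

```python
from sklearn.svm import SVC

def train_saliency_svm(features, fixated):
    """features: (n_blocks, 3) array of [depth, bits, |MV|] per block;
    fixated: (n_blocks,) binary labels derived from eye-tracking data."""
    svm = SVC(kernel="rbf", probability=True)
    svm.fit(features, fixated)
    return svm

def saliency_map(svm, features, grid_shape):
    """Per-block saliency as the SVM's predicted fixation probability."""
    probs = svm.predict_proba(features)[:, 1]  # probability of the fixated class
    return probs.reshape(grid_shape)
```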