Liver cancer is one of the leading causes of cancer death. To assist doctors in hepatocellular carcinoma diagnosis and treatment planning, an accurate and automatic liver and tumor segmentation method is in high demand in clinical practice. Recently, fully convolutional neural networks (FCNs), including 2-D and 3-D FCNs, have served as the backbone in many volumetric image segmentation tasks. However, 2-D convolutions cannot fully leverage the spatial information along the third dimension, while 3-D convolutions suffer from high computational cost and GPU memory consumption. To address these issues, we propose a novel hybrid densely connected UNet (H-DenseUNet), which consists of a 2-D DenseUNet for efficiently extracting intra-slice features and a 3-D counterpart for hierarchically aggregating volumetric contexts, in the spirit of the auto-context algorithm, for liver and tumor segmentation. We formulate the learning process of the H-DenseUNet in an end-to-end manner, where the intra-slice representations and inter-slice features can be jointly optimized through a hybrid feature fusion layer. We extensively evaluated our method on the MICCAI 2017 Liver Tumor Segmentation Challenge dataset and the 3DIRCADb dataset. Our method outperformed other state-of-the-art methods on tumor segmentation and achieved very competitive performance for liver segmentation, even with a single model.
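The hybrid feature fusion idea above can be illustrated with a toy sketch in plain Python. The voxel-wise weighted-sum form, the fixed scalar weights, and the nested-list tensors are simplifying assumptions for illustration, not the paper's actual fusion layer (where the weights are learned jointly end-to-end):

```python
def hybrid_feature_fusion(intra_slice, volumetric, w2d=1.0, w3d=1.0):
    """Toy hybrid feature fusion: combine 2-D intra-slice features with
    3-D volumetric context features voxel-wise via a weighted sum.
    Both inputs are nested lists indexed [z][y][x]; w2d and w3d stand in
    for the fusion weights that the real network learns."""
    return [[[w2d * a + w3d * b
              for a, b in zip(row2, row3)]
             for row2, row3 in zip(sl2, sl3)]
            for sl2, sl3 in zip(intra_slice, volumetric)]
```

In the actual H-DenseUNet the 2-D branch runs per slice while the 3-D branch consumes the whole volume, and the fused features feed the final segmentation head; this sketch only shows the element-wise combination step.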
We propose a novel recurrent convolutional network (SV-RCNet) for automatic online workflow recognition from surgical videos, a key component for developing context-aware computer-assisted intervention systems. Different from previous methods, which harness visual and temporal information separately, the proposed SV-RCNet seamlessly integrates a convolutional neural network (CNN) and a recurrent neural network (RNN) into a novel recurrent convolutional architecture, in order to take full advantage of the complementary visual and temporal features learned from surgical videos. We train the SV-RCNet in an end-to-end manner so that the visual representations and sequential dynamics can be jointly optimized in the learning process. To produce more discriminative spatio-temporal features, we exploit a deep residual network (ResNet) and a long short-term memory (LSTM) network to extract visual features and temporal dependencies, respectively, and integrate them into the SV-RCNet. Moreover, based on the phase-transition-sensitive predictions of the SV-RCNet, we propose a simple yet effective inference scheme, namely prior knowledge inference (PKI), which leverages the natural characteristics of surgical videos. This strategy further improves the consistency of the results and largely boosts recognition performance. Extensive experiments on the MICCAI 2016 Modeling and Monitoring of Computer Assisted Interventions Workflow Challenge dataset and the Cholec80 dataset validate the SV-RCNet. Our approach not only achieves superior performance on these two datasets but also outperforms the state-of-the-art methods by a significant margin.
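The core prior behind PKI is that surgical phases occur in a fixed order, so frame-wise predictions can be corrected against that order. A much-simplified stand-in (a monotone-clamping rule, not the paper's actual inference scheme) can be sketched in plain Python:

```python
def prior_knowledge_inference(preds, num_phases):
    """Toy PKI-style cleanup: since surgical phases follow a fixed order,
    the predicted phase index should never decrease over time.
    preds: list of per-frame phase indices (0-based).
    Returns a relabeling where each frame is clamped to be >= the phase
    already reached, suppressing spurious backward jumps."""
    out, current = [], 0
    for p in preds:
        current = max(current, min(p, num_phases - 1))
        out.append(current)
    return out
```

For example, a stray prediction of phase 0 in the middle of phase 1 is corrected to phase 1. The real PKI is more refined, but this illustrates how an ordering prior can smooth frame-wise outputs.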
Shadow detection and shadow removal are fundamental and challenging tasks that require an understanding of the global image semantics. This paper presents a novel deep neural network design for shadow detection and removal by analyzing the spatial image context in a direction-aware manner. To achieve this, we first formulate a direction-aware attention mechanism in a spatial recurrent neural network (RNN) by introducing attention weights when aggregating spatial context features in the RNN. By learning these weights through training, we can recover direction-aware spatial context (DSC) for detecting and removing shadows. This design is developed into the DSC module and embedded in a convolutional neural network (CNN) to learn DSC features at different levels. Moreover, we design a weighted cross-entropy loss to make the training for shadow detection effective, and further adapt the network for shadow removal by using a Euclidean loss function and formulating a color transfer function to address the color and luminosity inconsistencies in the training pairs. We employed two shadow detection benchmark datasets and two shadow removal benchmark datasets, and performed various experiments to evaluate our method. Experimental results show that our method performs favorably against the state-of-the-art methods for both shadow detection and shadow removal.
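The direction-aware aggregation can be illustrated with a 1-D toy in plain Python. Here context flows in only two directions (left and right) instead of the four used over a 2-D feature map, and the attention weights are fixed scalars rather than learned maps; both are simplifying assumptions:

```python
def direction_aware_context(row, weights):
    """Toy 1-D analogue of DSC aggregation: at each position, accumulate
    context arriving from the left and from the right, then combine the
    two directional contexts with direction-specific attention weights.
    row: list of feature scalars; weights: (w_left, w_right)."""
    n = len(row)
    left = [0.0] * n   # cumulative context flowing left-to-right
    right = [0.0] * n  # cumulative context flowing right-to-left
    for i in range(1, n):
        left[i] = row[i - 1] + left[i - 1]
    for i in range(n - 2, -1, -1):
        right[i] = row[i + 1] + right[i + 1]
    wl, wr = weights
    return [row[i] + wl * left[i] + wr * right[i] for i in range(n)]
```

Learning `wl` and `wr` (per-pixel, per-direction, as in the DSC module) lets the network emphasize context from the directions that are most informative for deciding whether a pixel is shadowed.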
Shadow detection in general photos is a nontrivial problem due to the complexity of the real world. Though recent shadow detectors have achieved remarkable performance on various benchmarks, their performance is still limited in general real-world situations. In this work, we collected shadow images for multiple scenarios and compiled a new dataset of 10,500 shadow images, each with a labeled ground-truth mask, to support shadow detection in the complex real world. Our dataset covers a rich variety of scene categories, with diverse shadow sizes, locations, contrasts, and types. Further, we comprehensively analyze the complexity of the dataset, present a fast shadow detection network with a detail enhancement module to harvest shadow details, and demonstrate the effectiveness of our method for detecting shadows in general situations.
This paper presents a novel deep learning model that aggregates attentional dilated features for salient object detection by exploring the complementary information between the global and local context in a convolutional neural network. Our network design makes two technical contributions. First, we develop an attentional dense atrous (dilated) spatial pyramid pooling (AD-ASPP) module to selectively combine the local saliency cues captured by dilated convolutions with a small rate and the global saliency cues captured by dilated convolutions with a large rate. Second, taking the feature pyramid network as the backbone, we develop an aggregation network to integrate the refined features by formulating two consecutive chains of residual-learning-based modules: one chain from deep to shallow layers and the other from shallow to deep layers. We evaluate our network on seven widely used saliency detection benchmarks by comparing it against 21 state-of-the-art methods. Experimental results show that our network outperforms the others on all seven benchmark datasets.
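The small-rate/large-rate combination in AD-ASPP can be sketched in a 1-D toy, assuming a simple averaging kernel and a single scalar attention weight (in the paper the attention is learned and operates on dense 2-D feature maps):

```python
def dilated_conv1d(x, kernel, rate):
    """1-D dilated (atrous) convolution with zero padding; a larger rate
    samples inputs farther apart, enlarging the receptive field."""
    n, k = len(x), len(kernel)
    out = []
    for i in range(n):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - k // 2) * rate
            if 0 <= idx < n:
                s += w * x[idx]
        out.append(s)
    return out

def ad_aspp(x, small_rate=1, large_rate=3, alpha=0.5):
    """Toy AD-ASPP: blend local cues (small dilation rate) with global
    cues (large dilation rate) using an attention weight alpha, which
    stands in for the module's learned attention."""
    kernel = [1 / 3, 1 / 3, 1 / 3]  # simple averaging kernel
    local = dilated_conv1d(x, kernel, small_rate)
    global_ = dilated_conv1d(x, kernel, large_rate)
    return [alpha * l + (1 - alpha) * g for l, g in zip(local, global_)]
```

The point of the attention is that `alpha` need not be fixed: the network can weight local cues more heavily near object boundaries and global cues more heavily in large homogeneous regions.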
Rain is a common weather phenomenon that affects environmental monitoring and surveillance systems. According to an established rain model [2], scene visibility in the rain varies with the depth from the camera: objects far away are visually blocked more by the fog than by the rain streaks. However, existing datasets and methods for rain removal ignore these physical properties, thus limiting their effectiveness on real photos. In this work, we analyze the visual effects of rain subject to scene depth and formulate a rain imaging model that jointly considers rain streaks and fog. We also prepare a dataset called RainCityscapes, built on real outdoor photos. Furthermore, we design a novel real-time end-to-end deep neural network, which we train to learn depth-guided non-local features and to regress a residual map that produces a rain-free output image. We performed various experiments to visually and quantitatively compare our method with several state-of-the-art methods and demonstrate its superiority over them.
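A depth-aware rain imaging model of the kind described above can be sketched as follows. This is one plausible instantiation, combining the standard fog transmission model with an additive streak layer; the exact formulation, and the constants `beta` and `airlight`, are illustrative assumptions rather than the paper's equation:

```python
import math

def rain_image(clean, depth, streak, beta=0.1, airlight=0.8):
    """Toy depth-aware rain imaging: fog attenuates the scene with depth
    via transmission t = exp(-beta * d), so distant pixels are dominated
    by fog (airlight) while near pixels mainly show rain streaks.
    clean, depth, streak: per-pixel lists of the same length."""
    out = []
    for j, d, s in zip(clean, depth, streak):
        t = math.exp(-beta * d)
        out.append(j * t + airlight * (1.0 - t) + s)
    return out
```

At zero depth the output is just the clean pixel plus the streak; at large depth the output approaches the airlight, matching the observation that faraway objects are blocked mostly by fog.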
This article presents a deep normal filtering network, called DNF-Net, for mesh denoising. To better capture local geometry, our network processes the mesh in terms of local patches extracted from it. Overall, DNF-Net is an end-to-end network that takes patches of facet normals as input and directly outputs the corresponding denoised facet normals, from which we can reconstruct the geometry with feature preservation. Besides the overall network architecture, our contributions include a novel multi-scale feature embedding unit, a residual learning strategy to remove noise, and a deeply supervised joint loss function. Compared with recent data-driven works on mesh denoising, DNF-Net does not require manual input to extract features and better utilizes the training data to enhance its denoising performance. Finally, we present comprehensive experiments to evaluate our method and demonstrate its superiority over the state of the art on both synthetic and real-scanned meshes.
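The residual learning strategy amounts to predicting the noise component of each facet normal rather than the clean normal directly. A minimal sketch of the correction step in plain Python (in DNF-Net the residual is produced by the network; here it is simply given as input):

```python
def denoise_normal(noisy, residual):
    """Toy residual-denoising step for one facet normal: subtract a
    predicted noise residual from the noisy normal, then renormalize to
    unit length so the result is still a valid direction.
    noisy, residual: 3-component vectors as lists of floats."""
    v = [n - r for n, r in zip(noisy, residual)]
    norm = sum(c * c for c in v) ** 0.5
    return [c / norm for c in v]
```

Predicting residuals is generally easier than regressing clean normals from scratch, since the residual is close to zero wherever the input is already clean.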
This paper presents a new approach to recognizing vanishing-point-constrained building planes from a single street-view image. We first design a novel convolutional neural network (CNN) architecture that generates a geometric segmentation of per-pixel orientations from a single street-view image. The network combines two-stream features of general visual cues and surface normals in gated convolution layers, and employs a deeply supervised loss that encapsulates multi-scale convolutional features. Our experiments on a new benchmark with fine-grained plane segmentations of real-world street views show that our network outperforms state-of-the-art methods for both semantic and geometric segmentation. Since the pixel-wise segmentation exhibits coarse boundaries and discontinuities, we then propose to rectify it into perspectively projected quads based on the spatial proximity between the segmentation masks and exterior line segments detected through image processing. We demonstrate how the results can be used to perspectively overlay images and icons on building planes in input photos, and to provide visual cues for various applications.
This paper presents a non-local low-rank normal filtering method for mesh denoising. By exploring the geometric similarity between local surface patches on 3D meshes in the form of normal fields, we devise a low-rank recovery model that filters normal vectors by means of patch groups. In summary, our method has the following key contributions. First, we present the guided normal patch covariance descriptor to analyze the similarity between patches. Second, we pack normal vectors on similar patches into the normal-field patch-group (NPG) matrix for rank analysis. Third, we formulate mesh denoising as a low-rank matrix recovery problem, based on the prior that the rank of the NPG matrix is high for raw meshes with noise but can be significantly reduced for denoised meshes, whose normal vectors across similar patches should be more strongly correlated. Furthermore, we devise an objective function based on an improved truncated γ norm and derive an optimization procedure using the alternating direction method of multipliers and iteratively re-weighted least squares techniques. We conducted several experiments to evaluate our method on various 3D models and compared our results against several state-of-the-art methods. Experimental results show that our method consistently outperforms the others and better preserves fine details.