•Six convnet fusion architectures and their adaptations are investigated.
•IAF R-CNN model for multispectral pedestrian detection is proposed.
•An illumination-aware weighting mechanism is introduced.
•A new state-of-the-art result on the KAIST Benchmark is reported.
Multispectral color-thermal image pairs have been shown to be more effective than a single color channel for pedestrian detection, especially under challenging illumination conditions. However, there is still a lack of studies on how to fuse the two modalities effectively. In this paper, we thoroughly compare six different convolutional network fusion architectures and analyse their adaptations, enabling a vanilla architecture to obtain detection performance comparable to state-of-the-art results. Further, we discover that pedestrian detection confidences from color or thermal images are correlated with illumination conditions. With this in mind, we propose an Illumination-aware Faster R-CNN (IAF R-CNN). Specifically, an Illumination-aware Network is introduced to give an illumination measure of the input image. We then adaptively merge the color and thermal sub-networks via a gate function defined over the illumination value. Experimental results on the KAIST Multispectral Pedestrian Benchmark validate the effectiveness of the proposed IAF R-CNN.
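The gate-function fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gate shape, its parameters, and the function names here are illustrative assumptions.

```python
# Hypothetical sketch of illumination-aware score fusion.
# The sigmoid shape, alpha, and threshold are illustrative assumptions,
# not the learned gate from the paper.
import math

def illumination_gate(illumination, alpha=8.0, threshold=0.5):
    """Map an illumination estimate in [0, 1] to a fusion weight via a
    smooth sigmoid gate: brighter scenes favour the color sub-network."""
    return 1.0 / (1.0 + math.exp(-alpha * (illumination - threshold)))

def fuse_scores(score_color, score_thermal, illumination):
    """Convex combination of the two sub-network confidences."""
    w = illumination_gate(illumination)
    return w * score_color + (1.0 - w) * score_thermal
```

Under this sketch, a daytime image (illumination near 1) yields a weight close to 1, so the fused confidence tracks the color sub-network; a night-time image shifts the weight toward the thermal sub-network.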
Purpose
The automatic detection of pulmonary nodules in CT scans improves the efficiency of lung cancer diagnosis, and false‐positive reduction plays a significant role in this detection. In this paper, we focus on the false‐positive reduction task and propose an effective method for it.
Methods
We construct a deep 3D residual CNN (convolutional neural network) to reduce false‐positive nodules among candidate nodules. The proposed network is much deeper than the traditional 3D CNNs used in medical image processing. Specifically, we design a spatial pooling and cropping (SPC) layer to extract multilevel contextual information from CT data. Moreover, we employ an online hard sample selection strategy during training so that the network better fits hard samples (e.g., nodules with irregular shapes).
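The hard sample selection idea above can be sketched in a few lines. This is an illustrative stand-in for the authors' training code, assuming the common "keep the top-k highest-loss candidates per mini-batch" formulation:

```python
# Illustrative sketch of online hard example selection (not the authors'
# exact strategy): keep only the k highest-loss candidates in a
# mini-batch so gradient updates concentrate on hard samples.
def select_hard_samples(losses, k):
    """Return the indices of the k samples with the largest loss,
    ordered from hardest to easiest."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:k]
```

In practice only the selected indices would contribute to the backward pass for that iteration.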
Results
Our method is evaluated on 888 CT scans from the dataset of the LUNA16 Challenge. The free‐response receiver operating characteristic (FROC) curve shows that the proposed method achieves a high detection performance.
Conclusions
Our experiments confirm that our method is robust and that the SPC layer helps increase the prediction accuracy. Additionally, the proposed method can easily be extended to other 3D object detection tasks in medical image processing.
Deep learning-based super-resolution (SR) techniques have generally achieved excellent performance in the computer vision field. Recently, it has been proven that three-dimensional (3D) SR for medical volumetric data delivers better visual results than conventional two-dimensional (2D) processing. However, deepening and widening 3D networks increases training difficulty significantly due to the large number of parameters and small number of training samples. Thus, we propose a 3D convolutional neural network (CNN) for SR of medical volumetric data, called ParallelNet, which uses parallel connections. We construct a parallel connection structure based on group convolution and feature aggregation to build a 3D CNN that is as wide as possible with few parameters; as a result, the model thoroughly learns more feature maps with larger receptive fields. In addition, to further improve accuracy, we present an efficient version of ParallelNet (called VolumeNet), which reduces the number of parameters and deepens ParallelNet using a proposed lightweight building block called the Queue module. Unlike most lightweight CNNs based on depthwise convolutions, the Queue module is primarily constructed from separable 2D cross-channel convolutions. As a result, the number of network parameters and the computational complexity can be reduced significantly while maintaining accuracy, thanks to full channel fusion. Experimental results demonstrate that the proposed VolumeNet significantly reduces the number of model parameters and achieves high-precision results compared to state-of-the-art methods.
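A back-of-the-envelope calculation shows why group convolution lets a 3D network grow wide with few parameters. The layer sizes below are illustrative assumptions, not the paper's actual configuration:

```python
# Parameter count of a 3D convolution with a cubic kernel, illustrating
# the savings from group convolution. Channel counts and kernel size
# here are assumed for illustration only.
def conv3d_params(c_in, c_out, k, groups=1):
    """Weight count of a 3D conv: each group sees c_in/groups inputs."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * c_out * k ** 3

full = conv3d_params(64, 64, 3)               # ordinary convolution
grouped = conv3d_params(64, 64, 3, groups=4)  # 4 parallel groups
```

With 4 groups the weight count drops by a factor of 4, so four such parallel branches cost roughly the same as one ordinary convolution while producing wider aggregated features.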
Segmenting joined fragments of fractured bones from CT (computed tomography) images is a time-consuming task that requires extensive interaction. To reduce the segmentation burden on radiologists, we propose a graphics processing unit (GPU)–accelerated 3D segmentation framework that requires less interaction and has a lower time cost than existing methods. We first leverage a normal-based erosion method to separate joined bone fragments. After labeling the separated fragments via a CCL (connected component labeling) algorithm, a record-based dilation method is employed to restore each bone's original shape. In addition, we introduce a random walk algorithm to handle the special case in which fragments are strongly joined. For efficient fragment segmentation, the framework runs in parallel using GPU-acceleration technology. Experiments on realistic CT volumes demonstrate that our framework attains accurate fragment segmentations with Dice scores over 99% and takes 3.47 s on average to segment a fractured bone volume of 512 × 512 × 425 voxels.
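The separate-then-label idea behind the erosion/CCL stages can be shown on a toy 2D grid. This sketch uses plain morphological erosion and BFS labeling, whereas the paper uses normal-based erosion, record-based dilation, and GPU parallelism on 3D volumes:

```python
# Toy 2D illustration of the separate-then-label idea: erosion removes
# a thin bridge joining two fragments, after which connected component
# labeling assigns each fragment its own label. Plain erosion stands in
# for the paper's normal-based variant.
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def erode(grid):
    """4-neighbourhood binary erosion: a cell survives only if all four
    neighbours are in-bounds and foreground."""
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x] and all(
                0 <= y + dy < h and 0 <= x + dx < w and grid[y + dy][x + dx]
                for dy, dx in NEIGHBOURS
            ):
                out[y][x] = 1
    return out

def label_components(grid):
    """BFS connected component labeling; returns (count, label map)."""
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for y in range(h):
        for x in range(w):
            if grid[y][x] and not labels[y][x]:
                n += 1
                labels[y][x] = n
                q = deque([(y, x)])
                while q:
                    cy, cx = q.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and grid[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = n
                            q.append((ny, nx))
    return n, labels
```

In the full pipeline a dilation step would then grow each labeled fragment back toward its original shape, using a record of the eroded voxels to keep the labels separated.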
Graphical Abstract
We propose a GPU accelerated segmentation framework, which mainly consists of normal-based erosion and record-based dilation, to automatically segment joined fragments for most cases. For the remaining cases, we introduce a random walk algorithm for segmentation with a few interactions.
Continuous penalty forces
Tang, Min; Manocha, Dinesh; Otaduy, Miguel A.
ACM Transactions on Graphics, 07/2012, Volume 31, Issue 4
Journal Article
Peer reviewed
We present a simple algorithm to compute continuous penalty forces that determine the collision response between rigid and deformable models bounded by triangle meshes. Our algorithm computes a well-behaved solution, in contrast to the traditional stability and robustness problems of penalty methods induced by force discontinuities. We trace contact features along their deforming trajectories and accumulate penalty forces over the penetration time intervals between overlapping feature pairs. Moreover, we present a closed-form expression to compute the continuous and smooth collision response. Our method has very small additional overhead compared to previous penalty methods and can significantly improve stability and robustness. We highlight its benefits on several benchmarks.
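The accumulate-over-the-interval idea can be illustrated with a toy closed form. This is a simplified stand-in for the paper's expression, under the assumption that penetration depth varies linearly over the contact interval:

```python
# Toy illustration of accumulating a penalty impulse over a penetration
# time interval. Assumes a linear penalty force k*d(t) and linearly
# interpolated depth; the paper's closed-form expression for deforming
# triangle features is more involved.
def penalty_impulse(k, d0, d1, t0, t1):
    """Integral of k*d(t) over [t0, t1] with d(t) interpolated
    linearly from d0 to d1 (trapezoid rule, exact for linear d)."""
    return k * 0.5 * (d0 + d1) * (t1 - t0)
```

Because the impulse is an integral over the whole interval rather than a force sampled at discrete instants, it varies continuously as the trajectories change, which is the source of the improved stability.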
Purpose
Liver tumor segmentation is a crucial prerequisite for computer‐aided diagnosis of liver tumors. In the clinical diagnosis of liver tumors, radiologists usually examine multiphase CT images, as these images provide abundant and complementary information about tumors. However, most known automatic segmentation methods extract tumor features from CT images of only a single phase, ignoring valuable multiphase information. Therefore, a method that effectively incorporates multiphase information for automatic and accurate liver tumor segmentation is highly desirable.
Methods
In this paper, we propose a phase attention residual network (PA‐ResSeg) to model multiphase features for accurate liver tumor segmentation. A phase attention (PA) block is proposed to additionally exploit images of the arterial (ART) phase to facilitate the segmentation of the portal venous (PV) phase. The PA block consists of an intraphase attention (intra‐PA) module and an interphase attention (inter‐PA) module, which capture channel‐wise self‐dependencies and cross‐phase interdependencies, respectively. It thus enables the network to learn more representative multiphase features by refining the PV features according to the channel dependencies and recalibrating the ART features based on the learned interdependencies between phases. We further propose a PA‐based multiscale fusion (MSF) architecture that embeds the PA blocks in the network at multiple levels along the encoding path to fuse multiscale features from the multiphase images. Moreover, a 3D boundary‐enhanced loss (BE‐loss) is proposed for training to make the network more sensitive to boundaries.
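The channel-wise recalibration performed by the intra‐PA module can be sketched in squeeze-and-excite style. This is illustrative only: the real module learns its weighting, and the inter‐PA module additionally models cross-phase attention, which is omitted here:

```python
# Simplified channel-attention sketch in the spirit of the intra-PA
# module. The gate here is a fixed sigmoid of the pooled value; the
# actual module computes learned channel weights.
import math

def channel_attention(features):
    """`features` is a list of channels, each a flat list of floats.
    Squeeze each channel by global average pooling, map the pooled
    value to a sigmoid weight, and rescale the channel by it."""
    pooled = [sum(ch) / len(ch) for ch in features]
    weights = [1.0 / (1.0 + math.exp(-p)) for p in pooled]
    return [[w * v for v in ch] for w, ch in zip(weights, features)]
```

Channels with stronger average activation receive weights closer to 1 and so dominate the refined feature map, which is the self-dependency refinement the abstract describes for the PV features.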
Results
To evaluate the performance of our proposed PA‐ResSeg, we conducted experiments on a multiphase CT dataset of focal liver lesions (MPCT‐FLLs). Experimental results show the effectiveness of the proposed method, which achieves a dice per case (DPC) of 0.7787, a dice global (DG) of 0.8682, a volumetric overlap error (VOE) of 0.3328, and a relative volume difference (RVD) of 0.0443 on the MPCT‐FLLs. Furthermore, to validate the effectiveness and robustness of PA‐ResSeg, we conducted extra experiments on another multiphase liver tumor dataset and obtained a DPC of 0.8290, a DG of 0.9132, a VOE of 0.2637, and an RVD of 0.0163. The proposed method shows its robustness and generalization capability across different datasets and different backbones.
Conclusions
The study demonstrates that our method can effectively model information from multiphase CT images to segment liver tumors and outperforms other state‐of‐the‐art methods. The PA‐based MSF method can learn more representative multiphase features at multiple scales and thereby improve the segmentation performance. Besides, the proposed 3D BE‐loss is conducive to tumor boundary segmentation by forcing the network to focus on boundary regions and marginal slices. Experimental results evaluated by quantitative metrics demonstrate the superiority of our PA‐ResSeg over the best‐known methods.
Using the dark channel prior, a kind of statistic of haze-free outdoor images, to remove haze from a single input image is simple and effective. However, due to the use of a soft matting algorithm, the method suffers from massive consumption of both memory and time, which largely limits its scalability to large images. In this paper, we present a hierarchical approach to accelerate dark channel based image dehazing. The core of our approach is a novel, efficient scheme for solving the soft matting problem involved in image dehazing, using adaptively subdivided quadtrees built in image space. Acceleration is achieved by transforming the problem of solving the N-variable linear system required in soft matting into the problem of solving a much smaller m-variable linear system, where N is the number of pixels and m is the number of corners in the quadtree. Our approach significantly reduces both space and time cost while still maintaining visual fidelity, and it largely extends the practicability of dark channel based image dehazing to large images.
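Why m can be far smaller than N is easy to see with a toy adaptive quadtree. The subdivision criterion and helper below are hypothetical, not the paper's scheme; the point is only that smooth regions collapse into a handful of leaves:

```python
# Toy adaptive quadtree subdivision (hypothetical criterion): keep
# splitting a square block while its min-max intensity spread exceeds a
# threshold, and count the leaves. On smooth images the leaf count m is
# much smaller than the pixel count N.
def count_leaves(image, x, y, size, threshold):
    """Recursively subdivide the size x size block at (x, y)."""
    vals = [image[j][i]
            for j in range(y, y + size)
            for i in range(x, x + size)]
    if size == 1 or max(vals) - min(vals) <= threshold:
        return 1  # block is smooth enough: one leaf covers it
    half = size // 2
    return sum(
        count_leaves(image, x + dx, y + dy, half, threshold)
        for dx in (0, half) for dy in (0, half)
    )
```

A constant 4 × 4 image collapses to a single leaf, while a single outlier pixel forces subdivision only around itself; the linear system to solve then has on the order of the leaf-corner count of unknowns rather than one per pixel.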
Multimodal magnetic resonance imaging (MRI) provides complementary information about targets, and the segmentation of multimodal MRI is widely used as an essential preprocessing step for initial diagnosis, stage differentiation, and post-treatment efficacy evaluation in clinical practice. For the main modality, or for each of the modalities, it is important to enhance the visual information by modeling the connections among them and effectively fusing their features. However, existing methods for multimodal segmentation have a drawback: they inadvertently drop information from individual modalities during the fusion process. Recently, graph learning-based methods have been applied to segmentation, and these methods have achieved considerable improvements by modeling the relationships across feature regions and reasoning with global information. In this paper, we propose a graph learning-based approach to efficiently extract modality-specific features and effectively establish regional correspondence among all modalities. In detail, after projecting features into a graph domain and employing graph convolution to propagate information across all regions to learn global modality-specific features, we propose a mutual information-based graph co-attention module that learns the weight coefficients of a bipartite graph, constructed from the fully connected graphs of the different modalities in the graph domain, and selectively fuses the node features. Based on the deformation diagram between the spatial-graph space and our proposed graph co-attention module, we present a multimodal prior-guided segmentation framework, which uses two strategies for two clinical situations: a Modality-Specific Learning Strategy and a Co-Modality Learning Strategy. In addition, the improved Co-Modality Learning Strategy is used with trainable weights in the multi-task loss to optimize the proposed framework.
We validated our proposed modules and frameworks on two multimodal MRI datasets: our private liver lesion dataset and a public prostate zone dataset. Our experimental results on both datasets prove the superiority of our proposed approaches.
COVID-19 pneumonia is a disease that causes a serious health crisis for many people by directly affecting and damaging lung cells. The segmentation of infected areas from computed tomography (CT) images can assist and provide useful information for COVID-19 diagnosis. Although several deep learning-based segmentation methods have been proposed for COVID-19 segmentation and have achieved state-of-the-art results, the segmentation accuracy is still not high enough (approximately 85%) due to the variations of COVID-19 infected areas (such as shape and size variations) and the similarities between COVID-19 infected and non-COVID-infected areas. To improve the segmentation accuracy of COVID-19 infected areas, we propose an interactive attention refinement network (Attention RefNet). It can be connected to any segmentation network and trained with it in an end-to-end fashion. We propose a skip connection attention module to improve the important features in both the segmentation and refinement networks, and a seed point module to enhance the important seeds (positions) for interactive refinement. The effectiveness of the proposed method was demonstrated on public datasets (COVID-19CTSeg and MICCAI) and our private multicenter dataset, where the segmentation accuracy was improved to more than 90%. We also confirmed the generalizability of the proposed network on our multicenter dataset, where the proposed method still achieves high segmentation accuracy.
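The gating role of a skip connection attention module can be sketched as follows. This is a hand-wavy illustration: the actual module's gate is learned from the refinement network, whereas here it is a fixed sigmoid of an assumed refinement signal:

```python
# Simplified sketch of attention-gated skip features. The gate here is
# a fixed sigmoid of a per-position refinement signal; the paper's
# module learns this gating end-to-end.
import math

def gate_skip(skip_features, refinement_signal):
    """Scale each skip feature by a sigmoid gate derived from the
    corresponding refinement activation, suppressing positions the
    refinement branch considers unimportant."""
    return [
        s * (1.0 / (1.0 + math.exp(-r)))
        for s, r in zip(skip_features, refinement_signal)
    ]
```

Positions with a strongly positive refinement signal pass through the skip connection almost unchanged, while uncertain positions are attenuated before being merged into the decoder.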
Estimation of 3D body shapes from dressed‐human photos is an important but challenging problem in virtual fitting. We propose a novel automatic framework to efficiently estimate 3D body shapes under clothes. We construct a database of 3D naked and dressed body pairs, based on which we learn how to automatically predict the 3D positions of body landmarks (which further constrain a parametric human body model) from dressed‐human silhouettes. Critical vertices are selected on 3D registered human bodies as landmarks to represent body shapes, so as to avoid the time‐consuming vertex-correspondence-finding process of parametric body reconstruction. Our method can estimate 3D body shapes from dressed‐human silhouettes within 4 seconds, whereas the fastest previously reported method needs 1 minute. In addition, our estimation error is within the size tolerance of the clothing industry. We dress 6042 naked bodies with 3 sets of common clothes using a physically based cloth simulation technique. To the best of our knowledge, we are the first to construct such a database of 3D naked and dressed body pairs, and our database may contribute to the areas of human body shape estimation and cloth simulation.