The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal ...depth is apriori unknown, requiring extensive architecture search or inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects-an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus.
Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the ...temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the result of these methods can deteriorate significantly. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS^\mathrm {S}S), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance-level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent single-object video segmentation databases, which show that OSVOS^\mathrm {S}S is both the fastest and most accurate method in the state of the art. Experiments on multi-object video segmentation show that OSVOS^\mathrm {S}S obtains competitive results.
Image Segmentation Using Deep Learning: A Survey Minaee, Shervin; Boykov, Yuri; Porikli, Fatih ...
IEEE transactions on pattern analysis and machine intelligence,
07/2022, Letnik:
44, Številka:
7
Journal Article
Recenzirano
Odprti dostop
Image segmentation is a key task in computer vision and image processing with important applications such as scene understanding, medical image analysis, robotic perception, video surveillance, ...augmented reality, and image compression, among others, and numerous segmentation algorithms are found in the literature. Against this backdrop, the broad success of deep learning (DL) has prompted the development of new image segmentation approaches leveraging DL models. We provide a comprehensive review of this recent literature, covering the spectrum of pioneering efforts in semantic and instance segmentation, including convolutional pixel-labeling networks, encoder-decoder architectures, multiscale and pyramid-based approaches, recurrent networks, visual attention models, and generative models in adversarial settings. We investigate the relationships, strengths, and challenges of these DL-based segmentation models, examine the widely used datasets, compare performances, and discuss promising research directions.
A Survey on Deep Learning Technique for Video Segmentation Zhou, Tianfei; Porikli, Fatih; Crandall, David J. ...
IEEE transactions on pattern analysis and machine intelligence,
2023-June-1, 2023-Jun, 2023-6-1, 20230601, Letnik:
45, Številka:
6
Journal Article
Recenzirano
Odprti dostop
Video segmentation-partitioning video frames into multiple segments or objects-plays a critical role in a broad range of practical applications, from enhancing visual effects in movie, to ...understanding scenes in autonomous driving, to creating virtual background in video conferencing. Recently, with the renaissance of connectionism in computer vision, there has been an influx of deep learning based approaches for video segmentation that have delivered compelling performance. In this survey, we comprehensively review two basic lines of research - generic object segmentation (of unknown categories) in videos, and video semantic segmentation - by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out open issues in this field, and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast advancing field: https://github.com/tfzhou/VS-Survey .
Purpose
Automated delineation of structures and organs is a key step in medical imaging. However, due to the large number and diversity of structures and the large variety of segmentation algorithms, ...a consensus is lacking as to which automated segmentation method works best for certain applications. Segmentation challenges are a good approach for unbiased evaluation and comparison of segmentation algorithms.
Methods
In this work, we describe and present the results of the Head and Neck Auto‐Segmentation Challenge 2015, a satellite event at the Medical Image Computing and Computer Assisted Interventions (MICCAI) 2015 conference. Six teams participated in a challenge to segment nine structures in the head and neck region of CT images: brainstem, mandible, chiasm, bilateral optic nerves, bilateral parotid glands, and bilateral submandibular glands.
Results
This paper presents the quantitative results of this challenge using multiple established error metrics and a well‐defined ranking system. The strengths and weaknesses of the different auto‐segmentation approaches are analyzed and discussed.
Conclusions
The Head and Neck Auto‐Segmentation Challenge 2015 was a good opportunity to assess the current state‐of‐the‐art in segmentation of organs at risk for radiotherapy treatment. Participating teams had the possibility to compare their approaches to other methods under unbiased and standardized circumstances. The results demonstrate a clear tendency toward more general purpose and fewer structure‐specific segmentation algorithms.
Saliency-Aware Video Object Segmentation Wenguan Wang; Jianbing Shen; Ruigang Yang ...
IEEE transactions on pattern analysis and machine intelligence,
2018-Jan.-1, 2018-01-00, 2018-1-1, 20180101, Letnik:
40, Številka:
1
Journal Article
Recenzirano
Odprti dostop
Video saliency, aiming for estimation of a single dominant object in a sequence, offers strong object-level cues for unsupervised video object segmentation. In this paper, we present a geodesic ...distance based technique that provides reliable and temporally consistent saliency measurement of superpixels as a prior for pixel-wise labeling. Using undirected intra-frame and inter-frame graphs constructed from spatiotemporal edges or appearance and motion, and a skeleton abstraction step to further enhance saliency estimates, our method formulates the pixel-wise segmentation task as an energy minimization problem on a function that consists of unary terms of global foreground and background models, dynamic location models, and pairwise terms of label smoothness potentials. We perform extensive quantitative and qualitative experiments on benchmark datasets. Our method achieves superior performance in comparison to the current state-of-the-art in terms of accuracy and speed.
State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results and hardly work on unseen classes without fine-tuning. Few-shot segmentation is thus proposed to ...tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples. Theses frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information of training classes and spatial inconsistency between query and support targets. To alleviate these issues, we propose the Prior Guided Feature Enrichment Network (PFENet). It consists of novel designs of (1) a training-free prior mask generation method that not only retains generalization power but also improves model performance and (2) Feature Enrichment Module (FEM) that overcomes spatial inconsistency by adaptively enriching query features with support features and prior masks. Extensive experiments on PASCAL-5<inline-formula><tex-math notation="LaTeX">^i</tex-math> <mml:math><mml:msup><mml:mrow/><mml:mi>i</mml:mi></mml:msup></mml:math><inline-graphic xlink:href="tian-ieq1-3013717.gif"/> </inline-formula> and COCO prove that the proposed prior generation method and FEM both improve the baseline method significantly. Our PFENet also outperforms state-of-the-art methods by a large margin without efficiency loss. It is surprising that our model even generalizes to cases without labeled support samples.
This reprint showcases a selection of bleeding-edge articles about medical image processing and segmentation workflows based on artificial intelligence algorithms. The proposed papers are applied to ...multiple and different anatomical districts and clinical scenarios.
The success of deep convolutional neural networks (NNs) on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. ...In this paper, we investigate and propose NN architectures for automated multiclass segmentation of anatomical organs in chest radiographs (CXRs), namely for lungs, clavicles, and heart. We address several open challenges including model overfitting, reducing number of parameters, and handling of severely imbalanced data in CXR by fusing recent concepts in convolutional networks and adapting them to the segmentation problem task in CXR. We demonstrate that our architecture combining delayed subsampling, exponential linear units, highly restrictive regularization, and a large number of high-resolution low-level abstract features outperforms state-of-the-art methods on all considered organs, as well as the human observer on lungs and heart. The models use a multiclass configuration with three target classes and are trained and tested on the publicly available Japanese Society of Radiological Technology database, consisting of 247 X-ray images the ground-truth masks for which are available in the segmentation in CXR database. Our best performing model, trained with the loss function based on the Dice coefficient, reached mean Jaccard overlap scores of 95% for lungs, 86.8% for clavicles, and 88.2% for heart. This architecture outperformed the human observer results for lungs and heart.
Instance segmentation of biological cells is important in medical image analysis for identifying and segmenting individual cells, and quantitative measurement of subcellular structures requires ...further cell-level subcellular part segmentation. Subcellular structure measurements are critical for cell phenotyping and quality analysis. For these purposes, instance-aware part segmentation network is first introduced to distinguish individual cells and segment subcellular structures for each detected cell. This approach is demonstrated on human sperm cells since the World Health Organization has established quantitative standards for sperm quality assessment. Specifically, a novel Cell Parsing Net (CP-Net) is proposed for accurate instance-level cell parsing. An attention-based feature fusion module is designed to alleviate contour misalignments for cells with an irregular shape by using instance masks as spatial cues instead of as strict constraints to differentiate various instances. A coarse-to-fine segmentation module is developed to effectively segment tiny subcellular structures within a cell through hierarchical segmentation from whole to part instead of directly segmenting each cell part. Moreover, a sperm parsing dataset is built including 320 annotated sperm images with five semantic subcellular part labels. Extensive experiments on the collected dataset demonstrate that the proposed CP-Net outperforms state-of-the-art instance-aware part segmentation networks.
•Subcellular measurements are important for cell phenotyping and quality analysis.•Instance-aware part segmentation network segments subcellular structures within cell.•Attention-based feature fusion module alleviates contour misalignments.•Coarse-to-fine module effectively segments tiny subcellular structures.•The network outperforms state-of-the-art instance-aware part segmentation networks.