Semi-supervised video object segmentation is the task of propagating instance masks given in the first frame to the entire video. It is challenging because it usually suffers from heavy occlusions, large deformations, and large variations in object appearance. To alleviate these problems, many existing works apply time-consuming techniques such as fine-tuning, post-processing, or optical flow extraction, which makes them intractable for online segmentation. In our work, we focus on online semi-supervised video object segmentation. We propose a GCSeg (Guided Co-Segmentation) Network, mainly composed of a Reference Module and a Co-segmentation Module, to simultaneously incorporate short-term, middle-term, and long-term temporal inter-frame relationships. Moreover, we propose an Adaptive Search Strategy to reduce the risk of propagating inaccurate segmentation results to subsequent frames. Our GCSeg network achieves state-of-the-art performance on online semi-supervised video object segmentation on the DAVIS 2016 and DAVIS 2017 datasets.
Segmenting unknown or anomalous object instances is a critical task in autonomous driving applications, and it is traditionally approached as a per-pixel classification problem. However, reasoning individually about each pixel without considering its contextual semantics results in high uncertainty around objects' boundaries and numerous false positives. We propose a paradigm change by shifting from per-pixel classification to mask classification. Our mask-based method, Mask2Anomaly, demonstrates the feasibility of integrating a mask-classification architecture to jointly address anomaly segmentation, open-set semantic segmentation, and open-set panoptic segmentation. Mask2Anomaly includes several technical novelties designed to improve the detection of anomalies/unknown objects: i) a global masked attention module to focus individually on the foreground and background regions; ii) a mask contrastive learning that maximizes the margin between an anomaly and known classes; iii) a mask refinement solution to reduce false positives; and iv) a novel approach to mine unknown instances based on the mask-architecture properties. Through comprehensive quantitative and qualitative evaluation, we show that Mask2Anomaly achieves new state-of-the-art results across the benchmarks of anomaly segmentation, open-set semantic segmentation, and open-set panoptic segmentation. The code and pre-trained models are available: https://github.com/shyam671/Mask2Anomaly-Unmasking-Anomalies-in-Road-Scene-Segmentation/tree/main
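The shift from per-pixel to mask classification can be illustrated with a minimal sketch: in a mask-classification setting, each predicted mask carries a distribution over known classes, and a pixel is anomalous when no confident mask claims it. The function below is an illustrative reconstruction, not Mask2Anomaly's actual scoring; all names and shapes are assumptions.

```python
# Hedged sketch: per-pixel anomaly scoring in a mask-classification
# setting. A pixel's score for class c aggregates mask confidence times
# class confidence over all masks; anomaly = 1 - best known-class score.

def anomaly_scores(mask_probs, class_probs):
    """mask_probs:  [M][P] per-mask, per-pixel assignment probabilities.
    class_probs: [M][C] per-mask distribution over known classes.
    Returns a per-pixel anomaly score in [0, 1]."""
    n_masks = len(mask_probs)
    n_pixels = len(mask_probs[0])
    n_classes = len(class_probs[0])
    scores = []
    for p in range(n_pixels):
        per_class = [
            sum(mask_probs[m][p] * class_probs[m][c] for m in range(n_masks))
            for c in range(n_classes)
        ]
        # Pixels that no mask confidently explains get a high anomaly score.
        scores.append(1.0 - max(per_class))
    return scores
```

A pixel covered by a confident mask of a known class scores low, while a pixel left unexplained by every mask scores near 1, which is the intuition behind reasoning over masks rather than isolated pixels.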
In this study, a survey of multiple sclerosis (MS) classification and segmentation methods based on magnetic resonance imaging is presented. Knowledge of MS lesions is gained by determining the number of sample lesions so that lesion development can be followed precisely; therefore, the effects of pharmaceuticals in medical trials can be accurately assessed. Accurate recognition of MS lesions in magnetic resonance images is a particularly complex process because of their changing shapes and sizes, which can be very difficult to identify from anatomical position across subjects. This calls for precise segmentation; manual segmentation is very difficult to perform, as it requires high-level expertise and considerable time, and inter- and intra-expert variability must be accounted for when automating lesion segmentation. The principal aim of this survey is to provide an analysis of the different classification and segmentation methods and their techniques. This survey will be valuable for researchers working on MS by considering and carefully evaluating past work. The benefits and drawbacks of existing techniques are reviewed, and the issues of MS lesion segmentation and classification are elucidated.
Fast Object Segmentation in Unconstrained Video
Papazoglou, Anestis; Ferrari, Vittorio
2013 IEEE International Conference on Computer Vision, 12/2013
Conference Proceeding, Journal Article
Open access
We present a technique for separating foreground objects from the background in a video. Our method is fast, fully automatic, and makes minimal assumptions about the video. This enables handling essentially unconstrained settings, including rapidly moving background, arbitrary object motion and appearance, and non-rigid deformations and articulations. In experiments on two datasets containing over 1400 video shots, our method outperforms a state-of-the-art background subtraction technique [4] as well as methods based on clustering point tracks [6, 18, 19]. Moreover, it performs comparably to recent video object segmentation methods based on object proposals [14, 16, 27], while being orders of magnitude faster.
• We propose the first interactive deep learning framework to facilitate and speed up collecting reproducible and reliable annotations in the field of computational pathology.
• We propose a deep network model using guiding signals and multi-scale blocks for precise segmentation of microscopic objects across a range of scales.
• We propose a method based on the morphological skeleton for extracting guiding signals from gland masks, capable of identifying holes in objects.
• We perform various experiments to show the effectiveness and generalizability of NuClick.
• We release two datasets: lymphocyte dense annotations in IHC images and touching white blood cells (WBCs) in blood sample images.
Object segmentation is an important step in the workflow of computational pathology. Deep learning based models generally require a large amount of labeled data for precise and reliable prediction. However, collecting labeled data is expensive because it often requires expert knowledge, particularly in the medical imaging domain, where labels are the result of a time-consuming analysis made by one or more human experts. As nuclei, cells, and glands are fundamental objects for downstream analysis in computational pathology/cytology, in this paper we propose NuClick, a CNN-based approach to speed up collecting annotations for these objects while requiring minimal interaction from the annotator. We show that for nuclei and cells in histology and cytology images, one click inside each object is enough for NuClick to yield a precise annotation. For multicellular structures such as glands, we propose a novel approach that provides NuClick with a squiggle as a guiding signal, enabling it to segment the glandular boundaries. These supervisory signals are fed to the network as auxiliary inputs along with the RGB channels. With detailed experiments, we show that NuClick is applicable to a wide range of object scales, robust against variations in the user input, adaptable to new domains, and delivers reliable annotations. An instance segmentation model trained on masks generated by NuClick achieved the first rank in the LYON19 challenge. As exemplar outputs of our framework, we are releasing two datasets: 1) a dataset of lymphocyte annotations within IHC images, and 2) a dataset of segmented WBCs in blood smear images.
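The idea of feeding supervisory signals as auxiliary inputs alongside RGB can be sketched concretely: annotator clicks (or squiggle points) are rasterised into guidance maps and concatenated with the image channels. This is a minimal illustrative sketch, not NuClick's actual preprocessing; the channel layout and names are assumptions.

```python
# Hedged sketch: building a multi-channel network input from an RGB image
# plus two click-derived guidance maps (inside the target object, and
# inside neighbouring objects to be excluded).

def click_map(height, width, clicks):
    """Rasterise annotator clicks (row, col) into a binary guidance map."""
    m = [[0.0] * width for _ in range(height)]
    for (y, x) in clicks:
        m[y][x] = 1.0
    return m

def build_input(rgb, inclusion_clicks, exclusion_clicks):
    """rgb: [3][H][W] image channels. Returns a 5-channel input:
    RGB + inclusion guidance map + exclusion guidance map."""
    h, w = len(rgb[0]), len(rgb[0][0])
    return rgb + [click_map(h, w, inclusion_clicks),
                  click_map(h, w, exclusion_clicks)]
```

The network's first convolution then simply accepts five input channels instead of three, so the guidance travels through the same feature hierarchy as the image itself.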
Semantic segmentation is a fundamental task in computer vision, with applications in fields such as robotic sensing, video surveillance, and autonomous driving. A major research topic in urban road semantic segmentation is the proper integration and use of cross-modal information for fusion. Here, we attempt to leverage inherent multimodal information and acquire graded features to develop a novel multilabel-learning network for RGB-thermal urban scene semantic segmentation. Specifically, we propose a graded-feature extraction strategy that splits multilevel features into junior, intermediate, and senior levels. Then, we integrate the RGB and thermal modalities with two distinct fusion modules, namely a shallow feature fusion module for junior features and a deep feature fusion module for senior features. Finally, we use multilabel supervision to optimize the network in terms of semantic, binary, and boundary characteristics. Experimental results confirm that the proposed architecture, the graded-feature multilabel-learning network, outperforms state-of-the-art methods for urban scene semantic segmentation, and it can be generalized to depth data.
A common shortfall of supervised deep learning for medical imaging is the lack of labeled data, which is often expensive and time consuming to collect. This article presents a new semisupervised method for medical image segmentation, where the network is optimized by a weighted combination of a common supervised loss computed only on the labeled inputs and a regularization loss computed on both the labeled and unlabeled data. To utilize the unlabeled data, our method encourages consistent predictions of the network-in-training for the same input under different perturbations. For the semisupervised segmentation tasks, we introduce a transformation-consistent strategy in the self-ensembling model to enhance the regularization effect for pixel-level predictions. To further improve the regularization effect, we extend the transformation to a more generalized form including scaling and optimize the consistency loss with a teacher model, whose weights are an average of the student model weights. We extensively validated the proposed semisupervised method on three typical yet challenging medical image segmentation tasks: 1) skin lesion segmentation from dermoscopy images in the International Skin Imaging Collaboration (ISIC) 2017 data set; 2) optic disk (OD) segmentation from fundus images in the Retinal Fundus Glaucoma Challenge (REFUGE) data set; and 3) liver segmentation from volumetric CT scans in the Liver Tumor Segmentation Challenge (LiTS) data set. Compared with the state of the art, our method shows superior performance on these challenging 2-D/3-D medical images, demonstrating the effectiveness of our semisupervised method for medical image segmentation.
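The two ingredients described above, a teacher whose weights average the student's, and a transformation-consistency penalty, can be sketched in a few lines. This is a simplified illustration under assumed names (the paper's actual transformations, losses, and schedules differ); the transformation here is a plain 90-degree rotation.

```python
# Hedged sketch: (1) exponential-moving-average teacher update, and
# (2) a transformation-consistency loss that compares the student's
# prediction on a transformed input T(x) against the transformed
# teacher prediction T(f(x)).

def ema_update(teacher_w, student_w, alpha=0.99):
    """Teacher weights as an exponential moving average of the student's."""
    return [alpha * t + (1 - alpha) * s for t, s in zip(teacher_w, student_w)]

def rot90(grid):
    """A simple pixel-level transformation T: rotate a 2-D map 90 degrees."""
    return [list(row) for row in zip(*grid[::-1])]

def consistency_loss(student_pred_on_tx, teacher_pred_on_x):
    """Mean squared error between the student's prediction on T(x) and
    T(teacher's prediction on x): zero iff the outputs are
    transformation-consistent."""
    t_pred = rot90(teacher_pred_on_x)
    n = sum(len(row) for row in t_pred)
    return sum((a - b) ** 2
               for row_a, row_b in zip(student_pred_on_tx, t_pred)
               for a, b in zip(row_a, row_b)) / n
```

An equivariant, converged model incurs zero penalty, so minimizing this loss on unlabeled images pushes pixel-level predictions toward transformation consistency without needing labels.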
Understanding the scene in which an autonomous robot operates is critical for its competent functioning. Such scene comprehension necessitates recognizing instances of traffic participants along with general scene semantics, which can be effectively addressed by the panoptic segmentation task. In this paper, we introduce the Efficient Panoptic Segmentation (EfficientPS) architecture, which consists of a shared backbone that efficiently encodes and fuses semantically rich multi-scale features. We incorporate a new semantic head that aggregates fine and contextual features coherently, and a new variant of Mask R-CNN as the instance head. We also propose a novel panoptic fusion module that congruously integrates the output logits from both heads of our EfficientPS architecture to yield the final panoptic segmentation output. Additionally, we introduce the KITTI panoptic segmentation dataset, which contains panoptic annotations for the popular and challenging KITTI benchmark. Extensive evaluations on Cityscapes, KITTI, Mapillary Vistas, and the Indian Driving Dataset demonstrate that our proposed architecture consistently sets the new state of the art on all four benchmarks while being the most efficient and fastest panoptic segmentation architecture to date.
Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen classes by resorting to a small set of support images that contain pixel-level object annotations. Existing methods have demonstrated that the domain agent-based attention mechanism is effective in FSVOS by learning the correlation between support images and query frames. However, the agent frame contains redundant pixel information and background noise, resulting in inferior segmentation performance. Moreover, existing methods tend to ignore inter-frame correlations in query videos. To alleviate the above dilemma, we propose a holistic prototype attention network (HPAN) for advancing FSVOS. Specifically, HPAN introduces a prototype graph attention module (PGAM) and a bidirectional prototype attention module (BPAM), transferring informative knowledge from seen to unseen classes. PGAM generates local prototypes from all foreground features and then utilizes their internal correlations to enhance the representation of the holistic prototypes. BPAM exploits the holistic information from support images and video frames by fusing co-attention and self-attention to achieve support-query semantic consistency and inner-frame temporal consistency. Extensive experiments on YouTube-FSVOS have been provided to demonstrate the effectiveness and superiority of our proposed HPAN method. Our source code and models are available anonymously at https://github.com/NUST-Machine-Intelligence-Laboratory/HPAN.
In this study, a multi-scale approach is used to improve the segmentation of a high spatial resolution (30 cm) color infrared image of a residential area. First, a series of 25 image segmentations are performed in Definiens Professional 5 using different scale parameters. The optimal image segmentation is identified using an unsupervised evaluation method of segmentation quality that takes into account global intra-segment and inter-segment heterogeneity measures (weighted variance and Moran’s I, respectively). Once the optimal segmentation is determined, under-segmented and over-segmented regions in this segmentation are identified using local heterogeneity measures (variance and Local Moran’s I). The under- and over-segmented regions are refined by (1) further segmenting under-segmented regions at finer scales, and (2) merging over-segmented regions with spectrally similar neighbors. This process leads to the creation of several segmentations consisting of segments generated at three different segmentation scales. Comparison of single- and multi-scale segmentations shows that identifying and refining under- and over-segmented regions using local statistics can improve global segmentation results.
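The inter-segment heterogeneity measure named above, global Moran's I, can be computed directly from per-segment mean values and a segment adjacency matrix. The sketch below is a generic implementation of the standard formula, not the study's exact evaluation pipeline; the binary adjacency weighting is an assumption.

```python
# Hedged sketch: global Moran's I over segment mean values.
# I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2
# Positive I: adjacent segments are spectrally similar (over-segmentation
# tendency); negative I: adjacent segments contrast (good separation).

def morans_i(values, weights):
    """values:  per-segment mean spectral values.
    weights: weights[i][j] = 1 if segments i and j are adjacent, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / w_sum) * (num / denom)
```

In a segmentation-quality score of the kind described above, low (negative) Moran's I is desirable: it indicates that neighboring segments are spectrally distinct, complementing the intra-segment weighted variance term.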