Understanding the scene in which an autonomous robot operates is critical for its competent functioning. Such scene comprehension necessitates recognizing instances of traffic participants along with general scene semantics, which can be effectively addressed by the panoptic segmentation task. In this paper, we introduce the Efficient Panoptic Segmentation (EfficientPS) architecture that consists of a shared backbone which efficiently encodes and fuses semantically rich multi-scale features. We incorporate a new semantic head that aggregates fine and contextual features coherently and a new variant of Mask R-CNN as the instance head. We also propose a novel panoptic fusion module that congruously integrates the output logits from both heads of our EfficientPS architecture to yield the final panoptic segmentation output. Additionally, we introduce the KITTI panoptic segmentation dataset, which contains panoptic annotations for the popular and challenging KITTI benchmark. Extensive evaluations on Cityscapes, KITTI, Mapillary Vistas and the Indian Driving Dataset demonstrate that our proposed architecture consistently sets the new state of the art on all four benchmarks while being the fastest and most efficient panoptic segmentation architecture to date.
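The idea of fusing semantic and instance logits into a single panoptic map can be illustrated with a deliberately simplified sketch. This is not the paper's fusion module (EfficientPS weights the two sets of logits adaptively); the greedy merge rule, the instance-id scheme, and the function name below are illustrative assumptions:

```python
import numpy as np

def naive_panoptic_fusion(sem_logits, inst_masks, inst_classes, inst_scores,
                          num_stuff, mask_thresh=0.5):
    """Merge semantic logits (C, H, W) with instance masks (N, H, W).

    Instances are painted in descending score order; remaining pixels keep
    the semantic argmax. Returns an (H, W) panoptic map where stuff pixels
    hold their class id and each instance gets a unique id; thing pixels
    not claimed by any instance are marked -1 (void).
    """
    panoptic = np.argmax(sem_logits, axis=0)                 # semantic prediction
    panoptic = np.where(panoptic < num_stuff, panoptic, -1)  # keep stuff classes only
    occupied = np.zeros(panoptic.shape, dtype=bool)
    next_id = 1000                                           # instance ids start here
    for i in np.argsort(-np.asarray(inst_scores)):
        m = (inst_masks[i] > mask_thresh) & ~occupied        # higher score wins overlaps
        panoptic[m] = next_id + inst_classes[i]
        occupied |= m
        next_id += 1000
    return panoptic
```

A single pass over score-sorted instances resolves overlaps deterministically; the learned fusion in the paper replaces this hard precedence with a softer, logit-level combination.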
Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen classes by resorting to a small set of support images that contain pixel-level object annotations. Existing methods have demonstrated that the domain agent-based attention mechanism is effective in FSVOS by learning the correlation between support images and query frames. However, the agent frame contains redundant pixel information and background noise, resulting in inferior segmentation performance. Moreover, existing methods tend to ignore inter-frame correlations in query videos. To alleviate the above dilemma, we propose a holistic prototype attention network (HPAN) for advancing FSVOS. Specifically, HPAN introduces a prototype graph attention module (PGAM) and a bidirectional prototype attention module (BPAM), transferring informative knowledge from seen to unseen classes. PGAM generates local prototypes from all foreground features and then utilizes their internal correlations to enhance the representation of the holistic prototypes. BPAM exploits the holistic information from support images and video frames by fusing co-attention and self-attention to achieve support-query semantic consistency and inner-frame temporal consistency. Extensive experiments on YouTube-FSVOS demonstrate the effectiveness and superiority of our proposed HPAN method. Our source code and models are available anonymously at https://github.com/NUST-Machine-Intelligence-Laboratory/HPAN.
In this study, a multi-scale approach is used to improve the segmentation of a high spatial resolution (30 cm) color infrared image of a residential area. First, a series of 25 image segmentations are performed in Definiens Professional 5 using different scale parameters. The optimal image segmentation is identified using an unsupervised evaluation method of segmentation quality that takes into account global intra-segment and inter-segment heterogeneity measures (weighted variance and Moran's I, respectively). Once the optimal segmentation is determined, under-segmented and over-segmented regions in this segmentation are identified using local heterogeneity measures (variance and Local Moran's I). The under- and over-segmented regions are refined by (1) further segmenting under-segmented regions at finer scales, and (2) merging over-segmented regions with spectrally similar neighbors. This process leads to the creation of several segmentations consisting of segments generated at three different segmentation scales. Comparison of single- and multi-scale segmentations shows that identifying and refining under- and over-segmented regions using local statistics can improve global segmentation results.
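The two global heterogeneity measures named above are simple enough to sketch directly. The inputs and function names below are illustrative assumptions (the study computes these measures over Definiens segmentations); the formulas are the standard area-weighted variance and global Moran's I over a binary segment-adjacency matrix:

```python
import numpy as np

def weighted_variance(areas, variances):
    """Area-weighted mean of intra-segment spectral variances
    (lower = more internally homogeneous segments)."""
    areas = np.asarray(areas, float)
    return float(np.sum(areas * np.asarray(variances, float)) / np.sum(areas))

def morans_i(values, w):
    """Global Moran's I for per-segment mean values under a binary
    adjacency matrix w (w[i, j] = 1 if segments i and j share a border).
    Values near -1 indicate dissimilar neighbors, i.e. well-separated
    segments; values near +1 indicate over-segmentation."""
    x = np.asarray(values, float)
    z = x - x.mean()
    n = len(x)
    return float(n / np.sum(w) * np.sum(w * np.outer(z, z)) / np.sum(z * z))
```

Combining the two (e.g. after normalizing each to [0, 1]) gives a single unsupervised quality score per scale parameter, which is how the optimal segmentation among the 25 candidates could be selected.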
Glaucoma is a chronic eye disease that leads to irreversible vision loss. The cup to disc ratio (CDR) plays an important role in the screening and diagnosis of glaucoma. Thus, the accurate and automatic segmentation of the optic disc (OD) and optic cup (OC) from fundus images is a fundamental task. Most existing methods segment them separately, and rely on hand-crafted visual features from fundus images. In this paper, we propose a deep learning architecture, named M-Net, which solves the OD and OC segmentation jointly in a one-stage multi-label system. The proposed M-Net mainly consists of a multi-scale input layer, a U-shape convolutional network, a side-output layer, and a multi-label loss function. The multi-scale input layer constructs an image pyramid to achieve multiple receptive field sizes. The U-shape convolutional network is employed as the main body network structure to learn the rich hierarchical representation, while the side-output layer acts as an early classifier that produces a companion local prediction map for different scale layers. Finally, a multi-label loss function is proposed to generate the final segmentation map. To further improve the segmentation performance, we also introduce the polar transformation, which provides the representation of the original image in the polar coordinate system. The experiments show that our M-Net system achieves state-of-the-art OD and OC segmentation results on the ORIGA dataset. Simultaneously, the proposed method also obtains satisfactory glaucoma screening performance with the calculated CDR value on both the ORIGA and SCES datasets.
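The polar transformation mentioned above maps a disc-centered crop into polar coordinates, so the roughly circular OD/OC boundaries become roughly horizontal bands. A minimal nearest-neighbour sketch (the center, radius, output size, and function name are illustrative assumptions; an interpolating resampler would be used in practice):

```python
import numpy as np

def polar_transform(image, center, radius, out_h=64, out_w=128):
    """Resample a (H, W) image into polar coordinates around `center`.
    Rows index radius (0..radius), columns index angle (0..2*pi).
    Nearest-neighbour sampling keeps the sketch dependency-free."""
    cy, cx = center
    r = np.linspace(0, radius, out_h)
    theta = np.linspace(0, 2 * np.pi, out_w, endpoint=False)
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, image.shape[0] - 1)
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, image.shape[1] - 1)
    return image[ys, xs]
```

The inverse mapping (polar back to Cartesian) would be applied to the network's prediction to recover the final segmentation map in image space.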
The deficiency of segmentation labels is one of the main obstacles to semantic segmentation in the wild. To alleviate this issue, we present a novel framework that generates segmentation labels of images given their image-level class labels. In this weakly supervised setting, trained models have been known to segment local discriminative parts rather than the entire object area. Our solution is to propagate such local responses to nearby areas which belong to the same semantic entity. To this end, we propose a Deep Neural Network (DNN) called AffinityNet that predicts semantic affinity between a pair of adjacent image coordinates. The semantic propagation is then realized by random walk with the affinities predicted by AffinityNet. More importantly, the supervision employed to train AffinityNet is given by the initial discriminative part segmentation, which is incomplete as a segmentation annotation but sufficient for learning semantic affinities within small image areas. Thus the entire framework relies only on image-level class labels and does not require any extra data or annotations. On the PASCAL VOC 2012 dataset, a DNN learned with segmentation labels generated by our method outperforms previous models trained with the same level of supervision, and is even as competitive as those relying on stronger supervision.
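The random-walk propagation step can be sketched as repeated multiplication by a row-normalised transition matrix built from the predicted affinities. The sharpening exponent, step count, and function name below are illustrative assumptions in the spirit of the described formulation:

```python
import numpy as np

def propagate_cam(cam, affinity, beta=8, t=3):
    """Spread coarse class activation scores with a random walk.
    cam: (C, N) class scores over N locations; affinity: (N, N)
    pairwise semantic affinities in [0, 1]. The affinity is sharpened
    (affinity ** beta), row-normalised into a transition matrix, and
    applied t times so scores diffuse only across high-affinity pairs."""
    T = affinity ** beta
    T = T / T.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    out = cam.copy()
    for _ in range(t):
        out = out @ T.T                   # one random-walk step per location
    return out
```

The propagated scores are then thresholded per class to produce the pseudo segmentation labels used to train the final segmentation network.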
Recently, deep learning techniques have achieved significant improvements in unsupervised video object segmentation (UVOS). However, many existing approaches cannot accurately separate the foreground objects from the background because they rely on coarse temporal features (e.g., optical flow and multi-frame attention). In this paper, we present a novel model, termed Flow Edge-based Motion-Attentive Network (FEM-Net), to address the unsupervised video object segmentation problem. First, a motion-attentive encoder is used to jointly learn the spatial and temporal features. Then, a Flow Edge Connect (FEC) module is designed to hallucinate edges of ambiguous or missing regions in the optical flow. During the segmentation stage, the complementary temporal feature composed of the motion-attentive feature and the flow edge is fed into a decoder to infer the salient foreground objects. Experimental results on two challenging public benchmarks (i.e., DAVIS-16 and FBMS) demonstrate that the proposed FEM-Net compares favorably against the state-of-the-art methods.
Deep convolutional neural networks have significantly boosted the performance of fundus image segmentation when test datasets have the same distribution as the training datasets. However, in clinical practice, medical images often exhibit variations in appearance for various reasons, e.g., different scanner vendors and image quality. These distribution discrepancies could lead the deep networks to over-fit on the training datasets and lack generalization ability on unseen test datasets. To alleviate this issue, we present a novel Domain-oriented Feature Embedding (DoFE) framework to improve the generalization ability of CNNs on unseen target domains by exploring the knowledge from multiple source domains. Our DoFE framework dynamically enriches the image features with additional domain prior knowledge learned from multi-source domains to make the semantic features more discriminative. Specifically, we introduce a Domain Knowledge Pool to learn and memorize the prior information extracted from multi-source domains. Then the original image features are augmented with domain-oriented aggregated features, which are induced from the knowledge pool based on the similarity between the input image and multi-source domain images. We further design a novel domain code prediction branch to infer this similarity and employ an attention-guided mechanism to dynamically combine the aggregated features with the semantic features. We comprehensively evaluate our DoFE framework on two fundus image segmentation tasks, including optic cup and disc segmentation and vessel segmentation. Our DoFE framework generates satisfactory segmentation results on unseen datasets and surpasses other domain generalization and network regularization methods.
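The similarity-based aggregation from the knowledge pool can be sketched as a soft attention over learned per-domain prototypes. This is a minimal sketch, not the DoFE branch itself (which predicts the domain code with a dedicated network head); the cosine-similarity softmax, shapes, and function name are illustrative assumptions:

```python
import numpy as np

def aggregate_domain_features(feat, knowledge_pool):
    """feat: (D,) image feature; knowledge_pool: (K, D) one learned
    prototype per source domain. A softmax over cosine similarities
    yields a domain code; the aggregated feature is the code-weighted
    sum of the pool, to be fused with `feat` downstream."""
    sims = knowledge_pool @ feat / (
        np.linalg.norm(knowledge_pool, axis=1) * np.linalg.norm(feat) + 1e-8)
    code = np.exp(sims - sims.max())
    code = code / code.sum()                   # domain similarity code, sums to 1
    return code, code @ knowledge_pool         # aggregated domain prior (D,)
```

The returned aggregated feature would then be combined with the original semantic features via the attention-guided mechanism described in the abstract.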
In this Letter, a hierarchical conditional random field (HCRF) model-based gastric histopathology image segmentation (GHIS) method is proposed, which can localise abnormal (cancer) regions in gastric histopathology images to assist histopathologists in medical work. First, to obtain pixel-level segmentation information, the authors retrain a convolutional neural network (CNN) to build up their pixel-level potentials. Then, to obtain abundant spatial segmentation information at the patch level, they fine-tune another three CNNs to build up their patch-level potentials. Third, based on the pixel- and patch-level potentials, their HCRF model is structured. Finally, graph-based post-processing is applied to further improve their segmentation performance. In the experiment, a segmentation accuracy of 78.91% is achieved on a haematoxylin and eosin stained gastric histopathological dataset with 560 images, showing the effectiveness and future potential of the proposed GHIS method.
We present a simple, fully-convolutional model for real-time (>30 fps) instance segmentation that achieves competitive results on MS COCO evaluated on a single Titan Xp, which is significantly faster than any previous state-of-the-art approach. Moreover, we obtain this result after training on only one GPU. We accomplish this by breaking instance segmentation into two parallel subtasks: (1) generating a set of prototype masks and (2) predicting per-instance mask coefficients. Then we produce instance masks by linearly combining the prototypes with the mask coefficients. We find that because this process doesn't depend on repooling, this approach produces very high-quality masks and exhibits temporal stability for free. Furthermore, we analyze the emergent behavior of our prototypes and show they learn to localize instances on their own in a translation-variant manner, despite being fully convolutional. We also propose Fast NMS, a drop-in replacement for standard NMS that is 12 ms faster with only a marginal performance penalty. Finally, by incorporating deformable convolutions into the backbone network, optimizing the prediction head with better anchor scales and aspect ratios, and adding a novel fast mask re-scoring branch, our YOLACT++ model can achieve 34.1 mAP on MS COCO at 33.5 fps, which is fairly close to the state-of-the-art approaches while still running in real time.
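The mask assembly step described above (linearly combining prototypes with per-instance coefficients) is a single matrix multiplication followed by a sigmoid. A minimal sketch; the shapes, threshold, and function name are illustrative assumptions, and the full pipeline additionally crops each mask to its predicted box:

```python
import numpy as np

def assemble_masks(prototypes, coeffs, thresh=0.5):
    """prototypes: (H, W, K) prototype masks from the FCN branch;
    coeffs: (N, K) per-instance coefficients from the prediction head.
    Each instance mask is the sigmoid of a linear combination of the
    prototypes, thresholded into a binary mask."""
    lin = prototypes @ coeffs.T            # (H, W, N) linear combination
    masks = 1.0 / (1.0 + np.exp(-lin))     # sigmoid
    return masks > thresh
```

Because this is one dense matrix product with no per-instance repooling, mask quality is independent of box alignment, which is the source of the "high-quality masks for free" observation in the abstract.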
Why are elite jewelers reluctant to sell turquoise, despite strong demand? Why did leading investment bankers shun junk bonds for years, despite potential profits? Status Signals is the first major sociological examination of how concerns about status affect market competition. Starting from the basic premise that status pervades the ties producers form in the marketplace, Joel Podolny shows how anxieties about status influence whom a producer does (or does not) accept as a partner, the price a producer can charge, the ease with which a producer enters a market, how the producer's inventions are received, and, ultimately, the market segments the producer can (and should) enter. To achieve desired status, firms must offer more than strong past performance and product quality--they must also send out and manage social and cultural signals.
Through detailed analyses of market competition across a broad array of industries--including investment banking, wine, semiconductors, shipping, and venture capital--Podolny demonstrates the pervasive impact of status. Along the way, he shows how corporate strategists, tempted by the profits of a market that would negatively affect their status, consider not only whether to enter the market but also whether they can alter the public's perception of the market. Podolny also examines the different ways in which a firm can have status. Wal-Mart, for example, has low status among the rich as a place to shop, but high status among the rich as a place to invest.
Status Signals provides a systematic understanding of market dynamics that have--until now--not been fully appreciated.