Camouflaged objects blend their texture into the background, and distinguishing them from the background is hard even for human beings. The main objective of this paper is to explore the new and challenging problem of camouflaged object segmentation, namely, segmenting the camouflaged object(s) in a given image. This problem has not been well studied despite a wide range of potential applications, including the preservation of wild animals, the discovery of new species, surveillance systems, and search-and-rescue missions in the event of natural disasters such as earthquakes, floods, or hurricanes. To address this problem, we provide a new image dataset of camouflaged objects for benchmarking purposes. In addition, we propose a general end-to-end network, called the Anabranch Network, that leverages both classification and segmentation tasks. Unlike existing segmentation networks, our proposed network has a second branch for classification that predicts the probability that an image contains camouflaged object(s); this probability is then fused into the main segmentation branch to boost segmentation accuracy. Extensive experiments conducted on the newly built dataset demonstrate the effectiveness of our network using various fully convolutional networks.
• We provide a new image dataset of camouflaged objects (CAMO) to promote new methods for camouflaged object segmentation.
• We propose a novel universal end-to-end network, called the Anabranch Network (ANet), for camouflaged object segmentation.
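The fusion of the classification branch into the segmentation branch can be illustrated with a minimal sketch. The abstract does not say how the two outputs are combined, so the multiplication used here is an assumption, and the function names `fuse_classification_into_segmentation` and `binarize` are hypothetical rather than taken from the authors' code.

```python
from typing import List

Map = List[List[float]]


def fuse_classification_into_segmentation(seg_map: Map, cam_prob: float) -> Map:
    """Scale every per-pixel score by the image-level camouflage probability.

    Assumption: fusion is a plain multiplication. The paper's actual
    fusion operator is not described in the abstract.
    """
    if not 0.0 <= cam_prob <= 1.0:
        raise ValueError("cam_prob must lie in [0, 1]")
    return [[score * cam_prob for score in row] for row in seg_map]


def binarize(seg_map: Map, threshold: float = 0.5) -> List[List[int]]:
    """Turn a score map into a 0/1 mask with a fixed threshold."""
    return [[1 if score >= threshold else 0 for score in row] for row in seg_map]


if __name__ == "__main__":
    scores = [[0.9, 0.2], [0.6, 0.1]]  # toy segmentation scores
    # A confident classifier leaves the mask unchanged.
    print(binarize(fuse_classification_into_segmentation(scores, 1.0)))
    # A doubtful classifier suppresses the whole mask.
    print(binarize(fuse_classification_into_segmentation(scores, 0.5)))
```

With this choice, an image that the classification branch judges unlikely to contain a camouflaged object yields a weaker segmentation map, which is one plausible way the second branch could reduce false positives.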
This paper pushes the envelope on decomposing camouflaged regions in an image into meaningful components, namely, camouflaged instances. To promote the new task of camouflaged instance segmentation of in-the-wild images, we introduce a dataset, dubbed CAMO++, that extends our preliminary CAMO dataset (camouflaged object segmentation) in terms of quantity and diversity. The new dataset substantially increases the number of images with hierarchical pixel-wise ground truths. We also provide a benchmark suite for the task of camouflaged instance segmentation. In particular, we present an extensive evaluation of state-of-the-art instance segmentation methods on our newly constructed CAMO++ dataset in various scenarios. We also present a camouflage fusion learning (CFL) framework for camouflaged instance segmentation to further improve the performance of state-of-the-art methods. The dataset, model, evaluation suite, and benchmark will be made publicly available on our project page.
Camouflaged objects are generally difficult to detect in their natural environment, even for human beings. In this paper, we propose a novel bio-inspired network, named MirrorNet, that leverages both instance segmentation and a bio-inspired attack stream for camouflaged object segmentation. Unlike existing segmentation networks, our proposed network has two segmentation streams: the main stream and the bio-inspired attack stream, which correspond to the original image and its flipped image, respectively. The output of the bio-inspired attack stream is then fused into the main stream's result to produce the final camouflage map and boost segmentation accuracy. Extensive experiments conducted on the public CAMO dataset demonstrate the effectiveness of our proposed network. Our proposed method achieves 89% accuracy, outperforming the state of the art.
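The two-stream idea can be sketched as follows: segment the original image, segment its flipped copy, flip that second map back, and fuse the two. This is only an illustration under assumptions: the flip is taken to be horizontal, the fusion is a per-pixel average, and `segment` stands for any hypothetical model passed in by the caller; none of these details are confirmed by the abstract.

```python
from typing import Callable, List

Image = List[List[float]]


def hflip(image: Image) -> Image:
    """Mirror an image left to right."""
    return [row[::-1] for row in image]


def mirror_fuse(image: Image, segment: Callable[[Image], Image]) -> Image:
    """Fuse the main-stream map with the map from the flipped stream.

    Assumptions: horizontal flip and per-pixel averaging. The real
    network learns its fusion; this only shows the data flow.
    """
    main_map = segment(image)
    # Segment the mirrored image, then flip the result back so that
    # both maps are aligned with the original pixel grid.
    mirrored_map = hflip(segment(hflip(image)))
    return [
        [(a + b) / 2.0 for a, b in zip(main_row, mirror_row)]
        for main_row, mirror_row in zip(main_map, mirrored_map)
    ]


def toy_segmenter(image: Image) -> Image:
    """Hypothetical stand-in model with a deliberate left-to-right bias."""
    width = len(image[0])
    return [[pixel * (col + 1) / width for col, pixel in enumerate(row)] for row in image]


if __name__ == "__main__":
    print(mirror_fuse([[1.0, 1.0]], toy_segmenter))
```

In the toy run, the stand-in model scores the right column higher than the left even though both pixels are identical; averaging with the flipped stream cancels that positional bias.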
Dictionary-guided Scene Text Recognition Nguyen, Nguyen; Nguyen, Thu; Tran, Vinh ...
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
2021-June
Conference Proceeding
Language prior plays an important role in the way humans detect and recognize text in the wild. Current scene text recognition methods do use lexicons to improve recognition performance, but their naive approach of casting the output into a dictionary word based purely on the edit distance has many limitations. In this paper, we present a novel approach to incorporate a dictionary in both the training and inference stages of a scene text recognition system. We use the dictionary to generate a list of possible outcomes and find the one that is most compatible with the visual appearance of the text. The proposed method leads to a robust scene text recognition model, which is better at handling ambiguous cases encountered in the wild, and improves the overall performance of state-of-the-art scene text spotting frameworks. Our work suggests that incorporating a language prior is a promising approach to advance scene text detection and recognition methods. In addition, we contribute VinText, a challenging scene text dataset for Vietnamese, where some characters are visually ambiguous due to accent symbols. This dataset will serve as a challenging benchmark for measuring the applicability and robustness of scene text detection and recognition algorithms. Code and dataset are available at https://github.com/VinAIResearch/dict-guided.
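The inference-time idea described above, generating dictionary candidates and keeping the one most compatible with the visual evidence, can be sketched in a few lines. This is not the code from the linked repository: the per-position character probabilities, the `floor` value, and the function names are all hypothetical, and the compatibility score is a simple sum of log-probabilities chosen for illustration.

```python
import math
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def candidates(raw: str, dictionary: List[str], k: int = 3) -> List[str]:
    """The k dictionary words closest to the raw output (ties broken alphabetically)."""
    return sorted(dictionary, key=lambda w: (edit_distance(raw, w), w))[:k]


def toy_visual_score(word: str, char_probs: List[Dict[str, float]], floor: float = 1e-3) -> float:
    """Hypothetical compatibility score: log-likelihood of the word under
    per-position character probabilities from a recognizer."""
    if len(word) != len(char_probs):
        return float("-inf")
    return sum(math.log(p.get(c, floor)) for c, p in zip(word, char_probs))


def dictionary_guided_decode(raw: str, dictionary: List[str],
                             char_probs: List[Dict[str, float]], k: int = 3) -> str:
    """Pick the candidate that best matches the visual evidence.

    Assumes a non-empty dictionary.
    """
    return max(candidates(raw, dictionary, k), key=lambda w: toy_visual_score(w, char_probs))


if __name__ == "__main__":
    words = ["cat", "cot", "cut", "coat"]
    probs = [{"c": 0.9}, {"e": 0.4, "o": 0.35, "a": 0.05, "u": 0.02}, {"t": 0.9}]
    print(candidates("cet", words)[0])                    # edit distance alone: cat
    print(dictionary_guided_decode("cet", words, probs))  # with visual score: cot
```

In the toy example, three dictionary words are equally close to the raw output "cet" by edit distance, so a purely distance-based rule picks one arbitrarily, while the visual score prefers "cot" because the recognizer assigned "o" a much higher probability than "a" or "u".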
Despite their promise, circulating tumor DNA (ctDNA)-based assays for multi-cancer early detection face challenges in test performance, due mostly to the limited abundance of ctDNA and its inherent variability. To address these challenges, assays published to date have required very high-depth sequencing, which raises the cost of testing. Herein, we developed a multimodal assay called SPOT-MAS (screening for the presence of tumor by methylation and size) to simultaneously profile methylomics, fragmentomics, copy number, and end motifs in a single workflow using targeted and shallow genome-wide sequencing (~0.55×) of cell-free DNA. We applied SPOT-MAS to 738 non-metastatic patients with breast, colorectal, gastric, lung, and liver cancer, and 1550 healthy controls. We then employed machine learning to extract multiple cancer and tissue-specific signatures for detecting and locating cancer. SPOT-MAS successfully detected the five cancer types with a sensitivity of 72.4% at 97.0% specificity. The sensitivities for detecting early-stage cancers were 73.9% and 62.3% for stages I and II, respectively, increasing to 88.3% for non-metastatic stage IIIA. For tumor-of-origin prediction, our assay achieved an accuracy of 0.7. Our study demonstrates comparable performance to other ctDNA-based assays while requiring significantly lower sequencing depth, making it economically feasible for population-wide screening.
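Reporting "sensitivity at a given specificity" usually means fixing a score cutoff on the healthy controls first and then counting how many cancer cases exceed it. The sketch below shows that bookkeeping only. The scores are made up for illustration and are not SPOT-MAS data, and the function names are hypothetical.

```python
import math
from typing import List


def threshold_at_specificity(control_scores: List[float], target: float) -> float:
    """Smallest cutoff such that at least `target` of the controls score
    at or below it. A sample is called positive when score > cutoff."""
    ordered = sorted(control_scores)
    needed = math.ceil(target * len(ordered))  # controls that must test negative
    if needed <= 0:
        raise ValueError("target specificity too low for this rule")
    return ordered[needed - 1]


def specificity(control_scores: List[float], cutoff: float) -> float:
    """Fraction of healthy controls correctly called negative."""
    return sum(s <= cutoff for s in control_scores) / len(control_scores)


def sensitivity(case_scores: List[float], cutoff: float) -> float:
    """Fraction of cancer cases correctly called positive."""
    return sum(s > cutoff for s in case_scores) / len(case_scores)


if __name__ == "__main__":
    # Illustrative classifier scores only.
    controls = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    cases = [0.5, 0.65, 0.9, 0.95]
    cutoff = threshold_at_specificity(controls, 0.75)
    print(cutoff, specificity(controls, cutoff), sensitivity(cases, cutoff))
```

Fixing the cutoff on controls before looking at the cases keeps the false-positive rate at the chosen level, which is the quantity that matters most for population-wide screening.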
Identifying polyps is challenging for automatic analysis of endoscopic images in computer-aided clinical support systems. Models based on convolutional neural networks (CNNs), transformers, and their combinations have been proposed to segment polyps with promising results. However, those approaches are limited either because they model only the local appearance of polyps or because they lack a multi-level feature representation for spatial dependency in the decoding process. This paper proposes a novel network, namely ColonFormer, to address these limitations. ColonFormer is an encoder-decoder architecture capable of modeling long-range semantic information in both the encoder and decoder branches. The encoder is a lightweight transformer-based architecture for modeling global semantic relations at multiple scales. The decoder is a hierarchical network structure designed for learning multi-level features to enrich feature representation. In addition, a refinement module with a new skip connection technique is added to refine the boundary of polyp objects in the global map for accurate segmentation. Extensive experiments have been conducted on five popular benchmark datasets for polyp segmentation, including Kvasir, CVC-Clinic DB, CVC-ColonDB, CVC-T, and ETIS-Larib. Experimental results show that our ColonFormer outperforms other state-of-the-art methods on all benchmark datasets.
The evolution of Internet of Things (IoT) networks has been widely studied owing to its benefits in practical applications. Although this evolution is highly helpful, the growing day-to-day demands of mobile users call for further performance improvements such as efficient spectrum utilization, massive device connectivity, and high data rates. Fortunately, reconfigurable intelligent surfaces (RIS) and non-orthogonal multiple access (NOMA) have recently been introduced as two emerging technologies with great potential to address these issues. In this paper, we propose integrating RIS with existing techniques (i.e., NOMA and relaying) to further enhance performance for mobile users. We focus on a performance analysis of a two-user group using two main performance metrics: outage probability and ergodic capacity. We provide closed-form expressions for both performance metrics to highlight how NOMA-aided RIS systems provide more benefits than the benchmark based on traditional orthogonal multiple access (OMA). Monte-Carlo simulations are performed to validate the correctness of the obtained expressions. The simulations show that the power allocation factors assigned to the two users, rather than the RIS setting, play the major role in forming the performance gap between the two users. In particular, the strong user achieves optimal outage behavior when it is allocated 35% of the transmit power.
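Validating a closed-form outage expression with Monte-Carlo simulation follows a standard pattern, sketched here for a much simpler setting than the paper's. The example assumes a single Rayleigh-fading link with no RIS and no NOMA, for which the outage probability at target rate R and average SNR γ̄ is 1 − exp(−(2^R − 1)/γ̄). The function names and parameter values are hypothetical.

```python
import math
import random


def outage_closed_form(mean_snr: float, rate: float) -> float:
    """Outage probability of one Rayleigh-fading link.

    Outage occurs when log2(1 + SNR) < rate, i.e. SNR < 2**rate - 1,
    and the SNR is exponentially distributed with mean `mean_snr`.
    """
    threshold = 2.0 ** rate - 1.0
    return 1.0 - math.exp(-threshold / mean_snr)


def outage_monte_carlo(mean_snr: float, rate: float, trials: int, seed: int = 0) -> float:
    """Empirical outage frequency over randomly drawn channel gains."""
    rng = random.Random(seed)
    threshold = 2.0 ** rate - 1.0
    outages = 0
    for _ in range(trials):
        gain = rng.expovariate(1.0)  # |h|^2 ~ Exp(1) under Rayleigh fading
        if mean_snr * gain < threshold:
            outages += 1
    return outages / trials


if __name__ == "__main__":
    analytic = outage_closed_form(mean_snr=10.0, rate=1.0)
    simulated = outage_monte_carlo(mean_snr=10.0, rate=1.0, trials=200_000)
    print(f"closed form: {analytic:.4f}, simulated: {simulated:.4f}")
```

Agreement between the two numbers over a range of SNR values is what "validating the correctness of the obtained expressions" amounts to; the RIS-NOMA case replaces the single exponential gain with the paper's own channel model and per-user decoding conditions.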
The Lifelog Search Challenge (LSC) is an international content retrieval competition that evaluates search for personal lifelog data. At the LSC, content-based search is performed over a multi-modal dataset, continuously recorded by a lifelogger over 27 days, consisting of multimedia content, biometric data, human activity data, and information activities data. In this work, we report on the first LSC that took place in Yokohama, Japan in 2018 as a special workshop at ACM International Conference on Multimedia Retrieval 2018 (ICMR 2018). We describe the general idea of this challenge, summarise the participating search systems as well as the evaluation procedure, and analyse the search performance of the teams in various aspects. We try to identify reasons why some systems performed better than others and provide an outlook as well as open issues for upcoming iterations of the challenge.
This study demonstrates how a nanosecond pulsed electric field can be used to tune the localization and formation of conducting carbon black (CB) assemblies into linear structures with various thicknesses inside an insulating polymer matrix. The electrorheological behavior of CB assemblies in a polysiloxane pre-polymer under either a DC or a nanosecond pulsed electric field was observed by optical microscopy. Compared with the typical DC electric field of 1875 V/mm, the nanosecond pulsed electric field, generated between two electrodes separated by a 160 μm gap, allows the field strength to be raised to 7500 V/mm. This type of electric field can overcome the voltage breakdown that occurs within the tested materials. The conducting CB forms linear assemblies that anchor the composite film surfaces inside the matrix, and controlled application of the nanosecond pulsed electric field can develop them into percolation structures more than five times thicker. Furthermore, the formation of vertically upright electrical percolation structures contributed to a remarkable decrease in the electrical resistivity of the resulting composites, by three orders of magnitude compared with composites with a uniform distribution of filler. The electrorheological behavior under the pulsed field was also examined by optical observation. Both the thickness and the concentration of the CB particles could be controlled by increasing the nanosecond pulsed electric field. The novelty of this study lies in the use of a nanosecond pulsed field with a high electric field strength that overcomes electrical breakdown while tuning the carbonaceous filler assemblies. This technique saves energy by fabricating polymer-based conductive materials without surface modification or increased filler content.
Authentic communication is an inevitable trend in the fourth industrial revolution. In a city, administrative formalities should be convenient, quick, accurate, and secure. There are many public authorities with different functions, and a citizen needs to register with every agency and keep track of the information corresponding to each service. Clearly, a single central registration is needed because it spares citizens from repeatedly providing personal data to many places. In this work, we propose a provably secure elliptic curve cryptography-based authentication scheme for a multi-server architecture, where a person registers with a trusted center once and becomes authorized to access all related service providers, which can join and leave at will without affecting current users or other providers. Our proposed scheme is suitable for many practical applications, such as smart cities and the Internet of Things.