End-to-end text spotting, which aims to integrate detection and recognition in a unified framework, has attracted increasing attention owing to the simplicity of jointly handling the two complementary tasks. It remains an open problem, especially when processing arbitrarily-shaped text instances. Previous methods can be roughly categorized into two groups, character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to their unstructured output. Here, we tackle end-to-end text spotting by presenting Adaptive Bezier Curve Network v2 (ABCNet v2). Our main contributions are four-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text with a parameterized Bezier curve, which, compared with segmentation-based methods, provides not only a structured output but also a controllable representation. 2) We design a novel BezierAlign layer for extracting accurate convolutional features of arbitrarily-shaped text instances, significantly improving recognition precision over previous methods. 3) Different from previous methods, which often suffer from complex post-processing and sensitive hyper-parameters, our ABCNet v2 maintains a simple pipeline with non-maximum suppression (NMS) as the only post-processing step. 4) As text recognition performance depends closely on feature alignment, ABCNet v2 further adopts a simple yet effective coordinate convolution to encode the positions of the convolutional filters, which leads to a considerable improvement with negligible computational overhead. Comprehensive experiments on various bilingual (English and Chinese) benchmark datasets demonstrate that ABCNet v2 achieves state-of-the-art performance while maintaining very high efficiency. Moreover, as there is little work on the quantization of text spotting models, we quantize our models to improve the inference time of the proposed ABCNet v2, which can be valuable for real-time applications. Code and models are available at: https://git.io/AdelaiDet
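To make the Bezier representation concrete, the sketch below fits four cubic control points to a sampled text boundary by least squares over the Bernstein basis, assuming points are parameterized by normalized arc length. Function and variable names are illustrative, not taken from the authors' code.

```python
# Minimal sketch of cubic Bezier fitting for a text boundary, as used in
# the ABCNet family: four control points per boundary curve, solved by
# least squares. Names here are illustrative, not the authors' code.
import numpy as np

def bernstein_matrix(ts: np.ndarray) -> np.ndarray:
    """Cubic Bernstein basis evaluated at parameters ts in [0, 1]."""
    ts = ts[:, None]
    return np.hstack([
        (1 - ts) ** 3,
        3 * ts * (1 - ts) ** 2,
        3 * ts ** 2 * (1 - ts),
        ts ** 3,
    ])

def fit_cubic_bezier(points: np.ndarray) -> np.ndarray:
    """Fit 4 control points (4x2) to an (N, 2) polyline of boundary points."""
    # Parameterize samples by normalized arc length along the polyline.
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    ts = np.concatenate([[0.0], np.cumsum(seg)]) / seg.sum()
    basis = bernstein_matrix(ts)                      # (N, 4)
    ctrl, *_ = np.linalg.lstsq(basis, points, rcond=None)
    return ctrl                                       # (4, 2) control points

# Example: fit a gently curved upper boundary and resample it densely,
# giving the structured, controllable output the abstract refers to.
boundary = np.stack([np.linspace(0, 100, 20),
                     5 * np.sin(np.linspace(0, np.pi, 20))], axis=1)
ctrl = fit_cubic_bezier(boundary)
curve = bernstein_matrix(np.linspace(0, 1, 50)) @ ctrl
```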
• We explore the property that text patterns in both Sobel and Canny edge maps share the same features.
• New invariant symmetric features to classify text and non-text pixels correctly.
• We exploit SIFT features to eliminate non-text components.
• Ellipse growing to extract curved text lines using text representatives.
• New objective heuristics to eliminate false positives.
Text detection in real-world images captured in unconstrained environments is an important yet challenging computer vision problem due to the great variety of appearances, cluttered backgrounds, and character orientations. In this paper, we present a robust system based on the concepts of Mutual Direction Symmetry (MDS), Mutual Magnitude Symmetry (MMS), and Gradient Vector Symmetry (GVS) to identify text pixel candidates regardless of orientation, including curved text (e.g., circles and arc shapes), in natural scene images. The method builds on the fact that text patterns in both the Sobel and Canny edge maps of an input image exhibit similar behavior. For each text pixel candidate, the method explores SIFT features to refine the candidates, resulting in text representatives. Next, an ellipse growing process based on a nearest-neighbor criterion is introduced to extract the text components. The text is verified and restored based on text direction and a spatial study of the pixel distribution of components, filtering out non-text components. The proposed method is evaluated on three benchmark datasets, namely ICDAR2005 and ICDAR2011 for horizontal text, MSRA-TD500 for non-horizontal straight text, and on our own dataset (CUTE80), which consists of 80 images, for curved text, demonstrating its effectiveness and superiority over existing methods.
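As a rough illustration of the shared Sobel/Canny behavior (not the paper's MDS/MMS/GVS implementation; the thresholds and the input filename are assumptions), one can keep only pixels that both operators flag as edges and carry their gradient directions forward for later symmetry checks:

```python
# Illustrative sketch only: text stroke pixels tend to be flagged by both
# the Sobel and Canny operators, so the intersection of the two edge maps
# is a cheap first pass at text pixel candidates.
import cv2
import numpy as np

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input

gx = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_64F, 0, 1, ksize=3)
magnitude = np.hypot(gx, gy)
direction = np.arctan2(gy, gx)                 # gradient direction per pixel

sobel_edges = magnitude > 0.25 * magnitude.max()   # assumed threshold
canny_edges = cv2.Canny(img, 100, 200) > 0         # assumed thresholds

# Text pixel candidates: pixels flagged as edges by both operators.
candidates = sobel_edges & canny_edges
cand_dirs = direction[candidates]   # directions for downstream symmetry tests
```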
Automatic Text Summarization (ATS) is a rapidly growing field that aims to save readers time and effort by automatically generating summaries of large volumes of text. In recent years, significant advancements have been made in this area, accompanied by challenges that have spurred extensive research. The proliferation of textual data has sparked substantial interest in ATS, which this survey examines thoroughly. Researchers have been refining ATS techniques since the 1950s, primarily categorized as extractive, abstractive, or hybrid approaches. In the extractive approach, key sentences are extracted from the source document(s) and combined to form the summary, while the abstractive approach employs an intermediate representation of the input document(s) to generate a summary that may differ from the original text. Hybrid approaches combine elements of both extractive and abstractive methods. Despite the variety of proposed methodologies, generated summaries still differ noticeably from those created by humans. This survey offers an inclusive exploration of ATS, covering its challenges, types, classifications, approaches, applications, methods, implementations, processing and preprocessing techniques, linguistic analysis, datasets, and evaluation measures, catering to the needs of researchers in the field.
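For readers new to the extractive approach, here is a minimal, self-contained sketch (purely illustrative, not any surveyed system) that scores sentences by the average corpus frequency of their words and returns the top-scoring sentences in their original order:

```python
# Toy extractive summarizer: sentences whose words are frequent across the
# whole text are assumed to carry its main topics and are kept verbatim.
import re
from collections import Counter

def extractive_summary(text: str, n_sentences: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        toks = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:n_sentences]
    # Preserve original sentence order in the output summary.
    return " ".join(sentences[i] for i in sorted(ranked))
```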
Scene text recognition has made great progress on regular formats, and recent research has focused on irregular text recognition. In this work, we investigate a new and challenging problem of recognizing a text block instance with an irregular arrangement of characters, which we refer to as irregular text block recognition (ITBR). This problem is prevalent in daily scenarios, especially with the increasing use of rich text designs in signboards, logos, posters, and other media. The primary challenge arises from weakened position clues and a highly complex reading order, which can often be deciphered only through heavy reliance on intrinsic linguistic information. Hence, conventional recognition methods that employ inflexible character grouping rules coupled with positional information, or that are constrained by vocabulary reliance, may struggle with the ITBR problem. To this end, we propose a progressive layout reasoning network (PLRN) to recognize irregular text blocks by decoupling visual, linguistic, and positional information. PLRN comprises a character spotting module that recognizes the character set based solely on visual features with a new TopK-rank decoding mechanism, and a linkage reasoning module that interprets the character relationships within this set using a progressive refinement strategy. The linkages are initially reasoned from linguistic information and then progressively refined by incorporating proximity and tendency information, allowing for explicit decoupling and improved reasoning accuracy. To assess the effectiveness of the proposed method, we construct a new dataset called TextBlock600, which consists of 600 images of irregular text blocks, each with complete sequence annotations. Experimental results demonstrate that PLRN shows promising performance on ITBR, opening up possibilities for further research in this field. Code and datasets will be available at https://github.com/eezyli/PLRN.
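PLRN itself is not reproduced here; as a point of contrast, the hedged sketch below shows the kind of proximity-and-tendency heuristic the paper argues is insufficient on its own: greedily chaining spotted character centers by nearest neighbor, with a penalty for turning away from the current reading direction. All names and weights are illustrative assumptions.

```python
# Baseline-style reading-order heuristic (NOT the PLRN method): order
# character detections by greedy nearest-neighbor chaining, biased toward
# continuing the current reading direction ("tendency").
import numpy as np

def greedy_reading_order(centers: np.ndarray) -> list[int]:
    """Order characters given their (N, 2) center coordinates (x, y)."""
    # Start at the topmost, then leftmost character.
    order = [int(np.lexsort((centers[:, 0], centers[:, 1]))[0])]
    remaining = set(range(len(centers))) - set(order)
    direction = np.array([1.0, 0.0])          # assume left-to-right start
    while remaining:
        cur = centers[order[-1]]
        best, best_cost = None, np.inf
        for j in remaining:
            step = centers[j] - cur
            dist = np.linalg.norm(step)
            # Penalize candidates that turn away from the tendency direction.
            turn = 1.0 - step @ direction / max(dist, 1e-6)
            cost = dist * (1.0 + turn)
            if cost < best_cost:
                best, best_cost = j, cost
        step = centers[best] - cur
        direction = step / max(np.linalg.norm(step), 1e-6)
        order.append(best)
        remaining.remove(best)
    return order
```

On highly irregular layouts this heuristic breaks down quickly, which is exactly the gap the abstract says linguistic reasoning must fill.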
• Proposing a new challenging task called irregular text block recognition (ITBR).
• Designing a progressive layout reasoning network (PLRN) to address the ITBR problem.
• Two key modules for information decoupling: character spotting and linkage reasoning.
• Constructing a new TB600 dataset with 600 text block images for benchmarking.
• PLRN achieves superior performance over previous state-of-the-art methods on TB600.
• We propose a novel method named TextMountain, which is composed of TS, TCBP, and TCD.
• Our experiments verify that using TCBP's gradient can greatly improve the grouping of text lines, while TCD helps TCBP learn better.
• The proposed method efficiently handles long, multi-oriented, and curved scene text.
• The proposed method achieves state-of-the-art or comparable results on both quadrangle- and curved-polygon-labeled datasets.
In this paper, we propose a novel scene text detection method named TextMountain. The key idea of TextMountain is to make full use of border-center information. Different from previous works that treat center-border prediction as a binary classification problem, we predict a text center-border probability (TCBP) and a text center-direction (TCD). The TCBP is like a mountain whose top is the text center and whose foot is the text border. The mountaintop can separate text instances, which cannot easily be achieved using a semantic segmentation map, and its rising direction can plan a road to the top for each pixel on the mountain foot at the grouping stage. The TCD helps TCBP learn better. Our labeling rules do not lead to ambiguity under angle transformations, so the proposed method is robust to multi-oriented text and also handles curved text well. At the inference stage, each pixel at the mountain foot searches for a path to the mountaintop, and this process can be completed efficiently in parallel, making our method efficient compared with others. Experiments on the MLT, ICDAR2015, RCTW-17, and SCUT-CTW1500 datasets demonstrate that the proposed method achieves better or comparable performance in terms of both accuracy and efficiency. It is worth mentioning that our method achieves an F-measure of 76.85% on MLT, outperforming previous methods by a large margin. Code will be made available.
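The parallel "climb to the mountaintop" grouping stage can be sketched as follows. This is a rough illustration, not the released code: it assumes a TCBP map, a TCD map of unit uphill direction vectors stored as (dy, dx), and a binary text mask; the threshold and step count are assumed values.

```python
# Sketch of TextMountain-style grouping: label mountaintop regions, then let
# every text pixel climb along the predicted uphill direction in parallel
# and inherit the instance id of the peak it reaches.
import numpy as np
from scipy import ndimage

def group_pixels(tcbp, tcd, text_mask, top_thresh=0.6, n_steps=32):
    """tcbp: (H, W) probs; tcd: (H, W, 2) unit (dy, dx); text_mask: (H, W)."""
    h, w = tcbp.shape
    peaks, _ = ndimage.label(tcbp > top_thresh)   # connected mountaintops
    ys, xs = np.nonzero(text_mask)
    pos = np.stack([ys, xs], axis=1).astype(np.float64)
    for _ in range(n_steps):                      # all pixels climb in parallel
        iy = np.clip(pos[:, 0].round().astype(int), 0, h - 1)
        ix = np.clip(pos[:, 1].round().astype(int), 0, w - 1)
        at_top = peaks[iy, ix] > 0
        pos[~at_top] += tcd[iy[~at_top], ix[~at_top]]   # one step uphill
    labels = np.zeros((h, w), dtype=np.int32)
    iy = np.clip(pos[:, 0].round().astype(int), 0, h - 1)
    ix = np.clip(pos[:, 1].round().astype(int), 0, w - 1)
    labels[ys, xs] = peaks[iy, ix]                # inherit the peak's id
    return labels
```

Because every pixel climbs independently, the loop vectorizes over all text pixels at once, which is the source of the efficiency claim in the abstract.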
Text data present in images and video contain useful information for automatic annotation, indexing, and structuring of images. Extracting this information involves detection, localization, tracking, extraction, enhancement, and recognition of the text from a given image. However, variations of text due to differences in size, style, orientation, and alignment, as well as low image contrast and complex backgrounds, make automatic text extraction extremely challenging. While comprehensive surveys exist for related problems such as face detection, document analysis, and image and video indexing, the problem of text information extraction has not been well surveyed. A large number of techniques have been proposed to address this problem, and the purpose of this paper is to classify and review these algorithms, discuss benchmark data and performance evaluation, and point out promising directions for future research.
• A novel Variable Global Feature Selection Scheme (VGFSS) is proposed.
• VGFSS selects a variable number of features from each class instead of an equal number.
• Feature selection in VGFSS is based on the distribution of terms across the classes.
• The methods are evaluated using the Macro_F1 and Micro_F1 measures, followed by a Z-test.
• VGFSS outperforms seven competing methods on benchmark datasets.
Feature selection is important for speeding up Automatic Text Document Classification (ATDC). At present, the most common approach to discriminative feature selection is the Global Filter-based Feature Selection Scheme (GFSS). GFSS assigns a score to each feature based on its discriminating power and selects the top-N features from the feature set, where N is an empirically determined number. As a result, the features of a few classes may be discarded either partially or completely. The Improved Global Feature Selection Scheme (IGFSS) addresses this issue by selecting an equal number of representative features from every class. However, it struggles with unbalanced datasets having a large number of classes, where the distribution of features across classes is highly variable. In this case, choosing an equal number of features from each class may exclude important features from classes containing a larger number of features. To overcome this problem, we propose a novel Variable Global Feature Selection Scheme (VGFSS) that selects a variable number of features from each class based on the distribution of terms in the classes, ensuring that a minimum number of terms is selected from each class. Numerical results on benchmark datasets show the effectiveness of the proposed VGFSS over classical information science methods and IGFSS.
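The allocation idea can be sketched as follows. This is a hedged illustration: budgets are made proportional to each class's share of corpus terms with a guaranteed minimum per class, and the per-class scoring here is a simple frequency-concentration ratio; the paper's exact scoring function may differ.

```python
# VGFSS-style sketch: variable per-class feature budgets (proportional to
# each class's term mass, floored at a minimum), then top-scored features
# per class, unioned into the final feature set.
import numpy as np

def vgfss_select(tf, n_total, min_per_class=5):
    """tf: (n_classes, n_features) term-frequency matrix; returns feature ids."""
    n_classes, _ = tf.shape
    class_mass = tf.sum(axis=1)
    budgets = np.maximum(
        (n_total * class_mass / class_mass.sum()).astype(int),
        min_per_class)                       # every class gets a minimum quota
    # Assumed discriminating power: how concentrated a term is in this class.
    scores = tf / np.maximum(tf.sum(axis=0, keepdims=True), 1)
    selected = set()
    for c in range(n_classes):
        top = np.argsort(scores[c])[::-1][:budgets[c]]
        selected.update(top.tolist())
    return sorted(selected)
```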
Objectives
To examine the value of unstructured electronic health record (EHR) data (free‐text notes) in identifying a set of geriatric syndromes.
Design
Retrospective analysis of unstructured EHR notes using a natural language processing (NLP) algorithm.
Setting
Large multispecialty group.
Participants
Older adults (N=18,341; average age 75.9, 58.9% female).
Measurements
We compared the number of geriatric syndrome cases identified using structured claims data, structured EHR data, and unstructured EHR data. We also calculated these rates using a population‐level claims database as a reference and identified comparable epidemiological rates in peer‐reviewed literature as a benchmark.
Results
Using insurance claims data resulted in geriatric syndrome prevalence ranging from 0.03% for lack of social support to 8.3% for walking difficulty. Using structured EHR data resulted in similar prevalence rates, ranging from 0.03% for malnutrition to 7.85% for walking difficulty. Incorporating unstructured EHR notes through the NLP algorithm identified considerably higher rates of geriatric syndromes: absence of fecal control (2.1%, 2.3 times as much as structured claims and EHR data combined), decubitus ulcer (1.4%, 1.7 times as much), dementia (6.7%, 1.5 times as much), falls (23.6%, 3.2 times as much), malnutrition (2.5%, 18.0 times as much), lack of social support (29.8%, 455.9 times as much), urinary retention (4.2%, 3.9 times as much), vision impairment (6.2%, 7.4 times as much), weight loss (19.2%, 2.9 times as much), and walking difficulty (36.34%, 3.4 times as much). The geriatric syndrome rates extracted from structured data were substantially lower than published epidemiological rates, although adding the NLP results considerably narrowed this gap.
Conclusion
Claims and structured EHR data give an incomplete picture of burden related to geriatric syndromes. Geriatric syndromes are likely to be missed if unstructured data are not analyzed. Pragmatic NLP algorithms can assist with identifying individuals at high risk of experiencing geriatric syndromes and improving coordination of care for older adults.
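To make "pragmatic NLP algorithm" concrete, the sketch below shows a rule-based pass over free-text notes: match syndrome trigger phrases and discard mentions preceded by a negation cue. The trigger lexicons and the 40-character negation window are illustrative assumptions, not the study's actual rules.

```python
# Simplified rule-based syndrome detection over clinical free text, with a
# naive pre-mention negation window. Lexicons here are placeholders.
import re

SYNDROME_TERMS = {
    "falls": [r"\bfall(s|ing)?\b", r"\bfell\b"],
    "malnutrition": [r"\bmalnutrition\b", r"\bpoor nutrition\b"],
    "walking difficulty": [r"\bdifficulty walking\b", r"\bgait instability\b"],
}
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b[^.]{0,40}$",
                      re.IGNORECASE)

def detect_syndromes(note: str) -> set[str]:
    found = set()
    for syndrome, patterns in SYNDROME_TERMS.items():
        for pat in patterns:
            for m in re.finditer(pat, note, re.IGNORECASE):
                window = note[max(0, m.start() - 40):m.start()]
                if not NEGATION.search(window):   # skip negated mentions
                    found.add(syndrome)
    return found

print(detect_syndromes("Patient fell twice last month; denies malnutrition."))
# -> {'falls'}
```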
The amount of digital text available for analysis by consumer researchers has risen dramatically. Consumer discussions on the internet, product reviews, and digital archives of news articles and press releases are just a few potential sources of insight into consumer attitudes, interaction, and culture. Drawing on linguistic theory and methods, this article presents an overview of automated text analysis, integrating linguistic theory with constructs commonly used in consumer research, offering guidance for choosing among methods, and advising on sampling and statistical issues unique to text analysis. We argue that although automated text analysis cannot be used to study all phenomena, it is a useful tool for examining patterns in text that neither researchers nor consumers can detect unaided. Text analysis can be used to examine psychological and sociological constructs in consumer-produced digital text by enabling discovery or by providing ecological validity.
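As a concrete instance of one common family of such methods, the sketch below shows LIWC-style dictionary counting: the share of a text's words that fall into each construct's word list. The word lists are illustrative placeholders, not a validated dictionary.

```python
# Dictionary-based word counting, a staple of automated text analysis:
# each construct is operationalized as a word list, and a document is
# scored by the fraction of its tokens matching each list.
import re
from collections import Counter

DICTIONARIES = {
    "positive_affect": {"love", "great", "happy", "excellent"},
    "negative_affect": {"hate", "awful", "disappointed", "broken"},
}

def construct_shares(text: str) -> dict[str, float]:
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(sum(counts.values()), 1)
    return {name: sum(counts[w] for w in words) / total
            for name, words in DICTIONARIES.items()}

print(construct_shares("I love this phone but the charger arrived broken."))
```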