Multi-Orientation Scene Text Detection with Adaptive Clustering Yin, Xu-Cheng; Pei, Wei-Yi; Zhang, Jun ...
IEEE transactions on pattern analysis and machine intelligence,
2015-09-01, Volume 37, Issue 9
Journal Article
Peer reviewed
Text detection in natural scene images is an important prerequisite for many content-based image analysis tasks, while most current research efforts only focus on horizontal or near horizontal scene text. In this paper, first we present a unified distance metric learning framework for adaptive hierarchical clustering, which can simultaneously learn similarity weights (to adaptively combine different feature similarities) and the clustering threshold (to automatically determine the number of clusters). Then, we propose an effective multi-orientation scene text detection system, which constructs text candidates by grouping characters based on this adaptive clustering. Our text candidate construction method consists of several sequential coarse-to-fine grouping steps: morphology-based grouping via single-link clustering, orientation-based grouping via divisive hierarchical clustering, and projection-based grouping also via divisive clustering. The effectiveness of our proposed system is evaluated on several public scene text databases, e.g., ICDAR Robust Reading Competition data sets (2011 and 2013), MSRA-TD500 and NEOCR. Specifically, on the multi-orientation text data set MSRA-TD500, the f-measure of our system is 71 percent, much better than the state-of-the-art performance. We also construct and release a practical challenging multi-orientation scene text data set (USTB-SV1K), which is available at http://prir.ustb.edu.cn/TexStar/MOMV-text-detection/.
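The interplay of similarity weights and a clustering threshold described above can be illustrated with a minimal single-link sketch. It is not the authors' system: the feature tuples, the weights and the threshold below are hypothetical and set by hand, whereas the paper learns them. The sketch relies on the fact that cutting a single-link hierarchy at a distance threshold gives the connected components of the graph linking all pairs within that threshold.

```python
from itertools import combinations

def combined_distance(a, b, weights):
    """Weighted sum of per-feature absolute differences between two candidates."""
    return sum(w * abs(x - y) for w, x, y in zip(weights, a, b))

def single_link_clusters(items, weights, threshold):
    """Single-link clustering cut at `threshold`, via union-find.

    Two items end up in the same cluster if a chain of pairs, each with
    combined distance <= threshold, connects them.
    """
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if combined_distance(items[i], items[j], weights) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical character candidates: (x-centre, y-centre, height) in pixels.
chars = [(10, 50, 20), (32, 51, 21), (55, 49, 20), (300, 200, 40), (345, 202, 41)]
weights = (1.0, 1.0, 2.0)  # assumption: hand-set here, learned in the paper
clusters = single_link_clusters(chars, weights, threshold=60.0)
print(clusters)
```

Lowering the threshold splits the groups further, which is how the threshold ends up determining the number of clusters.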
• Operational Trend Prediction and Classification for Chemical Processes.
• Convolutional Neural Network Method Based on Symbolic Hierarchical Clustering.
• Application to a Chemical Plant via Industrial Data.
• Favorable Comparison with Traditional Neural Network Methods.
In modern industrial chemical engineering plants, the quality of the product is closely related not only to the process design but also to the efficiency of human operation. Currently, single-step prediction models are adopted by process engineers to estimate the immediate system response. However, those single-step prediction models are limited as they don’t enable the operator to visualize the complete series of effects associated with the operation in the long run. In order to help make prescient predictions, this paper proposes a novel symbolic hierarchical clustering (SHC) based convolutional neural network (CNN) method for trend prediction and classification. Firstly, the raw historical operation data series are symbolized from numerical values to strings according to their distinct characteristics. Secondly, the hierarchical clustering method is used to eliminate the low-frequency operation trends and to determine and label the types of operational trends for the symbolized dataset. Subsequently, the categorized dataset and its respective label are fed into a specially tailored CNN for the training of the CNN model for trend classification. Finally, to demonstrate the effectiveness of the proposed SHC-CNN algorithm, the proposed method is applied to the methanol production process of Hainan Petrochemical Co., Ltd. to predict and classify its main operational trends. In addition, the superiority of SHC-CNN operational trend prediction is demonstrated through the comparison with traditional neural networks.
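The symbolization step can be pictured with a small sketch. The abstract does not say how values are mapped to strings, so the three-letter alphabet (`U`, `D`, `S`), the tolerance `tol` and the sample windows are all assumptions. Exact string counting stands in here for the hierarchical clustering that the paper uses to discard low-frequency trends.

```python
from collections import Counter

def symbolize(series, tol=0.5):
    """Map each step of a numeric series to 'U' (up), 'D' (down) or 'S' (steady)."""
    symbols = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        symbols.append("U" if delta > tol else "D" if delta < -tol else "S")
    return "".join(symbols)

def frequent_trends(strings, min_count=2):
    """Keep only the symbol strings that occur at least `min_count` times."""
    counts = Counter(strings)
    return {s: n for s, n in counts.items() if n >= min_count}

# Hypothetical windows of an operating variable.
windows = [[1, 2, 3, 3], [5, 6, 7, 7], [9, 8, 8, 7]]
symbolized = [symbolize(w) for w in windows]
kept = frequent_trends(symbolized)
print(symbolized, kept)
```

The surviving strings would then serve as class labels for a classifier such as the CNN mentioned above.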
Clustering is a complex data mining tool, useful for identifying similarities in large amounts of data; medical databases are highly suitable in this regard. Our paper aims to compare the efficacy of two well-known clustering methods, the n-means algorithm and the classical hierarchical algorithm, and to apply them in analyzing a medical-economic database on dietary habits, socio-economic status and oral health in a sample of 326 men, aged between 25 and 30, living in an urban area, in order to identify possible associations between dietary habits and income levels. We identified 4 clusters which correspond partially to the 4 income levels recorded in the investigated sample and reveal the associated dietary habits. The n-means clustering performed better than the Single Linkage hierarchical classification, being therefore highly suitable in the analysis of socio-economic and general health data.
Certain theoretical aspects of the stability of Russian banks under risk conditions have been studied. The topic is relevant because, under market uncertainty and risk, approaches that use artificial intelligence to ensure the stability of banks are increasingly being applied. The goal is to identify patterns between the characteristics of Assets and ROA (Return on Assets), and to obtain a forecast value of Sberbank’s net profit. The result of the study was hierarchical clustering, as well as the generated Deep Learning model Random Forest, which calculated the predicted value of Sberbank’s net profit. The novelty lies in the fact that the work puts forward and proves the hypothesis that, using the Random Forest Deep Learning model, a forecast of the net profit of commercial banks can be obtained, which predetermines the stability and dynamics of their development. The study concludes that a Deep Learning model Random Forest was developed to forecast the amount of net profit, which for Sberbank for 2023 amounted to 38,631 billion rubles, which coincided with its actual value. The area of application of the results obtained is commercial banks.
Customer segmentation by web content mining Zhou, Jinfeng; Wei, Jinliang; Xu, Bugao
Journal of retailing and consumer services,
July 2021, Volume 61
Journal Article
Peer reviewed
This article introduces a new dimension, Interpurchase Time (T), into the existing RFM (Recency, Frequency, and Monetary) model to form an expanded RFMT model for parsing consumers' online purchase sequences over a long period to implement customer segmentation. The proposed RFMT model can track and discern changes in customer purchasing behaviors during their whole shopping cycle. Firstly, a web content retrieving system was developed to fetch publicly available customer data on a retailer's website, including demographic information (gender, age, location, etc.) and product information (name, price, date, etc.) of each purchase in a period from 2008 to 2019. The RFMT values of a customer were then computed from the retrieved data and analyzed by hierarchical clustering to derive seven homogeneous clusters with specific customer profiles. Subsequently, demographic features and product preferences were identified for each cluster, with business insights that can help the retailer to improve customer relationships and to implement targeted recommendation strategies.
• Expanding the traditional RFM model to the RFMT model by introducing a new dimension, interpurchase time, to include information on shopping regularity in a long-term period.
• Deriving seven characteristic customer clusters from a large dataset retrieved on a global retailer's website based on the RFMT model.
• Profiling customer purchasing behaviors by the cluster analysis.
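To make the four RFMT dimensions concrete, the following sketch computes them for one customer. The exact definitions used in the article are not given in the abstract, so the conventions here are assumptions: R as days since the last purchase, M as total spend, and T as the mean gap in days between consecutive purchases. The purchase history is invented.

```python
from datetime import date

def rfmt(purchases, today):
    """Compute (R, F, M, T) from a list of (date, amount) purchases.

    R: days since the most recent purchase
    F: number of purchases
    M: total amount spent
    T: mean days between consecutive purchases (0.0 with a single purchase)
    """
    dates = sorted(d for d, _ in purchases)
    recency = (today - dates[-1]).days
    frequency = len(dates)
    monetary = sum(amount for _, amount in purchases)
    span = (dates[-1] - dates[0]).days
    interpurchase = span / (frequency - 1) if frequency > 1 else 0.0
    return recency, frequency, monetary, interpurchase

# Hypothetical purchase history of one customer.
history = [
    (date(2019, 1, 1), 20.0),
    (date(2019, 1, 11), 35.0),
    (date(2019, 1, 31), 15.0),
]
values = rfmt(history, today=date(2019, 2, 10))
print(values)
```

One such four-value vector per customer, usually rescaled so that no dimension dominates, would be the input to the hierarchical clustering step.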
The high number of migrants in the city of Yogyakarta has increased opportunities for Micro, Small and Medium Enterprises (MSMEs) in the culinary and handicraft sectors. The large amount of data collected by the Cooperative Office, which reaches thousands of records, made it difficult for the Office to determine what training MSMEs needed and to choose which MSMEs would receive the training it holds. Grouping can be used as a strategy for selecting MSMEs and determining training according to their individual needs. The purpose of this study was to group MSMEs using the Agglomerative Hierarchical Clustering (AHC) Single Linkage method and to apply the result to provide recommendations for MSME groups to the Yogyakarta Cooperative and MSME Office. The recommended numbers of groups can be used in the implementation, design, and evaluation of the development and empowerment of MSMEs in the City of Yogyakarta. The stages in this research are data loading, data cleaning, data selection, data transformation, clustering with AHC single linkage, silhouette coefficient evaluation, and knowledge representation. This research resulted in 2 group recommendations from a total of 1336 Culinary MSME records and 3 group recommendations from a total of 145 Handicraft MSME records. The silhouette score for the Culinary Sector falls in the strong structure category with a value of 0.79, and that for the Crafts Sector falls in the medium structure category with a value of 0.615.
From these groups, recommendations were obtained for improving services to MSMEs, especially those with a turnover of less than 10 million, with marketing limited to the Yogyakarta area, and without financial assistance from the government.
The high number of immigrants in the city of Yogyakarta has resulted in increased opportunities for Micro, Small and Medium Enterprises (MSMEs) in the Culinary and Crafts sectors. The large number of MSMEs creates increasingly strong competition. In addition, the large amount of data collected by the Department of Cooperatives and MSMEs, which reaches thousands of records, makes it difficult for the Department to improve and empower these MSMEs. Grouping is one method that can be used as a strategy in mapping MSMEs, especially in efforts to improve and empower MSMEs through training conducted by the Department. The aim of this research is to group MSMEs using the Agglomerative Hierarchical Clustering (AHC) method in an effort to achieve strategies for improving and empowering MSMEs. The focus of this research is MSMEs in the craft sector and MSMEs in the culinary sector [a1]. The results of this research provide 2 group recommendations from a total of 1336 Culinary MSME records and 3 group recommendations from a total of 145 Craft MSME records. The silhouette score for the Culinary Sector is in the strong structure category with a value of 0.79, and that for the Crafts Sector is in the medium structure category with a value of 0.615. From the groups in the two sectors, strategies were obtained to improve and empower MSMEs, especially those with a turnover of less than 10 million, with marketing limited to the Yogyakarta area, and without capital assistance from the government.
[a1] The result of the revision of the abstract.
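The silhouette coefficient used above to rate cluster structure can be computed directly from its definition. The sketch below is a plain-Python illustration on invented one-dimensional data, not the study's pipeline. Singleton clusters are given a score of 0, which is a common convention and an assumption here.

```python
def silhouette_score(points, labels, dist):
    """Mean silhouette coefficient over all points.

    For each point: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster contributes 0
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:  # guard against duplicate points with a == b == 0
            total += (b - a) / denom
    return total / len(points)

def abs_diff(x, y):
    return abs(x - y)

# Hypothetical data: two tight, well-separated groups.
data = [0.0, 1.0, 10.0, 11.0]
good = silhouette_score(data, [0, 0, 1, 1], abs_diff)
bad = silhouette_score(data, [0, 1, 0, 1], abs_diff)
print(round(good, 3), round(bad, 3))
```

Values near 1 indicate compact, well-separated groups, while values near or below 0 indicate that points sit closer to another cluster than to their own.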
• A hierarchical clustering algorithm enhanced by multivariate probabilistic distance is proposed for damage detection.
• Multivariate probabilistic distance between different TF vectors is analytically derived by Laplace approximation.
• A function vectorization scheme is developed to facilitate data fusion and improve computational efficiency.
• The new hierarchical clustering can accommodate the uncertainty and correlation of multiple TFs.
• Case studies validate the advantages of the method over hierarchical clustering with deterministic distance.
This paper proposes a new damage detection method by integrating the advantage of transmissibility function (TF) as a health index sensitive to damage but robust to excitation and agglomerative hierarchical clustering (AHC) with intuitive explanation and visualization but avoiding specifying the number of clusters. Different from conventional AHC-based damage detection methods utilizing deterministic distance as a similarity metric and ignoring the distribution of structural features, a multivariate probabilistic distance-based similarity metric is proposed in this study to account for the uncertainty and correlation of multiple TFs following multivariate complex-valued Gaussian ratio distribution. To realize this, an analytically tractable approximation of the multivariate probabilistic distance is derived by Laplace’s asymptotic expansion to avoid high-dimensional numerical integration. To accelerate the computation of probabilistic distances over a wide frequency band that are fused to formulate the similarity metric in AHC, a function vectorization scheme is proposed to avoid the time-consuming loop operation among different frequency points. A threshold is established via bootstrapped Monte Carlo simulation to cut the dendrogram produced by AHC. Two case studies are used to validate the performance of the proposed method, indicating that, compared to the damage detection methods based on the deterministic distance of the TF, the proposed method exhibits better performance due to improving the similarity metric based on multivariate probabilistic distance properly accommodating the correlation of different TFs.
• Proposing a co-planning model for power generation, grid and energy storage.
• Optimizing battery expansion for renewable integration and avoiding grid expansion.
• Modeling the upward and downward Ramp Reserve to maximize renewable integration.
• Chronological capturing of load demand and renewable power generation uncertainty.
• Developing an accelerated Benders Dual Decomposition method to solve the model.
In this paper, an integrated multi-period model for long-term expansion planning of the electric energy transmission grid, power generation technologies, and energy storage devices is introduced. The proposed method gives the type, size and location of generation, transmission and storage devices to supply the electric load demand over the planning horizon. The siting and sizing of Battery Energy Storage (BES) devices as flexible options is addressed to cover the intermittency of Renewable Energy Sources (RESs), mitigate line congestion, and postpone the need for new transmission lines and power plant installation. For efficient handling of RES uncertainties and operational flexibility, the upward and downward Flexible Ramp Spinning Reserve (FRSR) are modeled. In addition, the Low-Carbon Policy (LCP) is considered in the objective function of the proposed Transmission, Generation, and Storage Expansion Planning (TGSEP) model. A hierarchical clustering method that can preserve the chronology of input time series throughout the planning horizon periods is developed to capture the short-term uncertainties of load demand and RESs. The short-term operational flexibility requirements make joint long-term transmission and generation planning a computationally demanding problem. Therefore, the Mixed-Integer Linear Programming (MILP) formulation of the model is solved using an accelerated Benders Dual Decomposition (BDD) method. The IEEE RTS test system is utilized to validate the effectiveness of the proposed joint expansion planning model.
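One simple way to keep chronology during hierarchical clustering is to allow merges only between segments that are adjacent in time. The sketch below illustrates that idea and is not the method developed in the paper. The merge criterion (closest segment means) and the load values are assumptions.

```python
def chronological_clusters(series, n_clusters):
    """Agglomerative clustering that only merges time-adjacent segments.

    Starts with one segment per time step and repeatedly merges the pair of
    neighbouring segments whose means are closest, so every cluster stays a
    contiguous block and the original ordering is preserved.
    """
    if not 1 <= n_clusters <= len(series):
        raise ValueError("n_clusters must be between 1 and len(series)")
    segments = [[v] for v in series]
    while len(segments) > n_clusters:
        means = [sum(s) / len(s) for s in segments]
        # Index of the adjacent pair with the smallest difference of means.
        k = min(range(len(segments) - 1), key=lambda i: abs(means[i] - means[i + 1]))
        segments[k:k + 2] = [segments[k] + segments[k + 1]]
    return segments

# Hypothetical hourly load values.
load = [50, 52, 51, 80, 82, 81, 60, 61]
blocks = chronological_clusters(load, 3)
print(blocks)
```

Because each block is contiguous, a representative value and a duration per block can replace the full series while keeping the order of events, which matters for ramping and storage constraints.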
Hierarchical Clustering Cohen-addad, Vincent; Kanade, Varun; Mallmann-trenn, Frederik ...
Journal of the ACM,
08/2019, Volume 66, Issue 4
Journal Article
Peer reviewed
Open access
Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on providing algorithms, rather than optimizing a specific objective, Dasgupta framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a “good” hierarchical clustering is one that minimizes a particular cost function [23]. He showed that this cost function has certain desirable properties: To achieve optimal cost, disconnected components (namely, dissimilar elements) must be separated at higher levels of the hierarchy, and when the similarity between data elements is identical, all clusterings achieve the same cost.
We take an axiomatic approach to defining “good” objective functions for both similarity- and dissimilarity-based hierarchical clustering. We characterize a set of admissible objective functions having the property that when the input admits a “natural” ground-truth hierarchical clustering, the ground-truth clustering has an optimal value. We show that this set includes the objective function introduced by Dasgupta.
Equipped with a suitable objective function, we analyze the performance of practical algorithms, as well as develop better and faster algorithms for hierarchical clustering. We also initiate a beyond worst-case analysis of the complexity of the problem and design algorithms for this scenario.
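Dasgupta's cost function charges each pair of elements its similarity times the number of leaves under the pair's lowest common ancestor, so similar pairs are cheap only when they are split low in the tree. The sketch below evaluates that cost for small trees. The nested-tuple tree encoding and the similarity values are illustrative assumptions.

```python
def leaves(tree):
    """Return the list of leaf labels under a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def dasgupta_cost(tree, sim):
    """Sum over pairs of sim(i, j) * (number of leaves under their LCA).

    `tree` is a leaf label or a tuple of subtrees; `sim` maps frozenset({i, j})
    to a non-negative similarity (missing pairs count as 0).
    """
    if not isinstance(tree, tuple):
        return 0.0
    child_leaves = [leaves(child) for child in tree]
    size = sum(len(c) for c in child_leaves)
    cost = sum(dasgupta_cost(child, sim) for child in tree)
    # Pairs separated at this node have this node as lowest common ancestor.
    for a in range(len(child_leaves)):
        for b in range(a + 1, len(child_leaves)):
            for i in child_leaves[a]:
                for j in child_leaves[b]:
                    cost += sim.get(frozenset((i, j)), 0.0) * size
    return cost

# Hypothetical similarities: {a, b} and {c, d} are the natural groups.
sim = {frozenset(p): 0.1 for p in [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")]}
sim[frozenset(("a", "b"))] = 1.0
sim[frozenset(("c", "d"))] = 1.0

natural = dasgupta_cost((("a", "b"), ("c", "d")), sim)
mixed = dasgupta_cost((("a", "c"), ("b", "d")), sim)
print(natural, mixed)
```

In this toy input the tree that keeps the similar pairs together has the lower cost. With all similarities equal, different binary trees on the same leaves give the same cost, which matches the property stated in the abstract.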
We discuss some methods to quantitatively investigate the properties of correlation matrices. Correlation matrices play an important role in portfolio optimization and in several other quantitative descriptions of asset price dynamics in financial markets. Here, we discuss how to define and obtain hierarchical trees, correlation based trees and networks from a correlation matrix. The hierarchical clustering and other procedures performed on the correlation matrix to detect statistically reliable aspects of it are seen as filtering procedures of the correlation matrix. We also discuss a method to associate a hierarchically nested factor model to a hierarchical tree obtained from a correlation matrix. The information retained in filtering procedures and its stability with respect to statistical fluctuations is quantified by using the Kullback–Leibler distance.
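A correlation-based tree can be obtained by turning correlations into distances and extracting a minimum spanning tree. The sketch below assumes the widely used mapping d = sqrt(2(1 - rho)), which the abstract does not specify, and uses Kruskal's algorithm on an invented four-asset correlation matrix.

```python
import math

def correlation_distance(rho):
    """Map a correlation coefficient in [-1, 1] to sqrt(2 * (1 - rho))."""
    return math.sqrt(2.0 * (1.0 - rho))

def minimum_spanning_tree(names, corr):
    """Kruskal's algorithm on the complete graph weighted by correlation distance.

    Returns the n - 1 tree edges as (distance, name_a, name_b), shortest first.
    """
    n = len(names)
    edges = sorted(
        (correlation_distance(corr[i][j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    tree = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            tree.append((d, names[i], names[j]))
    return tree

# Hypothetical correlation matrix for four assets.
names = ["A", "B", "C", "D"]
corr = [
    [1.00, 0.90, 0.20, 0.10],
    [0.90, 1.00, 0.30, 0.15],
    [0.20, 0.30, 1.00, 0.80],
    [0.10, 0.15, 0.80, 1.00],
]
mst = minimum_spanning_tree(names, corr)
for d, a, b in mst:
    print(f"{a}-{b}: {d:.3f}")
```

The order in which Kruskal's algorithm accepts edges is also the order of merges in single-linkage clustering, so the same pass yields both the tree and the associated hierarchy.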