Existing hierarchical clustering algorithms involve a flat clustering component and an additional agglomerative or divisive procedure. This paper presents a density-peak-based hierarchical clustering method (DenPEHC), which directly generates clusters on each possible clustering layer, and introduces a grid granulation framework to enable DenPEHC to cluster large-scale and high-dimensional (LSHD) datasets. This study consists of three parts: (1) utilizing the distribution of the parameter γ, which is defined as the product of the local density ρ and the minimal distance δ to data points with higher density in “clustering by fast search and find of density peaks” (DPClust), together with a linear fitting approach, to select clustering centers, with the clustering hierarchy decided by finding the “stairs” in the γ curve; (2) analyzing the leading tree (in which each node except the root is led by its parent to join the same cluster) as an intermediate result of DPClust, and constructing the clustering hierarchy efficiently based on the tree; and (3) designing a framework to enable DenPEHC to cluster LSHD datasets when a large number of attributes can be grouped by their semantics. The proposed method builds the clustering hierarchy by simply disconnecting the center points from their parents, with a linear computational complexity O(m), where m is the number of clusters. Experiments on synthetic and real datasets show that the proposed method has promising efficiency, accuracy and robustness compared to state-of-the-art methods.
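The quantities named in the abstract above (ρ, δ, γ = ρ·δ and the leading tree) can be illustrated with a minimal sketch. It is not the DenPEHC implementation: the Gaussian kernel for ρ, the cutoff `dc`, the function name `density_peaks` and the hand-picked `n_centers` are all assumptions made for illustration, and the "stairs" detection and linear fitting steps are not reproduced.

```python
import math

def density_peaks(points, dc, n_centers):
    """Sketch: rho, delta, gamma = rho * delta, and labels via a leading tree.

    dc (kernel width) and n_centers are illustrative assumptions; the paper
    selects centers from the gamma curve rather than from a fixed count.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density rho: Gaussian kernel of width dc, self term excluded.
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # Visit points from densest to sparsest; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    parent = [-1] * n  # leading-tree parent; -1 marks the root
    for rank in range(1, n):
        i = order[rank]
        # Parent = nearest point among those visited earlier (higher density).
        j = min(order[:rank], key=lambda k: dist[i][k])
        parent[i] = j
        delta[i] = dist[i][j]
    # Usual convention: the densest point gets its largest distance as delta.
    root = order[0]
    delta[root] = max(dist[root])
    gamma = [rho[i] * delta[i] for i in range(n)]
    # Stable sort over `order` keeps the root first among equal gamma values.
    centers = sorted(order, key=lambda i: -gamma[i])[:n_centers]
    # "Disconnect" centers from their parents, then pass labels down the tree.
    label = [-1] * n
    for cluster_id, c in enumerate(centers):
        label[c] = cluster_id
    for i in order:
        if label[i] == -1:
            label[i] = label[parent[i]]
    return rho, delta, gamma, centers, label
```

Because every non-center point inherits the label of its parent, changing the number of centers only changes which tree edges are cut, which is the intuition behind building the hierarchy from the leading tree.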
With the prevalence of smart meters, fine-grained subprofiles reveal more information about the aggregated load and further help improve the forecasting accuracy. Ensembling is an effective approach for load forecasting. It either generates multiple training datasets or applies multiple forecasting models to produce multiple forecasts. In this letter, a novel ensemble method is proposed to forecast the aggregated load with subprofiles, where the multiple forecasts are produced by different groupings of subprofiles. Specifically, the subprofiles are first clustered into different groups and forecasting is conducted on the grouped load profiles individually. These forecasts can then be summed to form the aggregated load forecast. In this way, different aggregated load forecasts can be obtained by varying the number of clusters. Finally, an optimal weighted ensemble approach is employed to combine these forecasts and provide the final forecasting result. Case studies conducted on two open datasets verify the effectiveness and superiority of the proposed method.
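The sum-then-combine idea in the abstract above can be sketched for the simplest case of two groupings. The letter does not spell out its optimization here, so the closed-form least-squares weight below is an assumption standing in for the "optimal weighted ensemble"; the function names and all numbers are hypothetical.

```python
def aggregate(group_forecasts):
    """Sum per-group forecasts (one list per group) into one aggregated forecast."""
    return [sum(step) for step in zip(*group_forecasts)]

def best_weight(actual, f1, f2):
    """Least-squares weight w in [0, 1] for w*f1 + (1 - w)*f2 on validation data.

    Minimising sum((a - f2 - w*(f1 - f2))**2) over w gives the ratio below;
    clipping to [0, 1] yields the constrained optimum of this 1-D quadratic.
    """
    num = sum((a - y2) * (y1 - y2) for a, y1, y2 in zip(actual, f1, f2))
    den = sum((y1 - y2) ** 2 for y1, y2 in zip(f1, f2))
    if den == 0:
        return 0.5  # the two forecasts coincide, so any weight is equivalent
    return min(1.0, max(0.0, num / den))

def combine(w, f1, f2):
    """Weighted ensemble of two aggregated forecasts."""
    return [w * y1 + (1 - w) * y2 for y1, y2 in zip(f1, f2)]

# Made-up example: the same load forecast via 2 groups and via 3 groups.
forecast_k2 = aggregate([[4, 5, 6], [5, 6, 7]])
forecast_k3 = aggregate([[3, 4, 5], [4, 5, 5], [4, 4, 5]])
validation_load = [10, 12, 14]
w = best_weight(validation_load, forecast_k2, forecast_k3)
final = combine(w, forecast_k2, forecast_k3)
```

With more than two groupings the weights would come from a constrained least-squares problem rather than a single ratio.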
With the development of cloud manufacturing, the automation of storage scheduling has become popular in the steel industry. However, the high degree of customization of steel plates makes the storage scheduling too complex to be optimized. To overcome this challenge, we propose to utilize the big data of steel plate orders and warehouse status to optimize the storage scheduling of steel plates. Agglomerative hierarchical clustering is first adopted to reduce the complexity arising from the excessive number of steel plate specifications; then an optimization problem is defined to formulate the storage scheduling of steel plates with safety. A two-stage heuristic (TSH) algorithm is proposed to solve the optimization problem with low complexity. In TSH, steel plates are first assigned to multiple stacks, and then the arrangement of each stack is determined. Experiments are carried out on a cloud manufacturing platform for steel plate production and storage, and the results demonstrate the effectiveness of the proposed approach.
The main question of this article is whether cryptocurrencies, given their decentralization aspects, are a real commodity and/or a virtual currency. To resolve this dilemma, we compare 7 cryptocurrencies with a sample of the three types of monetary systems: 28 fiat currencies, 2 commodities, 2 commodity-based indices, and 3 financial market indices. We use the matrix correlation method. We display dendrograms and observe “hierarchy clustering” as a function of data coarse graining.
In fact, we confirm that the cryptocurrencies are not decentralized. We also observe that most of the currencies in the world are not significantly correlated, or are only weakly correlated, with cryptocurrencies. Our results show that the cryptocurrency market and the Forex market belong to different system communities (or regions).
This paper proposes a new method for clustering linear ordinal ranking (LOR) information using the agglomerative hierarchical clustering (AHC) algorithm. Considering that the cores of the AHC algorithm for LOR clustering are the difference measure among different LORs and the aggregation of the individual LORs, we first systematically analyze the existing studies on LOR distance measures, based on which we extend the method used to depict LORs. Subsequently, a corresponding new distance measure is proposed that utilizes the rankings’ position information and relationship information together. In addition, we simplify the dominating index and dominated index-based aggregation method for LOR fusion. Further, we present a numerical case on online financial product recommendation to illustrate the usage of the algorithm and to provide a feasible approach to online financial product recommendation. We then discuss the proposed distance measure and the aggregation method under the framework of the AHC algorithm to show the features of the algorithm proposed in this paper.
On 28th September 2018, an earthquake of very high magnitude (Mw 7.5) struck the city of Palu on the island of Sulawesi, Indonesia. The main objective of this research is to estimate the earthquake risk based on probability and hazard in the Palu region using cross-correlation among the derived parameters, Silhouette clustering (SC), pure locational clustering (PLC) based on hierarchical clustering analysis (HCA), a convolutional neural network (CNN) and analytical hierarchy process (AHP) techniques. There is no specific or simple way of identifying risks, as the definition of risk varies with time and space. The aims of this study are: i) to conduct the clustering analysis to identify the earthquake-prone areas, ii) to develop a CNN model for probability estimation, and iii) to estimate and compare the risk using two calculation equations (Risk A and Risk B). Owing to its high prediction ability, the CNN model was used to assess the probability, while SC and PLC were implemented to understand the spatial clustering, the Euclidean distance among clusters, the spatial relationship and the cross-correlation among the estimated Mw, PGA and intensity, including event depth. Finally, AHP was implemented for the vulnerability assessment. To this end, earthquake probability assessment (EPA), susceptibility to seismic amplification (SSA) and earthquake vulnerability assessment (EVA) results were employed to generate Risk A, while earthquake hazard assessment (EHA), SSA and EVA were used to generate Risk B. The risk maps were compared and the differences in results were obtained. This research concludes that, for earthquake risk assessment (ERA), the results obtained with Risk B are better than those obtained with Risk A. This study achieved 89.47% accuracy for EPA and a consistency ratio of 0.07 for EVA. These results have important implications for future large-scale risk assessment, land use planning and hazard mitigation.
• Performed clustering analysis to identify earthquake potential zones at Palu, Indonesia
• Implemented four techniques for earthquake risk assessment for the first time
• Developed a CNN model for earthquake probability estimation
• Estimated the risk based on two calculation methods
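The consistency ratio reported for the AHP-based vulnerability assessment can be illustrated with a short sketch of the standard Saaty calculation. The pairwise comparison matrix below is invented for illustration and is not taken from the study; the function name `ahp_weights_and_cr` is likewise hypothetical. A ratio below 0.1 is conventionally regarded as acceptable.

```python
# Saaty's random consistency index for matrix sizes 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights_and_cr(matrix, iterations=100):
    """Return (priority weights, consistency ratio) for a pairwise matrix.

    Weights are the principal eigenvector, approximated by power iteration.
    CI = (lambda_max - n) / (n - 1) and CR = CI / RI.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return w, ci / RANDOM_INDEX[n]

# Hypothetical 3 x 3 comparison of three vulnerability criteria.
example = [[1, 3, 5],
           [1 / 3, 1, 3],
           [1 / 5, 1 / 3, 1]]
weights, cr = ahp_weights_and_cr(example)
```

A perfectly consistent matrix (every entry equal to the ratio of two fixed weights) gives a ratio of zero, so the value measures how far the expert judgements depart from that ideal.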
There has recently been a conscious push for cities in Europe to be smarter and more sustainable, leading to the need to benchmark these cities’ efforts using robust assessment frameworks. This paper ranks 28 European capital cities based on how smart and sustainable they are. Using hierarchical clustering and principal component analysis (PCA), we synthesized 32 indicators into 4 components and computed rank scores. The ranking of European capital cities was based on this rank score. Our results show that Berlin and the Nordic capital cities lead the ranking, while Sofia and Bucharest obtained the lowest rank scores, and are thus not yet on the path to being smart and sustainable. While our city rank scores show little correlation with city size and city population, there is a significant positive correlation with the cities’ GDP per inhabitant, which is an indicator of wealth. Lastly, we detect a geographical divide: 12 of the top 14 cities are Western European; 11 of the bottom 14 cities are Eastern European. These results will help cities understand where they stand vis-à-vis other cities, giving policy makers an opportunity to identify areas for improvement while leveraging areas of strength.
We give a full classification of representation types of the subcategories of representations of an m×n rectangular grid with monomorphisms (dually, epimorphisms) in one or both directions, which appear naturally in the context of clustering as two-parameter persistent homology in degree zero. We show that these subcategories are equivalent to the category of all representations of a smaller grid, modulo a finite number of indecomposables. This equivalence is constructed from a certain cotorsion-torsion triple, which is obtained from a tilting subcategory generated by said indecomposables.
Determining the optimal number of clusters is crucial to clustering quality in cluster analysis. From the standpoint of sample geometry, two concepts, i.e., the sample clustering dispersion degree and the sample clustering synthesis degree, are defined, and a new clustering validity index is designed. Moreover, a method for determining the optimal number of clusters based on an agglomerative hierarchical clustering (AHC) algorithm is proposed. The new index and the method can evaluate the clustering results produced by the AHC and determine the optimal number of clusters for multiple types of datasets, such as linear, manifold, annular, and convex structures. Theoretical research and experimental results indicate the validity and good performance of the proposed index and the method.
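The general workflow described above (run AHC, score each level, pick the number of clusters) can be sketched as follows. The abstract does not define the proposed validity index, so the sketch swaps in a simple stand-in criterion, the largest jump between successive merge distances; the average-linkage choice and both function names are assumptions as well.

```python
import math

def ahc_merge_heights(points):
    """Naive average-linkage AHC; returns the merge distance of each step."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        heights.append(d)
    return heights

def pick_k_by_largest_gap(heights):
    """Stand-in criterion (not the paper's index): cut before the largest jump.

    Requires at least three points so that at least one gap exists.
    """
    n = len(heights) + 1
    gaps = [heights[t + 1] - heights[t] for t in range(len(heights) - 1)]
    t = max(range(len(gaps)), key=lambda s: gaps[s])
    # After merges 0..t have been applied, n - (t + 1) clusters remain.
    return n - (t + 1)
```

A gap criterion of this kind works for compact, well-separated groups; it does not address the manifold or annular structures the abstract mentions, which is where a purpose-built index matters.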
Recent advances in remote sensing technologies have brought accurate, dense, and inexpensive city-scale Light Detection And Ranging (LiDAR) point clouds, which can be utilized to model city objects (e.g., buildings, roads, and automobiles) for creating Digital Twin Cities (DTCs). However, processing such unstructured point clouds is very challenging, epitomized by high cost, movable objects, limited object classes, and high information inadequacy/redundancy. We noticed that many city objects are not randomly shaped; rather, they have invariant cross-sections following the Gestalt design principles, including proximity, connectivity, symmetry, and similarity. In this paper, we present a novel unsupervised method, called Clustering Of Symmetric Cross-sections of Objects (COSCO), to process urban LiDAR point clouds into a hierarchy of objects based on their characteristic cross-sections. First, city objects are segmented as connected patches of proximate 3D points. Then, symmetric cross-sections are detected for symmetric city objects. Finally, the taxonomy and groups of city objects are recognized from a hierarchical clustering analysis of the dissimilarity matrix. Experimental results showed that COSCO detected the correct taxonomy and types of 12 cars from 24,126 LiDAR points in 8.28 s. Based on the cross-sections and taxonomy, a digital twin was created by registering free online 3D car models in 29.58 s. The contribution of this paper is twofold. First, it presents an effective unsupervised method for understanding and developing DTC objects in LiDAR point clouds by harnessing innate Gestalt design principles. Second, COSCO can be an efficient LiDAR pre-processing tool for recognizing symmetric city objects’ cross-sections, positions, heading directions, dimensions, and possible types for smart city applications in GIScience, Architecture, Engineering, Construction and Operation (AECO), and autonomous vehicles.