The extent to which the geographic diversity of the U.S. plays a significant role in melanoma incidence and mortality over time has not been precisely characterized. We obtained age-adjusted melanoma data for the 50 states for the years 2001–2019 from the SEER registry and performed hierarchical clustering (complete linkage, Euclidean distance) to uncover geotemporal trend groups over two decades. While there was a global increase in incidence during this time (b1=+0.41, p<0.0001), there were 6 distinct clusters (by absolute and Z-score) with significantly different temporal trends (ANCOVA p<0.0001). Cluster 2 (C2) states had the sharpest increase in incidence (b1=+0.66, p<0.0001). For mortality, the global rate decreased (b1=-0.03, p=0.0003), with 3 and 6 clusters by absolute and Z-scores, respectively (ANCOVA p<0.05). Cluster 1 (C1) states exhibited the smallest decline in mortality (b1=-0.017, p=0.008). Mortality-to-incidence ratios (MIRs) declined (b1=-0.0037, p<0.0001) and harbored 4 and 6 clusters by absolute and Z-score analysis, respectively (ANCOVA p<0.0001). Cluster 4 (C4) states had the lowest rate of MIR decline (b1=-0.003, p<0.0001). These results provide an unprecedented, higher-dimensional view of melanoma behavior over space and time. With more refined analyses, geospatial studies can uncover local trends that can inform public health agencies to allocate resources more appropriately.
Data clustering is an important tool in data mining that helps retrieve useful information from large amounts of available data. In this digital era, data are available in abundance, but finding useful data has become a challenging task. Data clustering is an effective and common approach to this problem, grouping data by pattern or inherent similarity. Clustering is an unsupervised learning method for linearly and nonlinearly separable clusters, widely used in applications of different natures [1]. Data clustering finds application in pattern classification in areas such as artificial intelligence, summarization, learning, segmentation, speech recognition, pattern recognition, image segmentation, biology, marketing, data mining, and modelling and system identification [5,24,25]. No single clustering technique can be said to be the best, because different clustering algorithms coexist and are application-specific. This paper mainly provides a critical review of clustering algorithms used in control systems, along with a brief overview of all major algorithms.
STUDY DESIGN. Retrospective review of prospectively collected, multicenter adult spinal deformity (ASD) databases.
OBJECTIVE. To apply artificial intelligence (AI)-based hierarchical clustering as a step toward a classification scheme that optimizes overall quality, value, and safety for ASD surgery.
SUMMARY OF BACKGROUND DATA. Prior ASD classifications have focused on radiographic parameters associated with patient-reported outcomes. Recent work suggests there are many other impactful preoperative data points. However, the ability to segregate patient patterns manually based on hundreds of data points is beyond practical application for surgeons. Unsupervised machine-based clustering of patient types alongside surgical options may simplify analysis of ASD patient types, procedures, and outcomes.
METHODS. Two prospective cohorts were queried for surgical ASD patients with baseline, 1-year, and 2-year SRS-22/Oswestry Disability Index/SF-36v2 data. Two dendrograms were fitted, one with surgical features and one with patient characteristics. Both were built with Ward distances and optimized with the gap method. For each possible n patient cluster by m surgery, normalized 2-year improvement and major complication rates were computed.
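The dendrogram fitting described in METHODS can be sketched as follows. This is a hedged illustration on synthetic stand-in features, not the ASD cohort, and it uses a simplified form of the gap statistic to choose the number of clusters.

```python
# Sketch: Ward-linkage hierarchical clustering with the cut (number of clusters)
# scored by a simplified gap statistic. Data are synthetic stand-ins.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

def within_dispersion(X, labels):
    # W_k: sum over clusters of mean pairwise within-cluster distance.
    total = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        if len(pts) > 1:
            total += cdist(pts, pts).sum() / (2 * len(pts))
    return total

def gap_statistic(X, k, n_ref=10, seed=0):
    rng = np.random.default_rng(seed)
    labels = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")
    log_wk = np.log(within_dispersion(X, labels))
    # Reference dispersions from uniform data over the bounding box of X.
    ref = []
    for _ in range(n_ref):
        U = rng.uniform(X.min(axis=0), X.max(axis=0), size=X.shape)
        ref_labels = fcluster(linkage(U, method="ward"), t=k, criterion="maxclust")
        ref.append(np.log(within_dispersion(U, ref_labels)))
    return np.mean(ref) - log_wk

# Three well-separated synthetic "patient types" should score well near k = 3.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (50, 4)) for m in (0, 3, 6)])
gaps = {k: gap_statistic(X, k) for k in range(1, 6)}
print({k: round(v, 2) for k, v in gaps.items()})
```

With three well-separated groups, the gap at k = 3 clearly exceeds the gap at k = 1, mirroring how the paper's three patient types would emerge from the dendrogram.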
RESULTS. Five hundred seventy patients were included. Three optimal patient types were identified: young with coronal-plane deformity (YC, n = 195), older with prior spine surgeries (ORev, n = 157), and older without prior spine surgeries (OPrim, n = 218). Osteotomy type, instrumentation, and interbody fusion were combined to define four surgical clusters. The intersection of patient-based and surgery-based clusters yielded 12 subgroups, with major complication rates ranging from 0% to 51.8% and 2-year normalized improvement ranging from −0.1% for SF-36v2 MCS in cluster 1,3 to 100.2% for SRS self-image score in cluster 2,1.
CONCLUSION. Unsupervised hierarchical clustering can identify data patterns that may augment preoperative decision-making through construction of a 2-year risk–benefit grid. In addition to creating a novel AI-based ASD classification, pattern identification may facilitate treatment optimization by educating surgeons on which treatment patterns yield optimal improvement with lowest risk. Level of Evidence: 4.
This article describes the implementation and use of the R package dbscan, which provides complete and fast implementations of the popular density-based clustering algorithm DBSCAN and the augmented ordering algorithm OPTICS. Package dbscan uses advanced open-source spatial indexing data structures implemented in C++ to speed up computation. An important advantage of this implementation is that it is up to date with several improvements that have been added since the original algorithms were published (e.g., artifact corrections and dendrogram extraction methods for OPTICS). We provide a consistent presentation of the DBSCAN and OPTICS algorithms and compare dbscan's implementation with other popular libraries such as the R package fpc, ELKI, WEKA, PyClustering, SciKit-Learn, and SPMF, both in terms of available features and in an experimental comparison.
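For readers unfamiliar with the algorithm itself, a minimal DBSCAN fits in a few dozen lines. This Python sketch is purely illustrative: it omits the C++ spatial indexing that makes the dbscan package fast, and all data and parameters here are assumptions.

```python
# A minimal, illustrative DBSCAN in pure NumPy (brute-force neighbourhoods;
# real implementations use k-d trees for speed).
import numpy as np

def dbscan(X, eps, min_pts):
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    labels = np.full(n, -1)          # -1 marks noise / unvisited
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                  # already assigned, or not a core point
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:                  # expand the cluster from core points
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:
                    queue.extend(neighbors[j])
    return labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (30, 2)),
               rng.normal(5, 0.2, (30, 2)),
               [[2.5, 2.5]]])        # one isolated noise point
labels = dbscan(X, eps=0.6, min_pts=5)
print(sorted(set(labels.tolist())))
```

The two dense blobs become clusters 1 and 2, and the isolated point stays labelled -1 (noise), which is the defining behaviour of density-based clustering.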
Hierarchical clustering plays a crucial role in real-world knowledge discovery and data mining applications. This powerful technique provides tree-shaped results that are typically considered data summaries. However, achieving well-organized outputs requires a challenging trade-off between computational complexity (both in time and space) and clustering accuracy, especially in big data scenarios. To address this challenge, we propose a novel agglomerative algorithm for hierarchical clustering. Our algorithm constructs tree-shaped subclusters using a nearest-neighbour chain search. Next, the proxy (root) for each subcluster is identified using a local density peak detection mechanism, which guides the subsequent aggregation. Additionally, we propose a non-parametric variant to facilitate the easy implementation of the algorithm in real-world applications. Comprehensive experimental studies on fourteen real-world and synthetic datasets demonstrate that our algorithm surpasses other benchmarks in terms of clustering accuracy, response time, and memory footprint in most cases. Notably, our proposed algorithm can handle up to two million data points on a personal computer, further verifying its cost-effectiveness.
• A novel agglomerative clustering algorithm based on local density peaks is proposed.
• A non-parametric variant based on multi-scope cutoff distances is proposed.
• A probabilistic analysis is done to establish the theoretical correctness.
• Extensive experiments on real-world datasets verify the advantage of our approach.
We investigate the application of the Ordered Weighted Averaging (OWA) data fusion operator in agglomerative hierarchical clustering. The examined setting generalises the well-known single, complete, and average linkage schemes. It allows expert knowledge to be embodied in the cluster-merge process and provides a much wider range of possible linkages. We analyse various families of weighting functions on numerous benchmark data sets in order to assess their influence on the resulting cluster structure. Moreover, we inspect the correction for the inequality of cluster size distribution, similar to the one in the Genie algorithm. Our results demonstrate that by robustifying the procedure with the Genie correction, we can obtain a significant performance boost in terms of clustering quality. This is particularly beneficial in the case of the linkages based on the closest distances between clusters, including the single linkage and its “smoothed” counterparts. To explain this behaviour, we propose a new linkage process called three-stage OWA, which yields further improvements. This way we confirm the intuition that hierarchical cluster analysis should take into account a few nearest neighbours of each point, rather than trying to adapt to their non-local neighbourhood.
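The OWA linkage setting can be shown with a small sketch. The weight vectors below are our own examples, illustrating how single, complete, and average linkage arise as special cases of an ordered weighted average over the pairwise inter-cluster distances.

```python
# OWA-based inter-cluster distance: sort all pairwise distances (descending),
# then take a weighted average. Different weight vectors recover classic linkages.
import numpy as np

def owa(values, weights):
    values = np.sort(values)[::-1]            # OWA sorts descending before weighting
    weights = np.asarray(weights, float)
    return float(values @ (weights / weights.sum()))

def owa_linkage(A, B, weight_fn):
    d = np.linalg.norm(A[:, None] - B[None, :], axis=-1).ravel()
    return owa(d, weight_fn(d.size))

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[4.0, 0.0], [6.0, 0.0]])

# Classic linkages as OWA weight vectors over the sorted (descending) distances:
single   = owa_linkage(A, B, lambda n: np.eye(n)[n - 1])   # all weight on smallest
complete = owa_linkage(A, B, lambda n: np.eye(n)[0])       # all weight on largest
average  = owa_linkage(A, B, lambda n: np.ones(n))         # uniform weights
print(single, complete, average)
```

Intermediate weight vectors between these extremes give the "much wider range of possible linkages" the abstract refers to, including smoothed near-single-linkage variants.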
Many higher-education institutions have endeavored to understand students' characteristics in order to improve the quality of education. To this end, demographic information and questionnaire surveys have been used, and more recently, digital information from learning management systems and other sources has emerged for student profiling. This study adopted a novel approach, using semantic trajectory data created from smart-card logs of campus buildings and class attendance records to investigate the relationship between students' trajectory patterns and academic performance. More than 4000 freshmen were observed per semester at the Songdo International Campus, Yonsei University, in South Korea during four semesters in 2016 and 2017. Dynamic time warping was adopted to calculate the similarities among student trajectories, and the trajectories were then grouped by hierarchical clustering of these similarities. Average grade point averages (GPAs) of the groups were evaluated and compared by major and gender. The results showed that the average GPAs were statistically different from each other in general, which confirmed the hypothesis that a student's trajectory differentiates a student's GPA. Furthermore, GPA was positively associated with students' degree of activeness in movement: the more accesses to campus facilities, the better the GPA. In addition, the differences in the average GPAs of the male groups were clearer than was the case for females, and the trajectory of the second semester better characterized an individual student. The study shows that a semantic trajectory pattern generated from location logs is a new and influential factor that can be utilized to understand students' characteristics in higher education and to predict their academic performance.
• The authors proposed students' semantic trajectories for student profiling.
• Large datasets were collected from over 4000 students for two school years.
• Dynamic time warping, hierarchical clustering, and ANOVA tests were conducted.
• Students' semantic trajectory proved to be a new and influential factor associated with academic performance.
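The DTW-plus-clustering pipeline used above can be sketched on toy sequences. The "trajectories" below are hypothetical stand-ins for the smart-card logs, not the study's data.

```python
# Sketch: DTW gives a distance between trajectories of unequal length, and the
# resulting distance matrix feeds hierarchical clustering.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def dtw(a, b):
    # Classic dynamic-programming DTW on 1-D sequences.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy "trajectories": facility IDs visited over a day; two active students
# and two less active ones (hypothetical).
trajs = [np.array([1, 2, 3, 4, 3, 2]), np.array([1, 2, 3, 4, 4, 3, 2]),
         np.array([1, 1, 1, 2]),       np.array([1, 1, 2, 2])]

n = len(trajs)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = dtw(trajs[i], trajs[j])

labels = fcluster(linkage(squareform(dist), method="average"), t=2, criterion="maxclust")
print(labels.tolist())
```

DTW aligns sequences of different lengths before comparing them, which is why the study could group daily trajectories that vary in the number of logged events.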
In this paper, we extended a hierarchical clustering technique, which is the most researched in the sensor network field, and studied a dynamic differential clustering (DDC) technique to minimize energy consumption and ensure an equal lifespan for all sensor nodes while considering the mobility of sinks. In a sensor network environment with mobile sinks, clusters close to the sinks tend to consume more forwarding energy. Therefore, clustering that considers forwarding-energy consumption is desired. Since all clusters form a hierarchical tree, the number of levels of the tree must be chosen based on the size of the cluster so that the cluster size does not grow abnormally and energy consumption is not concentrated within specific clusters. To verify that the proposed DDC protocol satisfies these requirements, a simulation using Matlab was performed. The FND (First Node Dead), LND (Last Node Dead), and residual-energy characteristics of the proposed DDC protocol were compared with those of popular clustering protocols such as LEACH and EEUC. As a result, it was shown that FND appears latest and the point at which the dead-node count increases is delayed in the DDC protocol. The proposed DDC protocol presents a 66.3% improvement in FND and a 13.8% improvement in LND compared to the LEACH protocol. Furthermore, FND improved 79.9%, but LND declined 33.2%, when compared to EEUC. This verifies that the proposed DDC protocol can last longer with a larger number of surviving nodes.
Power system capacity-expansion models are typically intractable if every operating period is represented. This issue is normally overcome by using a subset of representative operating periods. For instance, representative operating hours can be selected by discretizing the load-duration curve, which captures the effect of load levels on system-operation costs. This approach is inappropriate if system-operating costs depend on parameters other than load (e.g., renewable-resource availability) or if there are important intertemporal operating constraints (e.g., generator-ramping limits). This paper proposes the use of representative operating days, which are selected using clustering, to surmount these issues. We propose two hierarchical clustering techniques, which are designed to capture the important statistical features of the parameters (e.g., load and renewable-resource availability), in selecting representative days. This includes temporal autocorrelations and correlations between different locations. A case study, which is based on the Texan power system, is used to demonstrate the techniques. We show that our proposed clustering techniques result in investment decisions that closely match those made using the full unclustered dataset.
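The representative-day selection can be sketched as follows. The load profiles are synthetic, and the medoid-plus-weight scheme shown is one common simplification, not necessarily the paper's exact procedure.

```python
# Sketch: cluster daily load profiles, then keep one real "representative day"
# (the cluster medoid) per cluster, weighted by cluster size. Synthetic data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
hours = np.arange(24)
base = 50 + 20 * np.sin((hours - 6) * np.pi / 12)   # generic daily load shape

# 120 synthetic days: 60 high-demand (summer-like) and 60 low-demand profiles.
days = np.vstack([base * s + rng.normal(0, 2, 24)
                  for s in np.r_[np.full(60, 1.3), np.full(60, 0.8)]])

k = 2
labels = fcluster(linkage(days, method="ward"), t=k, criterion="maxclust")

rep_days, weights = [], []
for c in range(1, k + 1):
    members = np.flatnonzero(labels == c)
    # Medoid: the actual day closest on average to all other days in its cluster.
    medoid = members[cdist(days[members], days[members]).sum(axis=1).argmin()]
    rep_days.append(int(medoid))
    weights.append(len(members))

print(rep_days, weights)
```

Using a medoid rather than a centroid keeps each representative day a real, internally consistent profile, so intertemporal features such as ramps are preserved; the weights tell the expansion model how many days each representative stands for.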
Hierarchical clustering techniques help in building a tree-like structure called a dendrogram from the data points, which can be used to find the closest related data objects. This paper presents a novel hierarchical clustering technique that considers intuitionistic fuzzy sets to deal with the uncertainty present in the data. Instead of using the traditional Hamming distance or Euclidean distance measure to find the distance between the data points, it employs a probabilistic Euclidean distance measure in a novel clustering approach that we term the ‘Probabilistic Intuitionistic Fuzzy Hierarchical Clustering (PIFHC) Algorithm’. The proposed PIFHC algorithm considers probabilistic weights from the data to measure the distances between the data points. Clustering results over UCI datasets show that the proposed PIFHC algorithm gives better cluster accuracies than its existing counterparts, with improvements of 1%–3.5% in clustering accuracy compared to other fuzzy hierarchical clustering algorithms for most of the datasets. We further provide experimental results with a real-world car dataset and the Listeria monocytogenes dataset for mouse susceptibility to demonstrate the practical efficacy of the proposed algorithm. For the Listeria dataset as well, the proposed PIFHC records a 1.7% improvement against state-of-the-art methods. The dendrograms formed by the proposed PIFHC algorithm exhibit a high cophenetic correlation coefficient, with an improvement of 0.75% over others. We also provide various AGNES methods to update the distance between merged clusters in the proposed PIFHC algorithm.
• This paper presents a novel hierarchical clustering approach based on intuitionistic fuzzy sets.
• The proposed approach is termed the ‘probabilistic intuitionistic fuzzy hierarchical clustering (PIFHC)’ algorithm.
• PIFHC employs a probabilistic Euclidean distance measure with different probabilistic weights for its different components.
• It also presents methods to compute the distances of the merged cluster from other clusters.
• Extensive experiments over a number of benchmark and real-world datasets demonstrate PIFHC's superiority over others.
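The cophenetic correlation coefficient used above to evaluate dendrogram quality can be computed directly with SciPy. This sketch uses synthetic data rather than the paper's datasets.

```python
# Cophenetic correlation: agreement between original pairwise distances and the
# distances implied by the dendrogram (merge heights). Values near 1 mean the
# tree faithfully summarises the data.
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (25, 3)), rng.normal(5, 0.5, (25, 3))])

d = pdist(X)                       # original pairwise distances
tree = linkage(d, method="average")
c, coph_d = cophenet(tree, d)      # c: correlation; coph_d: dendrogram distances

print(round(c, 3))
```

On well-separated data like this, average linkage yields a coefficient close to 1; comparing this score across linkage methods is the standard way to judge which dendrogram best preserves the original distance structure.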