This paper is the first in a two-part series analyzing human arm and hand motion during a wide range of unstructured tasks. The wide variety of motions performed by the human arm during daily tasks makes it desirable to find representative subsets to reduce the dimensionality of these movements for a variety of applications, including the design and control of robotic and prosthetic devices. This paper presents a novel method and the results of an extensive human subjects study to obtain representative arm joint angle trajectories that span naturalistic motions during Activities of Daily Living (ADLs). In particular, we seek to identify sets of useful motion trajectories of the upper limb that are functions of a single variable, allowing, for instance, an entire prosthetic or robotic arm to be controlled with a single input from a user, along with a means to select between motions for different tasks. Data-driven approaches are used to discover clusters and representative motion averages for the wrist (3 degrees of freedom, DOF), elbow-wrist (4 DOF), and full-arm (7 DOF) motions. The proposed method makes use of well-known techniques such as dynamic time warping (DTW) to obtain a divergence measure between motion segments, Ward's distance criterion to build hierarchical trees, and functional principal component analysis (fPCA) to evaluate cluster variability. The emerging clusters group the recorded motions primarily by hand start and end location for the full-arm system, by motion direction for the wrist-only system, and by an intermediate between the two qualities for the elbow-wrist system.
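The DTW divergence mentioned above can be illustrated with a minimal sketch. It assumes each motion segment is a list of joint-angle tuples and uses Euclidean distance as the local cost. The function names `dtw_distance` and `pairwise_dtw` are hypothetical, and the abstract does not specify the authors' local cost, step pattern, or normalization.

```python
import math

def dtw_distance(a, b):
    """Cumulative DTW cost between two sequences of joint-angle tuples.

    Assumption: Euclidean local cost and the basic step pattern
    (insert, delete, match); the paper's exact choices are not stated.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

def pairwise_dtw(segments):
    """Symmetric matrix of DTW costs between all motion segments."""
    k = len(segments)
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            matrix[i][j] = matrix[j][i] = dtw_distance(segments[i], segments[j])
    return matrix
```

A matrix like the one returned by `pairwise_dtw` is the kind of input that a hierarchical clustering routine with Ward's criterion would consume.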
Band selection, considered an effective dimensionality reduction technique for hyperspectral imagery (HSI), has been a hot topic for decades. Although various clustering-based methods have been applied to band selection, only a few studies have explored the hierarchical structure among different spectral bands. With conventional hierarchical clustering, implemented in an agglomerative manner, both the efficiency and the accuracy of band selection leave room for improvement. Moreover, noise sensitivity is a defect inherent in the clustering procedure. To address these issues, we propose a divisive hierarchical clustering approach (DHCA) to hyperspectral band selection. Inspired by divisive analysis, DHCA is designed to obtain any number of band subsets while capturing the intrinsic hierarchy of hyperspectral bands. By introducing the local density into the average dissimilarity, it keeps outliers from forming separate clusters. Also, given the order of the spectrum, the channel interval makes the similarity among bands more rational. Finally, we select a representative band in each cluster from the information viewpoint to ensure a high-quality band subset. Extensive experiments on three real public HSI datasets validate the superiority of the proposed method over state-of-the-art competitors.
The socioeconomic costs of traffic congestion are crippling in urbanising China. This study is a first attempt to explore the spatiotemporal pattern of traffic congestion performance in 77 large Chinese cities by using real-time big data. Based upon hourly real-time traffic performance index data collected between August 27, 2019 and September 27, 2019, four clusters of cities have been identified according to their urban traffic congestion patterns on different days of the week. Empirical results show that urban traffic congestion performance varies substantially between surveyed cities on the same day of the week. Cities with advanced planning and delivery of urban road networks and well-developed urban public transportation systems exhibit better traffic performance. Cities with relatively smaller urban populations, higher per capita road area, and fewer vehicles also show relatively smooth traffic. In particular, cities in Northeast China tend to have earlier morning and evening peak hours than other sample cities, and the Northeastern cities overwhelmingly suffer from more severe traffic congestion. Moreover, the traffic congestion patterns of the surveyed large cities in China vary noticeably between different days of the week. This study provides a valuable lens for understanding the variegated pattern of traffic congestion performance between different regions in urban China, based upon which targeted policy recommendations have been synthesised to help alleviate urban traffic congestion and further improve urban well-being across the country.
•A first attempt to investigate traffic congestion performance across large cities by using hourly real-time big data.
•Urban traffic congestion performance varies substantially between large Chinese cities.
•Four city clusters are identified according to their traffic congestion performance.
•Potential root causes of traffic congestion performance in these categories of cities are analyzed.
•Tailored policy recommendations are formulated to support public policy making to alleviate urban traffic congestion.
•A new Gaussian mixture learning method is proposed.
•The method combines hierarchical clustering with expectation-maximization.
•Adaptive splitting is designed for hierarchical clustering.
•The method excels in both efficiency and accuracy.
In signal processing, a large number of samples can be generated by a Monte Carlo method and then encoded as a Gaussian mixture model for compactness in computation, storage, and communication. With a large number of samples to learn from, the computational efficiency of Gaussian mixture learning becomes important. In this paper, we propose a new method of Gaussian mixture learning that works both accurately and efficiently for large datasets. The proposed method combines hierarchical clustering with the expectation-maximization algorithm, with hierarchical clustering providing an initial guess for the expectation-maximization algorithm. We also propose adaptive splitting for hierarchical clustering, which enhances the quality of the initial guess and thus improves both the accuracy and efficiency of the combination. We validate the performance of the proposed method in comparison with existing methods through numerical examples of Gaussian mixture learning and its application to distributed particle filtering.
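The idea of using hierarchical clustering to seed expectation-maximization can be sketched for one-dimensional samples as follows. This is an illustration under stated assumptions, not the paper's algorithm: it uses a simple agglomerative, centroid-linkage merge as a stand-in for the proposed adaptive splitting, and all function names (`agglomerate`, `init_from_clusters`, `em`) are hypothetical.

```python
import math

def agglomerate(points, k):
    """Merge sorted 1-D points into k clusters by joining the two
    clusters with the closest means (centroid linkage). In sorted 1-D
    data the closest pair of means is always adjacent."""
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > k:
        means = [sum(c) / len(c) for c in clusters]
        i = min(range(len(clusters) - 1), key=lambda j: means[j + 1] - means[j])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

def init_from_clusters(clusters, n, min_var=1e-6):
    """Turn each cluster into an initial (weight, mean, variance) triple."""
    params = []
    for c in clusters:
        mu = sum(c) / len(c)
        var = max(sum((x - mu) ** 2 for x in c) / len(c), min_var)
        params.append((len(c) / n, mu, var))
    return params

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em(points, params, iters=50, min_var=1e-6):
    """Standard EM updates for a 1-D Gaussian mixture."""
    n = len(points)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for x in points:
            w = [pi * normal_pdf(x, mu, var) for pi, mu, var in params]
            total = sum(w) or 1e-300  # guard against total underflow
            resp.append([v / total for v in w])
        # M-step: re-estimate weight, mean, and variance per component
        new_params = []
        for k in range(len(params)):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu = sum(r[k] * x for r, x in zip(resp, points)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, points)) / nk
            new_params.append((nk / n, mu, max(var, min_var)))
        params = new_params
    return params

samples = [0.0, 0.2, -0.1, 0.1, 10.0, 10.2, 9.9, 10.1]
initial = init_from_clusters(agglomerate(samples, 2), len(samples))
fitted = em(samples, initial)
```

Because the initial guess already separates the two groups in `samples`, EM has little left to refine, which is the efficiency argument made in the abstract.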
Objectives:
The pathogenesis of heterogeneity in gastric cancer (GC) is not clear and presents a significant obstacle to providing effective drug treatment. We aimed to identify subtypes of GC and explore the underlying pathogenesis.
Methods:
We collected two microarray datasets from GEO (GSE84433 and GSE84426), performed an unsupervised cluster analysis based on gene expression patterns, and identified related immune and stromal cells. Then, we explored the possible molecular mechanisms of each subtype by functional enrichment analysis and identified related hub genes.
Results:
First, we identified three clusters of GC by unsupervised hierarchical clustering, with average silhouette width of 0.96, and also identified their related representative genes and immune cells. We validated our findings using dataset GSE84426. Subtypes associated with the highest mortality (subtype 2 in the training group and subtype C in the validation group) showed high expression of SPARC, COL3A1, and CCN. Both subtypes also showed high infiltration of fibroblasts, endothelial cells, hematopoietic stem cells, and a high stromal score. Furthermore, subtypes with the best prognosis (subtype 3 in the training group and subtype A in the validation group) showed high expression of FGL2, DLGAP1-AS5, and so on. Both subtypes also showed high infiltration of CD4+ T cells, CD8+ T cells, NK cells, pDC, macrophages, and CD4+ T effector memory cells.
Conclusion:
We found that GC can be classified into three subtypes based on gene expression patterns and cell composition. Findings of this study help us better understand the tumor microenvironment and immune milieu associated with heterogeneity in GC and provide practical information to guide personalized treatment.
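The average silhouette width reported in the Results can be computed as in the following minimal sketch. It assumes samples are numeric tuples, uses Euclidean distance, and requires at least two clusters. The function name is hypothetical and the study's actual distance measure is not stated in the abstract.

```python
import math

def average_silhouette_width(points, labels):
    """Mean silhouette value over all samples.

    For sample i: a = mean distance to the other members of its own
    cluster, b = smallest mean distance to the members of any other
    cluster, s = (b - a) / max(a, b). Singletons score 0 by convention.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster contributes 0
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = average_silhouette_width(toy_points, [0, 0, 1, 1])
bad = average_silhouette_width(toy_points, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters. In the toy data, the labeling that follows the geometric groups scores high and the mismatched labeling scores below zero.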
•Propose a very simple algorithm for convex clustering.
•The algorithm can produce merging-only clustering paths; that is, a clustering path doesn’t have any split.
•The algorithm can be applied to cases where clusters are non-convex.
This paper proposes an exceptionally simple algorithm, called forward-stagewise clustering, for convex clustering. Convex clustering has drawn recent attention since it nicely addresses the instability issue of traditional non-convex clustering methods. While existing algorithms can precisely solve convex clustering problems, they are sophisticated and produce (agglomerative) clustering paths that contain splits. This motivates us to propose an algorithm that only produces no-split clustering paths. The approach undertaken here follows the line of research initiated in the area of regression. Specifically, we apply the forward-stagewise technique to clustering problems and prove that the algorithm can only produce no-split clustering paths. We then modify the forward-stagewise clustering algorithm to deal with noise and outliers. We further suggest rules of thumb for the algorithm to be applicable to cases where clusters are non-convex. The performance of the proposed algorithm is evaluated through simulations and a real data application.
Social determinants of health (SDoH) have become an increasingly important area to acknowledge and address in healthcare; however, dealing with these measures in outcomes research can be challenging due to the inherent collinearity of these factors. Here we discuss our experience using three statistical methods (exploratory factor analysis (FA), hierarchical clustering, and latent class analysis (LCA)) to analyze data collected using an electronic medical record social risk screener called Protocol for Responding to and Assessing Patient Assets, Risks, and Experience (PRAPARE). The PRAPARE tool is a standardized instrument designed to collect patient-reported data on SDoH factors, such as income, education, housing, and access to care. A total of 2380 patients had complete PRAPARE and neighborhood-level data for analysis. We identified three composite SDoH clusters using FA, four clusters using hierarchical clustering, and four latent classes of patients using LCA. Our results highlight how different approaches can be used to handle SDoH, as well as how to select a method based on the intended outcome of the researcher. Additionally, our study shows the usefulness of employing multiple statistical methods to analyze complex SDoH data gathered using social risk screeners such as the PRAPARE tool.
In the context of large-scale grid connection of new energy, short-term load forecasting is a vital and challenging task for power systems to balance supply and demand. To improve forecasting accuracy, a new load forecasting method is proposed that mines the characteristics of load data and applies artificial intelligence algorithms. First, the seasonal and trend decomposition using loess (STL) method is applied to decompose the load data into trend, seasonal, and residual components, and the residual component, which has the highest complexity, is further decomposed by the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) approach. Second, to reduce the number of components, an improved hierarchical clustering technique is proposed to cluster all intrinsic mode functions (IMFs) obtained by CEEMDAN into high-frequency and low-frequency components. Then, different network models are trained to obtain the prediction results for the different components, and the total load prediction is obtained by stacking all of them. Finally, the national demand dataset of Great Britain in 2021–2022 is used to conduct ablation and comparative experiments. The mean absolute percentage error (MAPE) and the root mean square error (RMSE) of the proposed method are 2.064% and 724.01 MW, respectively, which verifies the effectiveness and advancement of the proposed method.
•Features of load data are effectively extracted by secondary decomposition method.
•The improved clustering method is proposed to reduce the number of IMFs.
•A combined forecasting model is built based on the features of different components.
•Ablation and comparative experiments are given to prove the superiority of the model.
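For reference, the two error metrics reported for this forecasting method follow their standard definitions, sketched below. The function names and the toy load values are illustrative assumptions and are not data from the study.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.
    Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error, in the same unit as the inputs (e.g. MW)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Toy load values in MW (illustrative only)
actual_load = [100.0, 200.0]
forecast = [110.0, 180.0]
toy_mape = mape(actual_load, forecast)
toy_rmse = rmse(actual_load, forecast)
```

MAPE is scale-free, which makes it comparable across systems of different size, while RMSE keeps the unit of the load and penalizes large errors more heavily.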
Background
Maintaining a healthy weight can reduce the risk of developing many diseases, including type 2 diabetes, hypertension, and certain types of cancers. Online social media platforms are popular among people seeking social support regarding weight loss and sharing their weight loss experiences, which provides opportunities for learning about weight loss behaviors.
Objective
This study aimed to investigate the extent to which the content posted by users in the r/loseit subreddit, an online community for discussing weight loss, and their online interactions, measured by the number of replies and votes that these users received, were associated with their weight loss.
Methods
All posts that were published before January 2018 in r/loseit were collected. We focused on users who revealed their start weight, current weight, and goal weight and were active in this online community for at least 30 days. A topic modeling technique and a hierarchical clustering algorithm were used to obtain both global topics and local word semantic clusters. Finally, we used a regression model to learn the association between weight loss and topics, word semantic clusters, and online interactions.
Results
Our data comprised 477,904 posts that were published by 7660 users within a span of 7 years. We identified 25 topics, including food and drinks, calories, exercises, family members and friends, and communication. Our results showed that the start weight (β=.823; P<.001), active days (β=.017; P=.009), and median number of votes (β=.263; P=.02), mentions of exercises (β=.145; P<.001), and nutrition (β=.120; P<.001) were associated with higher weight loss. Users who lost more weight might be motivated by the negative emotions (β=−.098; P<.001) that they experienced before starting the journey of weight loss. In contrast, users who mentioned vacations (β=−.108; P=.005) and payments (β=−.112; P=.001) tended to experience relatively less weight loss. Mentions of family members (β=−.031; P=.03) and employment status (β=−.041; P=.03) were associated with less weight loss as well.
Conclusions
Our study showed that both online interactions and offline activities were associated with weight loss, suggesting that future interventions based on existing online platforms should focus on both aspects. Our findings suggest that online personal health data can be used to learn about health-related behaviors effectively.
Colon adenocarcinoma (COAD) is the primary factor responsible for cancer-related mortality in Western countries, and its development and progression are affected by altered sphingolipid metabolism. The current study aimed to investigate the effects of sphingolipid metabolism-related (SLP) genes on multiple human cancers, especially COAD. We obtained 1287 SLP genes from the GeneCards and MSigDB databases, along with public transcriptome data and the related clinical information. Univariate Cox regression analysis suggested that 26 SLP genes were substantially related to the prognosis of COAD, and a majority of SLP genes served as risk genes for the tumor, suggesting a potential pathogenic effect of SLP in COAD development. Pan-cancer characterization of SLP genes summarized their expression traits, mutation traits, and methylation levels. We then focused on an in-depth analysis of COAD. Using unsupervised clustering, 1008 COAD patients were divided into two distinct subtypes (C1 and C2). The C1 subtype is characterized by a poor prognosis, activation of SLP pathways, high expression of SLP genes, and disordered carcinogenic pathways and immune microenvironment. Based on the SLP clusters, we developed and validated a novel prognostic model, consisting of ANO1, C2CD4A, EEF1A2, GRP, HEYL, IGF1, LAMA2, LSAMP, RBP1, and TCEAL2, to quantitatively evaluate the clinical outcomes of COAD. The Kaplan-Meier survival curves and ROC curves highlighted the accuracy of our SLP model in both internal and external cohorts. Compared with normal colon tissues, expression of C2CD4A was significantly higher in COAD, whereas expression levels of EEF1A2, IGF1, and TCEAL2 were significantly lower in COAD. Overall, our research emphasized the pathogenic role of SLP in COAD and found that targeting SLP might help improve the clinical outcomes of COAD.
The risk model based on SLP metabolism provided a new horizon for prognosis assessment and customized patient intervention.