Recent evidence suggests that a subpopulation of cancer cells, cancer stem cells (CSCs), is responsible for tumor growth in colorectal cancer. However, the role of CSCs in colorectal cancer metastasis is unclear. Here, we identified a subpopulation of CD26+ cells uniformly present in both the primary and metastatic tumors in colorectal cancer patients with liver metastasis. Furthermore, in patients without distant metastasis at the time of presentation, the presence of CD26+ cells in their primary tumors predicted distant metastasis on follow-up. Isolated CD26+ cells, but not CD26− cells, led to the development of distant metastasis when injected into the mouse cecal wall. CD26+ cells were also associated with enhanced invasiveness and chemoresistance. Our findings have uncovered a critical role of CSCs in the metastatic progression of cancer. Furthermore, the ability to predict metastasis based on analysis of CSC subsets in the primary tumor may have important clinical implications as a selection criterion for adjuvant therapy.
► Metastatic colorectal cancers contain a subset of CD26+ cancer stem cells
► Nonmetastatic tumors with CD26+ CSCs frequently proceed to metastasis
► Isolated CD26+ CSCs can initiate distant metastasis in a mouse model
► CD26+ CSCs show enhanced invasiveness and migratory potential
We present a novel method for hierarchical topic detection where topics are obtained by clustering documents in multiple ways. Specifically, we model document collections using a class of graphical models called hierarchical latent tree models (HLTMs). The variables at the bottom level of an HLTM are observed binary variables that represent the presence/absence of words in a document. The variables at other levels are binary latent variables that represent word co-occurrence patterns or co-occurrences of such patterns. Each latent variable gives a soft partition of the documents, and document clusters in the partitions are interpreted as topics. Latent variables at high levels of the hierarchy capture long-range word co-occurrence patterns and hence give thematically more general topics, while those at low levels of the hierarchy capture short-range word co-occurrence patterns and give thematically more specific topics. In comparison with LDA-based methods, a key advantage of the new method is that it represents co-occurrence patterns explicitly using model structures. Extensive empirical results show that the new method significantly outperforms the LDA-based methods in terms of model quality and meaningfulness of topics and topic hierarchies.
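The abstract's central notion, that a binary latent variable gives a soft partition of documents, can be sketched numerically. Below, a single latent variable Z over binary word-presence attributes soft-assigns a document via its posterior under a two-component Bernoulli mixture. All parameters and the toy vocabulary are illustrative assumptions, not values from the paper, and the full HLTM structure-learning procedure is not shown.

```python
# Sketch: a binary latent variable Z soft-partitions documents under a
# two-component Bernoulli mixture over word-presence indicators.
# Parameters below are illustrative assumptions, not learned HLTM values.

def posterior_z(doc, prior, p_word):
    """P(Z=1 | doc) for a binary bag-of-words vector `doc`.

    prior  : P(Z=1)
    p_word : per-word presence probabilities [P(w|Z=0), P(w|Z=1)]
    """
    like = [1.0, 1.0]
    for z in (0, 1):
        for w, present in enumerate(doc):
            p = p_word[z][w]
            like[z] *= p if present else (1.0 - p)
    joint1 = prior * like[1]
    joint0 = (1.0 - prior) * like[0]
    return joint1 / (joint0 + joint1)

# Two word-co-occurrence patterns: words 0-1 frequent under Z=1,
# words 2-3 frequent under Z=0.
p_word = [[0.1, 0.1, 0.8, 0.8],   # Z = 0
          [0.8, 0.8, 0.1, 0.1]]   # Z = 1
doc = [1, 1, 0, 0]                # contains words 0 and 1 only
print(round(posterior_z(doc, 0.5, p_word), 3))  # close to 1: doc falls in the Z=1 cluster
```

Each latent variable in an HLTM induces such a posterior, so a model with many latent variables partitions the same documents in many ways at once.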
Natural language processing (NLP) is an effective tool for generating structured information from unstructured data such as that commonly found in clinical trial texts. This interdisciplinary research has gradually grown into a flourishing field with a substantial body of scientific output. In this study, bibliographical data collected from the Web of Science, PubMed, and Scopus databases from 2001 to 2018 were investigated using three prominent methods: performance analysis, science mapping, and, in particular, an automatic text analysis approach named structural topic modeling. Topical trend visualization and test analysis were further employed to quantify the effects of the year of publication on topic proportions. Diverse topical distributions across prolific countries/regions and institutions were also visualized and compared. In addition, scientific collaborations between countries/regions, institutions, and authors were explored using social network analysis. The findings are essential for facilitating the development of NLP-enhanced clinical trial text processing, boosting scientific and technological NLP-enhanced clinical trial research, and facilitating inter-country/region and inter-institution collaborations.
With the growing availability and popularity of sentiment-rich resources like blogs and online reviews, new opportunities and challenges have emerged regarding the identification, extraction, and organization of sentiments from user-generated documents or sentences. Recently, many studies have exploited lexicon-based methods or supervised learning algorithms to conduct sentiment analysis tasks separately; however, the former approaches ignore contextual information of sentences and the latter ones do not take sentiment information embedded in sentiment words into consideration. To tackle these limitations, we propose a new model named Sentiment Convolutional Neural Network (SentiCNN) to analyze the sentiments of sentences with both contextual and sentiment information of sentiment words, in which contextual information is captured from word embeddings and sentiment information is identified using existing lexicons. We incorporate a Highway Network into our model to adaptively combine sentiment and contextual information from sentences by strengthening the connection between features of both sentences and their sentiment words. Furthermore, we propose three lexicon-based attention mechanisms (LBAMs) for our SentiCNN model to find the most important indicators of sentiments and make predictions more effectively. Experiments over two well-known datasets indicate that sentiment words, the Highway Network, and LBAMs contribute to sentiment analysis.
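The highway mechanism the abstract relies on, a learned gate that adaptively mixes a transformed feature path with the untouched input, can be illustrated in a few lines. This is a minimal sketch of a generic highway layer, y = t · H(x) + (1 − t) · x; the weights are illustrative assumptions, not trained SentiCNN parameters, and the convolutional and attention parts are not shown.

```python
import math

# Sketch of the highway-style combination described in the abstract:
# a gate t in (0, 1) mixes a transformed feature h with the raw input x.
# Weights are illustrative assumptions, not trained SentiCNN parameters.

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def highway(x, w_h, b_h, w_t, b_t):
    """Elementwise highway layer: y = t * h + (1 - t) * x."""
    h = [math.tanh(w_h * xi + b_h) for xi in x]   # transformed path H(x)
    t = [sigmoid(w_t * xi + b_t) for xi in x]     # transform gate T(x)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

x = [0.5, -1.0, 2.0]   # e.g. fused contextual + lexicon features
y = highway(x, w_h=1.0, b_h=0.0, w_t=0.0, b_t=-2.0)
# With a strongly negative gate bias (b_t = -2), t is about 0.12, so the
# layer mostly passes x through, the usual highway-network initialization.
print([round(v, 3) for v in y])
```

The gate is what lets the model decide, per feature, how much of the sentiment-word signal to blend into the contextual representation.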
Real-world data are often multifaceted and can be meaningfully clustered in more than one way. There is a growing interest in obtaining multiple partitions of data. In previous work we learned from data a latent tree model (LTM) that contains multiple latent variables (Chen et al. 2012). Each latent variable represents a soft partition of data, and hence multiple latent variables yield multiple partitions. The LTM approach can, through model selection, automatically determine how many partitions there should be, what attributes define each partition, and how many clusters there should be for each partition. It has been shown to yield rich and meaningful clustering results.
Our previous algorithm EAST for learning LTMs is only efficient enough to handle data sets with dozens of attributes. This paper proposes an algorithm called BI that can deal with data sets with hundreds of attributes. We empirically compare BI with EAST and other more efficient LTM learning algorithms, and show that BI outperforms its competitors on data sets with hundreds of attributes. In terms of clustering results, BI compares favorably with alternative methods that are not based on LTMs.
► We propose a generalization of Gaussian mixture models to allow multiple clusterings.
► We compare the facet determination approach and the variable selection approach to model-based clustering.
► We demonstrate that facet determination usually leads to better clustering results than variable selection.
► Analysis on NBA data demonstrates the effectiveness of using PLTMs for facet determination.
Variable selection is an important problem for cluster analysis of high-dimensional data. It is also a difficult one. The difficulty originates not only from the lack of class information but also from the fact that high-dimensional data are often multifaceted and can be meaningfully clustered in multiple ways. In such a case the effort to find one subset of attributes that presumably gives the “best” clustering may be misguided. It makes more sense to identify various facets of a data set (each being based on a subset of attributes), cluster the data along each one, and present the results to the domain experts for appraisal and selection. In this paper, we propose a generalization of the Gaussian mixture models and demonstrate its ability to automatically identify natural facets of data and cluster data along each of those facets simultaneously. We present empirical results to show that facet determination usually leads to better clustering results than variable selection.
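The core claim, that the same records can cluster differently depending on which attribute subset (facet) is used, can be demonstrated with a toy example. Below, a minimal 2-means routine is run on two hand-picked facets of a tiny data set; the data and the facet split are illustrative assumptions, whereas the paper learns the facets automatically via a generalization of Gaussian mixture models.

```python
# Sketch of facet determination: the same four records partition one way
# on attributes 0-1 and a different way on attributes 2-3. The data and
# the facet split are toy assumptions; the paper learns facets from data.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(ps):
    return [sum(c) / len(ps) for c in zip(*ps)]

def two_means(points, iters=20):
    """Minimal 2-means on lists of equal-length vectors."""
    c0, c1 = points[0], points[1]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if dist(p, c0) <= dist(p, c1) else 1 for p in points]
        c0 = centroid([p for p, l in zip(points, labels) if l == 0])
        c1 = centroid([p for p, l in zip(points, labels) if l == 1])
    return labels

# Four records; attributes 0-1 form one facet, attributes 2-3 another.
data = [[0, 0, 9, 9],
        [0, 1, 0, 0],
        [9, 9, 9, 8],
        [9, 8, 0, 1]]
facet_a = two_means([row[:2] for row in data])  # groups rows {0,1} vs {2,3}
facet_b = two_means([row[2:] for row in data])  # groups rows {0,2} vs {1,3}
print(facet_a, facet_b)
```

Neither partition is "the" clustering of the data; each is valid along its own facet, which is why picking a single attribute subset can be misguided.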
This paper is concerned with model-based clustering of discrete data. Latent class models (LCMs) are usually used for this task. An LCM consists of a latent variable and a number of attributes. It makes the overly restrictive assumption that the attributes are conditionally independent given the latent variable. We propose a novel method to relax this assumption. The key idea is to partition the attributes into groups such that correlations among the attributes in each group can be properly modeled by using a single latent variable. The latent variables for the attribute groups are then used to build a number of models, and one of them is chosen to produce the clustering results. The new method produces unidimensional clustering using latent tree models and is named UC-LTM. Extensive empirical studies were conducted to compare UC-LTM with several model-based and distance-based clustering methods. UC-LTM outperforms the alternative methods in most cases, and the differences are often large. Further, analysis of real-world social capital data shows improved results given by UC-LTM over those given by LCMs in a previous study.
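The key idea, partitioning attributes into groups whose internal correlations can each be handled by one latent variable, can be sketched with a simple correlation-based grouping. The greedy threshold rule below is an illustrative stand-in for the paper's model-based partitioning, and the data are a toy assumption.

```python
# Sketch of the UC-LTM grouping idea: attributes that move together share
# a group (and, in the model, a single latent variable). The greedy
# threshold rule is an illustrative stand-in for the paper's method.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def group_attributes(columns, threshold=0.7):
    """Greedily place each attribute in the first group it correlates with."""
    groups = []
    for i, col in enumerate(columns):
        for g in groups:
            if all(abs(pearson(col, columns[j])) >= threshold for j in g):
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

# Attributes 0 and 1 move together; attribute 2 is unrelated to both.
cols = [[0, 0, 0, 0, 1, 1, 1, 1],
        [0, 0, 0, 1, 1, 1, 1, 1],
        [1, 0, 1, 0, 1, 0, 1, 0]]
print(group_attributes(cols))  # → [[0, 1], [2]]
```

In UC-LTM each resulting group gets its own latent variable, which relaxes the LCM assumption that all attributes are conditionally independent given one latent variable.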
• A model-based unidimensional clustering method called UC-LTM is proposed.
• Experimental results show that UC-LTM outperforms latent class models in clustering.
• Interesting clusterings are found on a real-world social capital data set.
LTC: A latent tree approach to classification
Wang, Yi; Zhang, Nevin L.; Chen, Tao
International Journal of Approximate Reasoning, June 2013, Volume 54, Issue 4
Journal Article · Peer-reviewed · Open Access
Latent tree models were proposed as a class of models for unsupervised learning, and have been applied to various problems such as clustering and density estimation. In this paper, we study the usefulness of latent tree models in another paradigm, namely supervised learning. We propose a novel generative classifier called the latent tree classifier (LTC). An LTC represents each class-conditional distribution of attributes using a latent tree model, and uses Bayes' rule to make predictions. Latent tree models can capture complex relationships among attributes. Therefore, LTC is able to approximate the true distribution behind data well and thus achieves good classification accuracy. We present an algorithm for learning LTC and empirically evaluate it on an extensive collection of UCI data. The results show that LTC compares favorably to the state-of-the-art in terms of classification accuracy. We also demonstrate that LTC can reveal underlying concepts and discover interesting subgroups within each class.
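The generative prediction step described here, modeling P(attributes | class) per class and applying Bayes' rule, can be sketched as follows. LTC uses a latent tree model per class; a plain Bernoulli product model stands in below, so only the Bayes-rule wiring (not the latent-tree part) is illustrated, and the training data are a toy assumption.

```python
import math

# Sketch of the LTC prediction step: fit one class-conditional model per
# class and predict via Bayes' rule. A Bernoulli product model stands in
# for the paper's per-class latent tree model.

def fit_class_model(rows):
    """Per-attribute presence probabilities with Laplace smoothing."""
    n = len(rows)
    return [(sum(r[j] for r in rows) + 1) / (n + 2) for j in range(len(rows[0]))]

def log_joint(x, prior, probs):
    """log P(class) + log P(x | class) under the product model."""
    s = math.log(prior)
    for xi, p in zip(x, probs):
        s += math.log(p if xi else 1.0 - p)
    return s

def predict(x, priors, models):
    scores = [log_joint(x, pr, m) for pr, m in zip(priors, models)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy training data: class 0 tends to have attributes 0-1, class 1 attributes 2-3.
class0 = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0]]
class1 = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0]]
models = [fit_class_model(class0), fit_class_model(class1)]
priors = [0.5, 0.5]
print(predict([1, 1, 0, 0], priors, models))  # → 0
print(predict([0, 0, 1, 1], priors, models))  # → 1
```

Replacing the product model with a latent tree model per class is exactly what lets LTC capture attribute dependencies that this naive stand-in ignores.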