We consider model-based clustering methods for continuous, correlated data that account for external information available in the presence of mixed-type fixed covariates by proposing the MoEClust suite of models. These models allow different subsets of covariates to influence the component weights and/or component densities by modelling the parameters of the mixture as functions of the covariates. A familiar range of constrained eigen-decomposition parameterisations of the component covariance matrices are also accommodated. This paper thus addresses the equivalent aims of including covariates in Gaussian parsimonious clustering models and incorporating parsimonious covariance structures into all special cases of the Gaussian mixture of experts framework. The MoEClust models demonstrate significant improvement from both perspectives in applications to both univariate and multivariate data sets. Novel extensions to include a uniform noise component for capturing outliers and to address initialisation of the EM algorithm, model selection, and the visualisation of results are also proposed.
Docetaxel is the main treatment for advanced castration‐resistant prostate cancer; however, resistance eventually occurs. The development of intratumoral drug‐resistant subpopulations possessing a cancer stem cell (CSC) morphology is an emerging mechanism of docetaxel resistance, a process driven by epithelial–mesenchymal transition (EMT). This study characterised EMT in docetaxel‐resistant sublines through increased invasion, MMP‐1 production and ZEB1 and ZEB2 expression. We also present evidence for differential EMT across PC‐3 and DU145 in vitro resistance models as characterised by differential migration, cell colony scattering and susceptibility to the CSC inhibitor salinomycin. siRNA manipulation of ZEB1 and ZEB2 in PC‐3 and DU145 docetaxel‐resistant sublines identified ZEB1, through its transcriptional repression of E‐cadherin, to be a driver of both EMT and docetaxel resistance. The clinical relevance of ZEB1 was also determined through immunohistochemical tissue microarray assessment, revealing significantly increased ZEB1 expression in prostate tumours following docetaxel treatment. This study presents evidence that ZEB1, through its transcriptional repression of E‐cadherin, is a driver of both EMT and docetaxel resistance in docetaxel‐resistant prostate cancer. In addition, this study highlights the heterogeneity of prostate cancer and in turn emphasises the complexity of the clinical management of docetaxel‐resistant prostate cancer.
This study investigated both ZEB1 and ZEB2 in docetaxel‐resistant prostate cancer and provides strong evidence that ZEB1, through its transcriptional repression of E‐cadherin, is a driver of EMT and docetaxel resistance. This was clinically validated, with patients treated with docetaxel exhibiting increased ZEB1 tumour expression. We also identified differential EMT across resistance models, thereby highlighting the heterogeneity of docetaxel‐resistant prostate cancer.
The diagnosis and treatment of prostate cancer (PCa) is a major health-care concern worldwide. This cancer can manifest itself in many distinct forms and the transition from clinically indolent PCa to the more invasive aggressive form remains poorly understood. It is now universally accepted that glycan expression patterns change with the cellular modifications that accompany the onset of tumorigenesis. The aim of this study was to investigate if differential glycosylation patterns could distinguish between indolent, significant, and aggressive PCa. Whole serum N-glycan profiling was carried out on samples from 117 prostate cancer patients using our automated, high-throughput analysis platform for glycan-profiling which utilizes ultra-performance liquid chromatography (UPLC) to obtain high resolution separation of N-linked glycans released from the serum glycoproteins. We observed increases in hybrid, oligomannose, and biantennary digalactosylated monosialylated glycans (M5A1G1S1, M8, and A2G2S1), bisecting glycans (A2B, A2(6)BG1) and monoantennary glycans (A1), and decreases in triantennary trigalactosylated trisialylated glycans with and without core fucose (A3G3S3 and FA3G3S3) with PCa progression from indolent through significant and aggressive disease. These changes give us an insight into the disease pathogenesis and identify potential biomarkers for monitoring PCa progression; however, these need confirmation in further studies.
Objectives
To analyse the clinical utility of a prediction model incorporating both clinical information and a novel biomarker, p2PSA, in order to inform the decision for prostate biopsy in an Irish cohort of men referred for prostate cancer assessment.
Patients and Methods
Serum isolated from 250 men from three tertiary referral centres with pre‐biopsy blood draws was analysed for total prostate‐specific antigen (PSA), free PSA (fPSA) and p2PSA. From this, the Prostate Health Index (PHI) score was calculated (PHI = (p2PSA/fPSA)*√tPSA). The men's clinical information was used to derive their risk according to the Prostate Cancer Prevention Trial (PCPT) risk model. Two clinical prediction models were created via multivariable regression consisting of age, family history, abnormality on digital rectal examination, previous negative biopsy and either PSA or PHI score, respectively. Calibration plots, receiver‐operating characteristic (ROC) curves and decision curves were generated to assess the performance of the three models.
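The PHI formula quoted above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function name `prostate_health_index` and the input checks are hypothetical, and because the abstract does not state units, the conventional ones (p2PSA in pg/mL, fPSA and tPSA in ng/mL) are assumed.

```python
import math


def prostate_health_index(p2psa: float, fpsa: float, tpsa: float) -> float:
    """Compute PHI = (p2PSA / fPSA) * sqrt(tPSA).

    Assumed units (not stated in the abstract): p2PSA in pg/mL,
    fPSA and tPSA in ng/mL.
    """
    if fpsa <= 0:
        raise ValueError("fPSA must be positive")
    if p2psa < 0 or tpsa < 0:
        raise ValueError("p2PSA and tPSA must be non-negative")
    return (p2psa / fpsa) * math.sqrt(tpsa)


# Made-up input values, for illustration only.
print(prostate_health_index(p2psa=15.0, fpsa=0.75, tpsa=4.0))  # 40.0
```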
Results
The PSA model and PHI model were both well calibrated in this cohort, with the PHI model showing the best correlation between predicted probabilities and actual outcome. The areas under the ROC curve for the PHI model, PSA model and PCPT model were 0.77, 0.71 and 0.69, respectively, for the prediction of prostate cancer (PCa) and 0.79, 0.72 and 0.72, respectively, for the prediction of high grade PCa. Decision‐curve analysis showed a superior net benefit of the PHI model over both the PSA model and the PCPT risk model in the diagnosis of PCa and high grade PCa over the entire range of risk probabilities.
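For readers unfamiliar with decision-curve analysis, the net benefit of a model at a risk threshold pt is conventionally defined as TP/n − (FP/n) × pt/(1 − pt), where TP and FP count the true and false positives among the n men when everyone with predicted risk at or above pt is biopsied. The sketch below implements that standard definition; it is not the study's code, the function name `net_benefit` is hypothetical, and the toy outcomes and probabilities are invented for illustration.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit at one risk threshold (standard decision-curve definition).

    y_true: 1 if cancer was found, 0 otherwise.
    y_prob: predicted probability of cancer from some model.
    threshold: risk threshold pt, strictly between 0 and 1.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Patients whose predicted risk reaches the threshold are treated as positive.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Toy data (invented): a decision curve is this quantity over a grid of thresholds.
outcomes = [1, 0, 1, 0]
risks = [0.9, 0.8, 0.4, 0.1]
curve = {pt: net_benefit(outcomes, risks, pt) for pt in (0.2, 0.5)}
```

Plotting such values for each model across the full range of thresholds, alongside the "biopsy all" and "biopsy none" strategies, gives the decision curves referred to above.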
Conclusion
A logical and standardized approach to the use of clinical risk factors can allow more accurate risk stratification of men under investigation for PCa. The measurement of p2PSA and the integration of this biomarker into a clinical prediction model can further increase the accuracy of risk stratification, helping to better inform the decision for prostate biopsy in a referral population.
Classifying indolent prostate cancer represents a significant clinical challenge. We investigated whether integrating data from different omic platforms could identify a biomarker panel with improved performance compared to individual platforms alone. DNA methylation, transcripts, protein and glycosylation biomarkers were assessed in a single cohort of patients treated by radical prostatectomy. Novel multiblock statistical data integration approaches were used to deal with missing data, and the data were modelled via stepwise multinomial logistic regression or LASSO. After applying leave‐one‐out cross‐validation to each model, the probabilistic predictions of disease type for each individual panel were aggregated to improve prediction accuracy using all available information for a given patient. Through assessment of three performance measures (area under the curve (AUC) values, calibration and decision curve analysis), the study identified an integrated biomarker panel which predicts disease type with a high level of accuracy, with Multi AUC value of 0.91 (0.89, 0.94) and Ordinal C‐Index (ORC) value of 0.94 (0.91, 0.96), which was significantly improved compared to the values for the clinical panel alone of 0.67 (0.62, 0.72) Multi AUC and 0.72 (0.67, 0.78) ORC. Biomarker integration across different omic platforms significantly improves prediction accuracy. We provide a novel multiplatform approach for the analysis, determination and performance assessment of novel panels which can be applied to other diseases. With further refinement and validation, this panel could form a tool to help inform appropriate treatment strategies impacting on patient outcome in early stage prostate cancer.
In this study, we built a novel statistical model across multiple omic platforms to predict indolent and aggressive prostate cancer. We demonstrate using ROC, calibration and decision curves that our combined biomarker panel significantly improves on the prediction of indolent disease compared to current clinical features. This will inform appropriate treatment strategies impacting on patient outcomes in early stage prostate cancer.
Prostate cancer (PCa) represents a significant healthcare problem. The critical clinical question is the need for a biopsy. Accurate risk stratification of patients before a biopsy can allow for individualised risk stratification, thus improving clinical decision making. This study aims to build a risk calculator to inform the need for a prostate biopsy.
Using the clinical information of 4801 patients, an Irish Prostate Cancer Risk Calculator (IPRC) for diagnosis of PCa and high-grade PCa (Gleason ≥7) was created using a binary regression model including age, digital rectal examination, family history of PCa, negative prior biopsy and prostate-specific antigen (PSA) level as risk factors. The discrimination ability of the risk calculator was internally validated using cross validation to reduce overfitting, and its performance was compared with PSA and the American risk calculator (PCPT), Prostate Biopsy Collaborative Group (PBCG) and European risk calculator (ERSPC) using various performance outcome summaries. In a subgroup of 2970 patients, prostate volume was included. Separate risk calculators including the prostate volume (IPRCv) for the diagnosis of PCa (and high-grade PCa) were created.
IPRC area under the curve (AUC) for the prediction of PCa and high-grade PCa was 0.6741 (95% CI, 0.6591 to 0.6890) and 0.7214 (95% CI, 0.7018 to 0.7409) respectively. This significantly outperforms the predictive ability of cancer detection for PSA (0.5948), PCPT (0.6304), PBCG (0.6528) and ERSPC (0.6502) risk calculators; and also, for detecting high-grade cancer for PSA (0.6623) and PCPT (0.6804) but there was no significant improvement for PBCG (0.7185) and ERSPC (0.7140). The inclusion of prostate volume into the risk calculator significantly improved the AUC for cancer detection (AUC = 0.7298; 95% CI, 0.7119 to 0.7478), but not for high-grade cancer (AUC = 0.7256; 95% CI, 0.7017 to 0.7495). The risk calculator also demonstrated an increased net benefit on decision curve analysis.
The risk calculator developed has advantages over prior risk stratification of prostate cancer patients before the biopsy. It will reduce the number of men requiring a biopsy and their exposure to its side effects. The interactive tools developed help to translate the risk calculator into practice and allow for clarity in the clinical recommendations.
The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines “weak” tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and the absence of an explicit covariance structure over the observations in standard BART can yield poor performance in cases where such assumptions would be necessary. The Gaussian processes Bayesian additive regression trees (GP-BART) model is an extension of BART which addresses this limitation by assuming Gaussian process (GP) priors for the predictions of each terminal node among all trees. The model's effectiveness is demonstrated through applications to simulated and real-world data, surpassing the performance of traditional modelling approaches in various scenarios.
The existing research on emerging roles in computer-supported collaborative learning (CSCL) has mostly focused on who did what rather than why, i.e., which variables led to the emergence of certain roles. Therefore, we aimed to bridge this gap and investigate the variables that explain the emergence of roles. We used a large dataset of 173,838 interactions by 7054 students in 787 small groups. Two groups of variables were investigated: those related to other collaborators in the group (group size, cohesion, effort, dominance, distribution of participation and replies) as well as teacher factors (effort, influence, replies, collaborators size (ego), and uptake). The study used a novel person-centered method: a mixture of experts model framework that incorporates the covariates into the model to quantify their magnitude of explanation of the emergence of the identified roles. Three roles were identified: leaders, mediators, and isolates. Our results show that leaders were likely to emerge regardless of the number of students per group and contribute to better participatory environments where more students are involved, and more posts are contributed by others and further discussed by diverse members. Mediators were more likely to emerge in averagely interactive and balanced groups, whereas isolates “lurked” in active groups which are dominated by few active students. We use our findings and a review of the literature, both in CSCL and in social sciences at large, to propose a framework, which updates the decade-old framework, for operationalization and understanding of the social roles and the factors that drive their emergence.
The paper shows how to identify different collaborative roles and the factors that are conducive to productive collaboration. Understanding these factors helps researchers implement optimal support for students that leads to successful collaboration.
•Leaders emerge in groups regardless of the number of collaborators.
•Leaders emerge in, and possibly catalyze, participatory groups where diverse members contribute.
•For leaders to emerge, followers are essential, to interact with them and advance discussions.
•Mediators emerge in more balanced groups with lower interactivity and isolates lurk in active groups.
•A computational framework for roles is presented where new perspectives are highlighted.
Sequence analysis is an increasingly popular approach for analysing life courses represented by ordered collections of activities experienced by subjects over time. Here, we analyse a survey data set containing information on the career trajectories of a cohort of Northern Irish youths tracked between the ages of 16 and 22. We propose a novel, model‐based clustering approach suited to the analysis of such data from a holistic perspective, with the aims of estimating the number of typical career trajectories, identifying the relevant features of these patterns, and assessing the extent to which such patterns are shaped by background characteristics. Several criteria exist for measuring pairwise dissimilarities among categorical sequences. Typically, dissimilarity matrices are employed as input to heuristic clustering algorithms. The family of methods we develop instead clusters sequences directly using mixtures of exponential‐distance models. Basing the models on weighted variants of the Hamming distance metric permits closed‐form expressions for parameter estimation. Simultaneously allowing the component membership probabilities to depend on fixed covariates and accommodating sampling weights in the clustering process yields new insights on the Northern Irish data. In particular, we find that school examination performance is the single most important predictor of cluster membership.
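To make the idea of an exponential-distance model built on a weighted Hamming distance concrete, the following is a minimal single-component sketch. It assumes, as a simplification not confirmed by the abstract, a density proportional to exp(−d(s, θ)), where d is a position-weighted Hamming distance between a sequence s and a central sequence θ; under that assumption the normalising constant factorises over positions into ∏ₜ (1 + (v − 1)·exp(−λₜ)) for v categories. The names `weighted_hamming` and `log_density` and the per-position weights are hypothetical, and the paper's actual parameterisation may differ.

```python
import math
from typing import Sequence


def weighted_hamming(seq: Sequence, central: Sequence, weights: Sequence[float]) -> float:
    """Sum of the position weights at which seq and central disagree."""
    if not (len(seq) == len(central) == len(weights)):
        raise ValueError("seq, central and weights must have equal length")
    return float(sum(w for a, b, w in zip(seq, central, weights) if a != b))


def log_density(seq: Sequence, central: Sequence, weights: Sequence[float], n_states: int) -> float:
    """Log-density of seq under the assumed model p(s) ∝ exp(-d(s, central)).

    The normalising constant has a closed form because the weighted
    Hamming distance is a sum of independent per-position terms.
    """
    log_norm = sum(math.log(1.0 + (n_states - 1) * math.exp(-w)) for w in weights)
    return -weighted_hamming(seq, central, weights) - log_norm


# Toy example (invented): three career states observed at four time points.
central = ("school", "school", "work", "work")
observed = ("school", "work", "work", "work")
lam = [1.0, 1.0, 2.0, 2.0]  # hypothetical per-position precision weights
print(log_density(observed, central, lam, n_states=3))
```

In a mixture, each component would carry its own central sequence and weights, and the component membership probabilities could be made functions of covariates as the abstract describes.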