Peer-reviewed, Open access
  • Text Analytics and Mixed Fe...
    Bote-Curiel, Luis; Ruiz-Llorente, Sergio; Munoz-Romero, Sergio; Yague-Fernandez, Monica; Barquin, Arantzazu; Garcia-Donas, Jesus; Rojo-Alvarez, Jose Luis

    IEEE Access, 2021, Volume: 9
    Journal Article

    Richer integrative analysis methods are needed in oncological studies to make efficient use of the available clinical and genetic data and to give clinicians better information. However, analyses of this nature often require mixing data of different types, which classical methods cannot readily handle jointly. In this work, we aim to find relationships between clinical and genetic features of different types (metric, categorical, and text) and ovarian cancer (OC) disease progression. To this end, we first propose a univariate statistical method for text features that applies bootstrap resampling to Bag of Words and Latent Dirichlet Allocation representations, so that the free-text fields of the health records can be included as features. Second, we extend bootstrap resampling to metric and categorical feature extraction with Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA), respectively. We then formulate a novel integrative method for jointly considering metric, categorical, and text features. The text analysis results indicate differences in some individual words between two groups of OC patients categorised according to their sensitivity to platinum drugs, which suggests that the two groups are separable on text features. In the multivariate analysis, clinical data showed separability patterns for the three methods analysed according to the degree of platinum sensitivity. Applying these analytical tools to our OC cohort allowed us to demonstrate their strengths by confirming the predictive and prognostic role of widely known clinical and genetic variables (BRCA status, value of adjuvant therapy and optimal resection, or family history) and by showing significant associations for other variables whose role in OC development has been studied to a lesser extent (such as the PMS1, GPC3, and SLX4 genes).
These results highlight the value of implementing these approaches for the identification of novel biomarkers in the context of OC.
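
The univariate text step described in the abstract (bootstrap resampling applied to a Bag of Words representation to compare two patient groups) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `doc_frequency` and `bootstrap_diff_ci`, the whitespace tokenisation, the percentile confidence interval, and the toy notes are all assumptions made for illustration only.

```python
import random


def doc_frequency(docs, word):
    """Fraction of documents whose bag of words contains `word`."""
    return sum(word in doc.lower().split() for doc in docs) / len(docs)


def bootstrap_diff_ci(docs_a, docs_b, word, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in document frequency
    of `word` between group A and group B (hypothetical helper)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        sample_a = rng.choices(docs_a, k=len(docs_a))
        sample_b = rng.choices(docs_b, k=len(docs_b))
        diffs.append(doc_frequency(sample_a, word) - doc_frequency(sample_b, word))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Invented toy notes, NOT data from the study.
group_a = ["early relapse after surgery", "relapse noted", "relapse at follow-up"]
group_b = ["complete response", "no evidence of disease", "stable response"]

ci = bootstrap_diff_ci(group_a, group_b, "relapse")
print(ci)
```

Under this sketch, a word would be flagged as differing between the groups when the interval excludes zero. The paper's actual test statistic and its Latent Dirichlet Allocation variant may differ from this simplified version.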