Akademska digitalna zbirka SLovenije - logo
E-resources
Full text
Peer reviewed
  • Clinical text classificatio...
    Mujtaba, Ghulam; Shuib, Liyana; Idris, Norisma; Hoo, Wai Lam; Raj, Ram Gopal; Khowaja, Kamran; Shaikh, Khairunisa; Nweke, Henry Friday

    Expert systems with applications, 02/2019, Volume: 116
    Journal Article

    •To review free-text clinical text classification approaches from six aspects.•In selected studies, mostly content-based and concept-based features were used.•The datasets used in selected studies were categorized into four distinct types.•Selected studies used either supervised machine learning or rule-based approaches.•Ten open research challenges are presented in clinical text classification domain. The pervasive use of electronic health databases has increased the accessibility of free-text clinical reports for supplementary use. Several text classification approaches, such as supervised machine learning (SML) or rule-based approaches, have been utilized to obtain beneficial information from free-text clinical reports. In recent years, many researchers have worked in the clinical text classification field and published their results in academic journals. However, to the best of our knowledge, no comprehensive systematic literature review (SLR) has recapitulated the existing primary studies on clinical text classification in the last five years. Thus, the current study aims to present SLR of academic articles on clinical text classification published from January 2013 to January 2018. Accordingly, we intend to maximize the procedural decision analysis in six aspects, namely, types of clinical reports, data sets and their characteristics, pre-processing and sampling techniques, feature engineering, machine learning algorithms, and performance metrics. To achieve our objective, 72 primary studies from 8 bibliographic databases were systematically selected and rigorously reviewed from the perspective of the six aspects. This review identified nine types of clinical reports, four types of data sets (i.e., homogeneous–homogenous, homogenous–heterogeneous, heterogeneous–homogenous, and heterogeneous–heterogeneous), two sampling techniques (i.e., over-sampling and under-sampling), and nine pre-processing techniques. Moreover, this review determined bag of words, bag of phrases, and bag of concepts features when represented by either term frequency or term frequency with inverse document frequency, thereby showing improved classification results. SML-based or rule-based approaches were generally employed to classify the clinical reports. To measure the performance of these classification approaches, we used precision, recall, F-measure, accuracy, AUC, and specificity in binary class problems. In multi-class problems, we primarily used micro or macro-averaging precision, recall, or F-measure. Lastly, open research issues and challenges are presented for future scholars who are interested in clinical text classification. This SLR will definitely be a beneficial resource for researchers engaged in clinical text classification.