Widespread implementation of electronic databases has improved the accessibility of plaintext clinical information for supplementary use. Numerous machine learning techniques, such as supervised ...machine learning approaches or ontology-based approaches, have been employed to obtain useful information from plaintext clinical data. This study proposes an automatic multi-class classification system to predict accident-related causes of death from plaintext autopsy reports through expert-driven feature selection with supervised automatic text classification decision models.
Accident-related autopsy reports were obtained from one of the largest hospital in Kuala Lumpur. These reports belong to nine different accident-related causes of death. Master feature vector was prepared by extracting features from the collected autopsy reports by using unigram with lexical categorization. This master feature vector was used to detect cause of death according to internal classification of disease version 10 (ICD-10) classification system through five automated feature selection schemes, proposed expert-driven approach, five subset sizes of features, and five machine learning classifiers. Model performance was evaluated using precisionM, recallM, F-measureM, accuracy, and area under ROC curve. Four baselines were used to compare the results with the proposed system.
Random forest and J48 decision models parameterized using expert-driven feature selection yielded the highest evaluation measure approaching (85% to 90%) for most metrics by using a feature subset size of 30. The proposed system also showed approximately 14% to 16% improvement in the overall accuracy compared with the existing techniques and four baselines.
The proposed system is feasible and practical to use for automatic classification of ICD-10-related cause of death from autopsy reports. The proposed system assists pathologists to accurately and rapidly determine underlying cause of death based on autopsy findings. Furthermore, the proposed expert-driven feature selection approach and the findings are generally applicable to other kinds of plaintext clinical reports.
Full text
Available for:
DOBA, IZUM, KILJ, NUK, PILJ, PNG, SAZU, SIK, UILJ, UKNU, UL, UM, UPUK
•To review free-text clinical text classification approaches from six aspects.•In selected studies, mostly content-based and concept-based features were used.•The datasets used in selected studies ...were categorized into four distinct types.•Selected studies used either supervised machine learning or rule-based approaches.•Ten open research challenges are presented in clinical text classification domain.
The pervasive use of electronic health databases has increased the accessibility of free-text clinical reports for supplementary use. Several text classification approaches, such as supervised machine learning (SML) or rule-based approaches, have been utilized to obtain beneficial information from free-text clinical reports. In recent years, many researchers have worked in the clinical text classification field and published their results in academic journals. However, to the best of our knowledge, no comprehensive systematic literature review (SLR) has recapitulated the existing primary studies on clinical text classification in the last five years. Thus, the current study aims to present SLR of academic articles on clinical text classification published from January 2013 to January 2018. Accordingly, we intend to maximize the procedural decision analysis in six aspects, namely, types of clinical reports, data sets and their characteristics, pre-processing and sampling techniques, feature engineering, machine learning algorithms, and performance metrics. To achieve our objective, 72 primary studies from 8 bibliographic databases were systematically selected and rigorously reviewed from the perspective of the six aspects. This review identified nine types of clinical reports, four types of data sets (i.e., homogeneous–homogenous, homogenous–heterogeneous, heterogeneous–homogenous, and heterogeneous–heterogeneous), two sampling techniques (i.e., over-sampling and under-sampling), and nine pre-processing techniques. Moreover, this review determined bag of words, bag of phrases, and bag of concepts features when represented by either term frequency or term frequency with inverse document frequency, thereby showing improved classification results. SML-based or rule-based approaches were generally employed to classify the clinical reports. To measure the performance of these classification approaches, we used precision, recall, F-measure, accuracy, AUC, and specificity in binary class problems. In multi-class problems, we primarily used micro or macro-averaging precision, recall, or F-measure. Lastly, open research issues and challenges are presented for future scholars who are interested in clinical text classification. This SLR will definitely be a beneficial resource for researchers engaged in clinical text classification.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
Display omitted
•Novel conceptual graph-based model is proposed to classify forensic autopsy reports.•The performance of proposed model was evaluated on four manners of death datasets.•The proposed ...model outperformed compared to existing fully automated baseline models.•SVM, RF, and ensembled voted classifiers showed better classification performance.•Two-level classification showed better results compared to one-level classification.
Text categorization has been used extensively in recent years to classify plain-text clinical reports. This study employs text categorization techniques for the classification of open narrative forensic autopsy reports. One of the key steps in text classification is document representation. In document representation, a clinical report is transformed into a format that is suitable for classification. The traditional document representation technique for text categorization is the bag-of-words (BoW) technique. In this study, the traditional BoW technique is ineffective in classifying forensic autopsy reports because it merely extracts frequent but discriminative features from clinical reports. Moreover, this technique fails to capture word inversion, as well as word-level synonymy and polysemy, when classifying autopsy reports. Hence, the BoW technique suffers from low accuracy and low robustness unless it is improved with contextual and application-specific information. To overcome the aforementioned limitations of the BoW technique, this research aims to develop an effective conceptual graph-based document representation (CGDR) technique to classify 1500 forensic autopsy reports from four (4) manners of death (MoD) and sixteen (16) causes of death (CoD). Term-based and Systematized Nomenclature of Medicine–Clinical Terms (SNOMED CT) based conceptual features were extracted and represented through graphs. These features were then used to train a two-level text classifier. The first level classifier was responsible for predicting MoD. In addition, the second level classifier was responsible for predicting CoD using the proposed conceptual graph-based document representation technique. To demonstrate the significance of the proposed technique, its results were compared with those of six (6) state-of-the-art document representation techniques. Lastly, this study compared the effects of one-level classification and two-level classification on the experimental results. The experimental results indicated that the CGDR technique achieved 12% to 15% improvement in accuracy compared with fully automated document representation baseline techniques. Moreover, two-level classification obtained better results compared with one-level classification. The promising results of the proposed conceptual graph-based document representation technique suggest that pathologists can adopt the proposed system as their basis for second opinion, thereby supporting them in effectively determining CoD.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free text forensic autopsy reports by ...comparing various schemes for feature extraction, term weighing or feature value representation, text classification, and feature reduction.
For experiments, the autopsy reports belonging to eight different causes of death were collected, preprocessed and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. The six different text classification techniques were applied on these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures i.e. overall accuracy, macro precision, macro-F-measure, and macro recall.
From experiments, it was found that that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Furthermore, in feature representation schemes, term frequency, and term frequency with inverse document frequency obtained similar and better results when compared with binary frequency, and normalized term frequency with inverse document frequency. Furthermore, the chi-square feature reduction approach outperformed Pearson correlation, and information gain approaches. Finally, in text classification algorithms, support vector machine classifier outperforms random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifier.
Our results and comparisons hold practical importance and serve as references for future works. Moreover, the comparison outputs will act as state-of-art techniques to compare future proposals with existing automated text classification techniques.
•Compare various ATC techniques to determine CoD from forensic autopsy reports.•Unigram feature extraction scheme outperformed bigram, trigram and hybrid schemes.•For feature ranking chi-square outperformed information gain and Pearson Correlation.•SVM outperformed kNN, J48, RF, NB and ensemble voted classifiers.•Our findings could stimulate researchers to propose new techniques in this domain.
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UL, UM, UPCLJ, UPUK, ZRSKP
Forensic autopsy focuses on revealing the cause of death (CoD) by examination of a dead body. In this research study, various feature extraction schemes, feature value representation schemes and text ...classification algorithms have been applied on forensic autopsy reports to discover the suitable feature extraction approach, feature value representation approach and text classification approach. From experimental results, it was found that the unigram features outperformed bigram, trigram and hybrids of unigram, bigram and trigram features. Moreover, TF and TFiDF feature value representation schemes were proven more suitable than binary representation and normalized TFiDF schemes. Finally, SVM decision models outperformed RF and NB.