Breast cancer remains a leading cause of cancer-related deaths among women. Mortality is mainly attributed to metastasis and recurrence. Hence, early detection of breast cancer recurrence has become a real-world medical problem. Using data mining approaches, we compared four popular machine learning models (Logistic Regression, Naïve Bayes, K-Nearest Neighbors, and Support Vector Machines) for classifying breast cancer recurrence on a high-dimensional but very small dataset, the Wisconsin Prognostic Breast Cancer Data Set, under four configurations: a) scaling only, b) scaling with PCA, c) scaling with PCA and oversampling of the minority class, and d) selected features only (i.e., keeping only one feature from each set of highly correlated features). Our results showed that Logistic Regression provided the best scores in almost all metrics (precision, recall, accuracy, weighted F1-score, AUROC, AUPROC, and Cohen's kappa statistic) in all four configurations, followed by Support Vector Machines and then K-Nearest Neighbors. Naïve Bayes performed poorly, especially in the scaling with PCA configuration; however, its performance improved when we retained only one feature from each set of highly correlated features. K-Nearest Neighbors improved its recall with oversampling, while the accuracy of Support Vector Machines remained fairly constant across all four configurations. Based on this study, Logistic Regression is a potential model for predicting breast cancer recurrence, which would enable clinicians to propose treatment options based on whether a patient's features correspond to a good prognosis or a bad prognosis (recurrence). This indicates the clinical utility of data mining methods for the early detection of breast cancer recurrence in post-surgical patients, with the aim of saving lives.
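The selected-features configuration described above (keeping one feature from each set of highly correlated features) can be sketched as a greedy correlation filter. This is a minimal illustration, not the authors' procedure: the 0.9 threshold, the function names, and the toy columns are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation undefined, treated as uncorrelated
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold.

    The threshold value is an assumption; the abstract does not state one.
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy columns, not taken from the Wisconsin dataset.
toy = {
    "radius": [1.0, 2.0, 3.0, 4.0, 5.0],
    "perimeter": [6.3, 12.5, 18.9, 25.1, 31.4],  # nearly proportional to radius
    "texture": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_correlated(toy)  # "perimeter" is dropped as redundant with "radius"
```

The filter is order-dependent: the first feature of each correlated group is the one retained, so a different column order may keep a different representative.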
Accurate and precise glaucoma screening is crucial for early assessment to prevent blindness, particularly in communities where access to ophthalmologists' services is limited. It is in this context that convolutional neural networks (CNNs) can be harnessed to support the correct and dependable recognition of glaucoma. One neural network hyperparameter that needs to be considered is batch size. Very few studies have explored the impact of batch size on the diagnostic capability of neural networks, and none so far have done so for glaucoma classification. The purpose of this study is to probe the diagnostic capability of CNNs in glaucoma classification and to identify the batch size that yields the best classification performance for each model. Various CNN models (base CNN, VGG16, ResNet50, and DenseNet121) were applied to a Retinal Image Dataset. The models generated high values for the performance indicators, with accuracy (79%-88%), recall (80%-93%), precision (79%-86%), and F1-score (81%-89%) indicating good and acceptable diagnostic capability for glaucoma recognition. DenseNet121 obtained the highest normalized Matthews Correlation Coefficient, indicating the best diagnostic capability for glaucoma classification. The batch sizes with the best diagnostic capability were 64 and 128 (base CNN model), 64 (VGG16), 16 (ResNet50), and 16 (DenseNet121). The results agree with the recommendations of other published works to use smaller batch sizes, particularly for deeper neural network models, as larger batch sizes did not generate superior performance and also pose hardware constraints. Once deployed, these CNN models can serve as useful complementary support tools that aid health professionals in the initial assessment of patients. This approach would allow health professionals to initiate referrals to experts for further management, thereby preventing visual impairment in patients.
A working partnership between healthcare workers and data scientists is ideal and helpful for the quick and precise diagnosis of glaucoma, leading to the prevention of blindness in patients.
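The normalized Matthews Correlation Coefficient mentioned in the glaucoma abstract is not defined there. The sketch below assumes the common convention of rescaling MCC from [-1, 1] to [0, 1] as (MCC + 1) / 2; the function names and the example counts are hypothetical.

```python
from math import sqrt

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient from confusion-matrix counts (range -1 to 1)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

def normalized_mcc(tp, fp, fn, tn):
    """Rescale MCC linearly to [0, 1]; this definition is an assumption."""
    return (mcc(tp, fp, fn, tn) + 1) / 2

# Hypothetical counts, not results from the study.
example = normalized_mcc(tp=45, fp=5, fn=5, tn=45)
```

Under this convention a value of 0.5 corresponds to chance-level prediction, which makes the score easier to read next to accuracy or F1.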
Cardiovascular disease is a highly prevalent health problem in both underdeveloped and developing countries worldwide. As such, it remains one of the top health priorities in many countries. In coronary artery disease (CAD), an atherosclerotic plaque forms in the lumen of blood vessels, deranging blood flow and diminishing the delivery of oxygen to the myocardium. Single Photon Emission Computed Tomography – Myocardial Perfusion Imaging (SPECT-MPI) is a commonly requested imaging modality for evaluating CAD. Visual evaluation of the MPI images is performed by a nuclear medicine physician and depends largely on the physician's experience, resulting in significant inter-observer variability. This study aims to assess the performance of convolutional neural networks (CNNs) using transfer learning to classify SPECT-MPI images for perfusion abnormalities using an anonymized, publicly available SPECT-MPI dataset. The following pre-processing methods were applied to the dataset: (a) normalization of images, (b) shuffling of images, (c) train-test split, and (d) geometric augmentation. The pre-processed data were then fed into popular pre-trained CNNs typically applied to medical images: VGG16, DenseNet121, InceptionV3, and ResNet50. The best-performing models were VGG16 and InceptionV3, which both achieved the highest accuracy of 84.38%. However, VGG16 had higher recall and F1-score than InceptionV3, while InceptionV3 had higher precision. Nonetheless, VGG16, InceptionV3, and DenseNet121 obtained similar performance metrics (recall: 80-100%, precision: 80.65-100%, F1-score: 88.89-90.91%), while ResNet50 generated the lowest performance metrics. Overall, the findings suggest that any of these three CNN models (VGG16, InceptionV3, DenseNet121) can be deployed by nuclear medicine physicians in their clinical practice to augment their decision-making in the interpretation of SPECT-MPI tests.
The models can also be adopted as a dependable secondary assessment that can guide junior doctors seeking consultation for a reliable diagnosis. These models can likewise serve as teaching or learning materials for less experienced physicians, particularly those still in training. This highlights the clinical utility of these models in the practice of nuclear cardiology. The research produced encouraging results that may be incorporated into clinical work. The study has the potential to improve the detection and monitoring of CAD.
Pneumonia is a highly infectious respiratory illness that affects one or both lungs. It can lead to fatalities if left undiagnosed and untreated in time. While a radiologist can diagnose pneumonia just by looking at a chest X-ray, certain factors may increase the chances of misdiagnosis, such as fatigue, multi-presenting symptoms, or inadequate overall experience in assessing patients with low-prevalence presentations. In this study, a publicly available chest X-ray dataset was used to create a portable executable clinical decision support tool that helps determine whether pneumonia is present in a given chest X-ray image. ResNet50, DenseNet201, Xception, MobileNetV2, and ResNet101 were retrained to detect pneumonia in chest X-rays on both the original and the augmented dataset. An ensemble network was also constructed to improve the results by combining the strengths of the included pre-trained models through average probability. A total of 37 experiments were performed. Since no single model always excels on all metrics, the model with the highest Matthews Correlation Coefficient (MCC) was selected as the best model. MCC yields a high score only if the classifier correctly predicts a high percentage of both positive and negative samples. To further increase the performance of the ensemble model, a novel method was introduced that determines which pre-trained model, when removed from the ensemble, gives the greatest increase in ensemble performance. The result is the identification of the best ensemble variant, consisting of only ResNet50, DenseNet201, Xception, and ResNet101, with Accuracy=93%, Precision=91%, Recall=98%, F1-Score=94%, Specificity=83%, and MCC=85%. A stand-alone, no-install application was created to enable those with limited or no internet access to run the tool on their Windows-based computer.
Our tool can be used as a portable, stand-alone clinical decision support tool by radiologists when evaluating a possible pneumonia case. It can also be used as a teaching tool for general practitioners and medical students.
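The average-probability ensemble and the model-removal step in the pneumonia abstract can be illustrated roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the 0.5 decision threshold, the function names, the placeholder model names, and the toy probabilities are all hypothetical.

```python
from math import sqrt

def mcc_from_labels(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def ensemble_predict(probs_by_model, members, threshold=0.5):
    """Average the positive-class probabilities of the chosen members, then threshold."""
    n = len(probs_by_model[members[0]])
    return [
        int(sum(probs_by_model[m][i] for m in members) / len(members) >= threshold)
        for i in range(n)
    ]

def best_removal(probs_by_model, y_true):
    """Find the single model whose removal raises ensemble MCC the most.

    Returns (model_name_or_None, best_mcc, baseline_mcc); the name is None
    when no single removal improves on the full ensemble.
    """
    members = list(probs_by_model)
    baseline = mcc_from_labels(y_true, ensemble_predict(probs_by_model, members))
    best_name, best_score = None, baseline
    for name in members:
        reduced = [m for m in members if m != name]
        score = mcc_from_labels(y_true, ensemble_predict(probs_by_model, reduced))
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score, baseline

# Hypothetical validation probabilities from three placeholder models.
toy_labels = [1, 1, 0, 0]
toy_probs = {
    "model_a": [0.9, 0.8, 0.3, 0.1],
    "model_b": [0.8, 0.7, 0.3, 0.2],
    "model_c": [0.3, 0.9, 0.95, 0.2],  # confidently wrong on the third sample
}
removed, new_mcc, base_mcc = best_removal(toy_probs, toy_labels)
```

In practice the removal decision should be made on validation data that is separate from the test set, otherwise the reported ensemble score is optimistically biased.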
X-linked dystonia parkinsonism (XDP) is a rapidly progressive neurodegenerative movement disorder seen almost exclusively among male patients of Filipino descent with ancestry traced to Panay Island in the Philippines. The prevalence of XDP is only 0.31/100,000 for the entire Philippines, but it is 23.66/100,000 in Capiz Province in Panay. This study used retrospectively obtained data from a previous study, collected from all 16 towns and 1 city of Capiz Province. The XDP dataset showed a high clustering tendency. Two clustering approaches commonly used in data science, Partitioning Around Medoids (PAM) and K-modes, were applied to the XDP dataset. Both clustering approaches obtained good internal and external validation results and generated good performance metrics. Overall, K-modes performed better than PAM, with 81% accuracy, 78% recall (sensitivity), 82% specificity, 68% precision, 72% F1-score, and a Matthews Correlation Coefficient of 0.58. Both PAM and K-modes were able to highlight the differences between the clusters. Thus, the resulting clusters can be useful as screening tools to differentiate those with and without XDP. Feature importance analysis was also performed, with feet shuffling showing the highest discriminative power in distinguishing patients with and without XDP. The collaboration of data scientists with neurology experts in movement disorders is a step toward the deployment and acceptance of machine learning tools in clinical neurology practice.
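For readers unfamiliar with K-modes, its two building blocks are a simple matching dissimilarity for categorical records and an attribute-wise mode as the cluster centre. The sketch below shows one assignment step only. It is not the study's code, and the three yes/no attributes in the toy records are hypothetical stand-ins for clinical features.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def cluster_mode(records):
    """Attribute-wise most frequent category across the records of one cluster."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*records)]

def assign(records, modes):
    """Index of the nearest mode (smallest dissimilarity) for each record."""
    return [
        min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
        for r in records
    ]

# Hypothetical records: three yes/no clinical attributes per patient.
cluster_a = [("yes", "yes", "no"), ("yes", "yes", "yes"), ("yes", "no", "no")]
cluster_b = [("no", "no", "no"), ("no", "no", "yes"), ("no", "yes", "yes")]
modes = [cluster_mode(cluster_a), cluster_mode(cluster_b)]
labels = assign([("yes", "yes", "yes"), ("no", "no", "no")], modes)
```

A full K-modes run alternates between `assign` and recomputing each cluster's mode until the assignments stop changing.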
Many machine learning (ML) applications in healthcare that are capable of very good performance are still not integrated into the clinical workflow, primarily because the end-users, the physicians, cannot understand the logic used by these models. To them, these models are black boxes incapable of providing reasons for arriving at a diagnosis, which creates issues of trust and acceptability. In the end, it is still the physician who makes the final decision, so for a tool to be considered a worthy clinical decision support tool, it should be able to communicate clearly how it arrived at its result. Only then can it gain the physician's trust and acceptance. Thus, Explainable AI (XAI) should be a feature of any ML application in order to gain acceptability and integration in the workplace. In this study, we present a tool that is capable of predicting which patients are more likely to be readmitted and of giving its reasons for arriving at such a conclusion. We used Random Forest (RF), AdaBoost, and K-Nearest Neighbors (K-NN) to build the models, performed hyperparameter tuning to improve performance, calculated feature importance to understand which features each model deems important, and then added a visual explainer using Local Interpretable Model-Agnostic Explanations (LIME) to help the physician understand the logic employed by each model in making the classification.
Late intrauterine growth restriction (IUGR) is associated with a higher risk of perinatal hypoxic events and suboptimal neurodevelopment and is a leading cause of perinatal mortality. It is usually suspected in the third trimester of pregnancy and eventually confirmed at birth. As IUGR in the third trimester is strongly associated with unexplained stillbirths in low-risk pregnancies, prompt antenatal diagnosis and treatment with timely delivery could significantly reduce these risks. In this study, we assessed the presence of late IUGR using cardiotocography (CTG) findings. We applied seven machine learning models (Naïve Bayes, k-Nearest Neighbors, Support Vector Machine, Logistic Regression, Decision Tree, Random Forest, and AdaBoost) in three different experiments highlighting the effect of pre-processing techniques. The best-performing models were logistic regression and support vector machine, with accuracy of 84-85%, precision of 79-80%, recall of 85-89%, and F-scores of 82-84%. These models performed well across all evaluation metrics, showing robustness and flexibility as predictive models for late IUGR. Based on the results of our experiments, we stress the importance of feature elimination as a pre-processing technique to improve model performance. Using a feature importance method, we also identified the most relevant features in predicting late IUGR for both logistic regression and support vector machine.
An optimizer is an algorithm that adjusts the attributes of a neural network, namely its weights and, in adaptive methods, the effective learning rates, during training so that the loss function is minimized. The loss function computes the error between the predicted and the desired output and reflects how well the model predicts a given data point. The role of optimizers is to help improve the accuracy and training speed of the model. An alarming number of monkeypox cases (more than 85,000) has been reported to the Centers for Disease Control and Prevention from geographical areas not previously known to have cases of monkeypox. Prompt diagnosis of monkeypox is challenging, as its clinical presentation is very similar to that of other rash-producing diseases such as scarlet fever, smallpox, and roseola. The aims of this study are (1) to investigate the diagnostic capability of convolutional neural networks in the evaluation of monkeypox skin lesions and (2) to identify the optimization algorithm that generates the best diagnostic performance. InceptionV3, EfficientNetB3, VGG16, and DenseNet169 were applied to a monkeypox skin image dataset to detect the presence of monkeypox. Several optimization algorithms were evaluated in terms of their diagnostic capability to recognize monkeypox: Adam, AdaMax, Adagrad, AdaDelta, SGD, and RMSprop. For all the models, Adam and AdaMax generated superior performance, yielding the highest Matthews Correlation Coefficient (MCC). SGD had the lowest diagnostic performance for all the CNN models. Overall, the best-performing combination was InceptionV3 with the Adam optimizer, which achieved 95% accuracy, 95% sensitivity, 96% specificity, 95% precision, 95% F1-score, and an MCC of 0.9100.
The findings suggest that CNN models can be beneficial complementary support tools for improved surveillance and control of monkeypox, especially during disease outbreaks, when swift recognition is crucial for instituting treatment measures and quarantining patients. Cooperation between monkeypox experts and machine learning modelers is imperative to attain the objective of curbing the community spread of monkeypox.
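To make the role of an optimizer concrete, the sketch below applies the plain SGD update and the standard Adam update to a one-parameter toy loss. It only illustrates the update rules: the loss function, learning rate, step count, and function names are assumptions and have no connection to the CNN training runs reported in the monkeypox abstract.

```python
from math import sqrt

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def run_sgd(w, lr=0.1, steps=500):
    """Plain gradient descent: move against the gradient by a fixed learning rate."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def run_adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: scale each step using running averages of the gradient and its square."""
    m = 0.0  # first-moment estimate (running mean of gradients)
    v = 0.0  # second-moment estimate (running mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (sqrt(v_hat) + eps)
    return w

w_sgd = run_sgd(0.0)
w_adam = run_adam(0.0)
```

Both runs approach w = 3 on this convex toy problem. Which optimizer performs best on a real network depends on the architecture, data, and hyperparameters, which is why the study compares them empirically.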
There is year-round global demand for flowers, particularly roses, necessitating increased flower production. Demand for roses has increased due to their year-long availability as well as their uses in the cosmetic, perfume, medicinal product, food raw material, and decoration industries. Rose plants are vulnerable to drastic fluctuations in temperature, drought stress damage, and low precipitation. This has resulted in an increase in greenhouse production to meet growing demand, as a controlled environment provides several advantages. In this study, four machine learning models (random forest, support vector machine, multinomial logistic regression, and artificial neural networks) were applied to a rose greenhouse cultivation dataset. The study aims to classify the most suitable greenhouse environment for improving the state of the roses, leading to optimal rose production. Four model configurations corresponding to the pre-processing techniques were tested: scaling only, scaling plus removal of outliers, scaling plus SMOTE, and scaling with removal of outliers plus SMOTE. Random forest with all pre-processing steps applied to the dataset obtained the best performance, with the highest weighted F1-score, weighted-average precision, weighted-average recall, and Cohen's kappa statistic. This indicates that machine learning models can predict corrective actions leading to improved conditions for roses. The notable contribution of this research is the identification of valid and reliable classification models that assist growers in predicting the best greenhouse micro-environment.
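SMOTE, used in two of the configurations above, balances classes by creating synthetic minority samples on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below shows that interpolation idea only. It is a simplified stand-in rather than a library implementation, and the function name, the choice of k, and the two-feature toy points are hypothetical.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Requires at least two minority points. Each synthetic point lies on the
    segment between a randomly chosen base point and one of its k nearest
    minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by distance to the base point.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class points with two scaled features each.
minority_points = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]
new_points = smote_like(minority_points, n_new=5)
```

Oversampling of this kind should be applied to the training split only, so that synthetic points never leak into the data used for evaluation.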