The aim of this study was to evaluate and compare the performance of multivariate classification algorithms, specifically Partial Least Squares Discriminant Analysis (PLS-DA) and machine learning algorithms, in classifying Monthong durian pulp by its dry matter content (DMC) and soluble solids content (SSC) using inline acquisition of near-infrared (NIR) spectra. A total of 415 durian pulp samples were collected and analyzed. Raw spectra were preprocessed with five combinations of spectral preprocessing techniques:

- Moving Average with Standard Normal Variate (MA+SNV)
- Savitzky-Golay smoothing with Standard Normal Variate (SG+SNV)
- Savitzky-Golay smoothing with Mean Normalization (SG+MN)
- Savitzky-Golay smoothing with Baseline Correction (SG+BC)
- Savitzky-Golay smoothing with Multiplicative Scatter Correction (SG+MSC)

SG+SNV preprocessing gave the best performance for both PLS-DA and the machine learning algorithms. The optimized wide neural network achieved the highest overall classification accuracy (85.3%), outperforming the PLS-DA model (81.4%). Recall, precision, specificity, F1-score, area under the ROC curve (AUC), and kappa were also calculated and compared between the two models. These findings show that machine learning algorithms can match or exceed PLS-DA in classifying Monthong durian pulp by DMC and SSC using NIR spectroscopy. Such models could therefore be applied to quality control and management in durian pulp production and storage.
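The best-performing preprocessing above, SG+SNV, has two steps: smoothing, then per-spectrum scaling. The sketch below illustrates both. The study's actual window length, polynomial order and edge handling are not stated here. The 5-point quadratic filter, the unchanged edge points, the use of the sample standard deviation and the toy absorbance values are all assumptions.

```python
import statistics

# Savitzky-Golay convolution weights for a 5-point window with a quadratic
# fit (assumed settings); the weights sum to the normalising constant 35.
SG5_QUADRATIC = (-3, 12, 17, 12, -3)
SG5_NORM = 35


def savgol_smooth(spectrum):
    """Smooth a spectrum with a 5-point quadratic Savitzky-Golay filter.

    Only interior points are filtered; the first and last two points are
    copied unchanged (an assumed, simplest edge treatment).
    """
    n = len(spectrum)
    if n < 5:
        raise ValueError("need at least 5 points for a 5-point window")
    smoothed = list(spectrum)
    for i in range(2, n - 2):
        window = spectrum[i - 2:i + 3]
        smoothed[i] = sum(w * x for w, x in zip(SG5_QUADRATIC, window)) / SG5_NORM
    return smoothed


def snv(spectrum):
    """Standard Normal Variate: centre a spectrum on its own mean and divide
    by its own (sample) standard deviation, removing scatter-related offset
    and scale differences between spectra."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mean) / sd for x in spectrum]


def sg_snv(spectrum):
    """SG+SNV: smooth first, then apply SNV to the smoothed spectrum."""
    return snv(savgol_smooth(spectrum))


if __name__ == "__main__":
    # Made-up toy absorbance values, not data from the study.
    raw = [0.42, 0.45, 0.51, 0.49, 0.55, 0.61, 0.58, 0.64, 0.70, 0.68]
    print([round(v, 3) for v in sg_snv(raw)])
```

A quadratic Savitzky-Golay filter reproduces any quadratic signal exactly at interior points, so it suppresses noise without flattening broad absorption bands. SNV then makes each processed spectrum have zero mean and unit standard deviation.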
Objective: Mueller matrix polarimetry is regarded as a powerful tool for probing microstructural information in tissues. Cell proliferation and collagen fiber remodeling in breast carcinoma tissues have been reported to be related to patient survival and prognosis. These changes give rise to observable patterns in hematoxylin and eosin (H&E) sections of typical breast tissues (TBTs), which pathologists can label as three distinctive pathological features (DPFs): cell nuclei, aligned collagen, and disorganized collagen. This paper proposes a pixel-based approach for extracting polarimetry feature parameters (PFPs) using a linear discriminant analysis (LDA) classifier. These parameters quantitatively characterize the three DPFs in four types of TBTs. Methods: The LDA-based training method finds the simplest linear combination of polarimetry basis parameters (PBPs) that quantitatively characterizes a specific microstructural feature in TBTs, under the constraint that characterization accuracy remains constant. Results: We present results from a cohort of 32 clinical patients, with analysis of 224 regions of interest. The characterization accuracy of the PFPs ranges from 0.82 to 0.91. Conclusion: This work demonstrates that PFPs can quantitatively characterize the DPFs in H&E pathological sections of TBTs. Significance: This technique paves the way for automatic, quantitative evaluation of specific microstructural features in histopathological digitalization and computer-aided diagnosis.
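The core of the pixel-based approach is a linear combination of basis parameters learned by LDA. As an illustration only, the sketch below fits a plain two-class Fisher LDA on two hypothetical per-pixel features. It does not reproduce the paper's simplification constraint. The Fisher direction is w = S_w⁻¹(m₁ − m₀), and a pixel is assigned to class 1 when its projection exceeds the midpoint of the projected class means. The two-feature restriction and all names are assumptions chosen for readability.

```python
def mean_vec(rows):
    """Column-wise mean of a list of 2-feature rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """Sum of outer products (x - m)(x - m)^T over 2-feature rows."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_lda_2d(class0, class1):
    """Two-class Fisher LDA on 2-D features; returns (w, threshold)."""
    m0, m1 = mean_vec(class0), mean_vec(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    # Pooled within-class scatter S_w and its 2x2 inverse.
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Discriminant direction w = S_w^{-1} (m1 - m0).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    def project(x):
        return w[0] * x[0] + w[1] * x[1]

    threshold = (project(m0) + project(m1)) / 2
    return w, threshold


def predict(w, threshold, x):
    """Class 1 if the projected pixel lies beyond the midpoint threshold."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0
```

The learned weights `w` play the role of the coefficients in a linear feature parameter. The projection `w · x` is the per-pixel value that would be thresholded.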
Linear discriminant analysis (LDA) is a classical statistical machine-learning method that seeks a linear data transformation that increases class discrimination in an optimal discriminant subspace. Traditional LDA assumes Gaussian class distributions and single-label data annotations. In this article, we propose a new variant of LDA for multilabel classification tasks. It reduces the dimensionality of the original data to enhance the subsequent performance of any multilabel classifier. A probabilistic class saliency estimation approach is introduced to compute saliency-based weights for all instances. We use these weights to redefine the between-class and within-class scatter matrices needed to calculate the projection matrix. We formulate six variants of the proposed saliency-based multilabel LDA (SMLDA). They differ in the prior information, extracted from labels and features, that they use to weight the importance of each instance for its class(es). Our experiments show that SMLDA improves performance in various multilabel classification problems compared with several competing dimensionality reduction methods.
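The abstract says that instance weights are used to redefine the between-class and within-class scatter matrices. The sketch below shows one generic weighted form for multilabel data. Each (instance, class) pair carries its own weight w_ic. Class means are weighted means, S_w sums w_ic(x_i − m_c)(x_i − m_c)ᵀ, and S_b sums W_c(m_c − m)(m_c − m)ᵀ, where W_c is the total weight of class c. This is an assumption, not the paper's exact SMLDA formulation. The weights are supplied by hand rather than estimated by class saliency, and the helper names are hypothetical.

```python
def outer_add(acc, u, v, scale):
    """In place: acc += scale * u v^T."""
    for a in range(len(u)):
        for b in range(len(v)):
            acc[a][b] += scale * u[a] * v[b]


def weighted_scatter(X, weights):
    """Weighted within-class (S_w) and between-class (S_b) scatter matrices.

    X: list of feature vectors, one per instance.
    weights: list of dicts {class_label: weight}; a multilabel instance may
        appear in several classes, each with its own positive weight
        (hypothetical hand-set weights, not saliency estimates).
    Returns (S_w, S_b, m), where m is the weighted global mean.
    """
    d = len(X[0])
    totals, sums = {}, {}
    for x, wdict in zip(X, weights):
        for c, w in wdict.items():
            totals[c] = totals.get(c, 0.0) + w
            s = sums.setdefault(c, [0.0] * d)
            for j in range(d):
                s[j] += w * x[j]
    means = {c: [v / totals[c] for v in sums[c]] for c in totals}
    grand_total = sum(totals.values())
    m = [sum(sums[c][j] for c in sums) / grand_total for j in range(d)]

    # Within-class scatter: weighted spread of each instance around its class mean.
    S_w = [[0.0] * d for _ in range(d)]
    for x, wdict in zip(X, weights):
        for c, w in wdict.items():
            diff = [x[j] - means[c][j] for j in range(d)]
            outer_add(S_w, diff, diff, w)

    # Between-class scatter: spread of class means around the global mean.
    S_b = [[0.0] * d for _ in range(d)]
    for c, W_c in totals.items():
        diff = [means[c][j] - m[j] for j in range(d)]
        outer_add(S_b, diff, diff, W_c)
    return S_w, S_b, m
```

With these definitions the weighted total scatter splits exactly into S_w + S_b, as in ordinary LDA. The projection matrix can then be obtained from the leading eigenvectors of S_w⁻¹S_b.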
•Hydrostatic and pressurised springs were proposed from a hydrogeological viewpoint.•Definitions of hydraulic head and its components were adapted to springs and streams.•A classification methodology was demonstrated for uneven spring data distributions.•The fundamental role of spring orifice elevation in grouping was identified.
Springs are sources of freshwater supply. They can also deliver valuable insight into the hydrogeological processes of a mountainous region, a natural conservation area or a remote study site with no wells. To assess the appearance, peculiarities, quality, stability, longevity and resilience of springs and related ecosystems, springs need to be regarded in the context of basin-scale groundwater flow systems. Basin-scale evaluation of spring data was demonstrated on the carbonate system of the Transdanubian Mts., Hungary. Three readily measurable physical parameters provided reasonable classification and characterisation of springs and the related groundwater flow systems: the elevation of the spring orifice, temperature and volumetric discharge rate. Using these parameters appears promising for a basin-scale understanding of flow systems in data-scarce regions. Monitoring discharge rate and water temperature is cost-effective and requires no specialised tools or analytical procedures. The combined cluster and discriminant analysis (CCDA) can handle uneven data distribution, unequal length and spacing of time series, and data gaps, and it can account for the time-dependent variability of parameters. The optimal number of groups can be determined from frequently sampled springs (or other entities). Less frequently monitored springs (or other entities) can then be classified using a similarity-based approach and linear discriminant analysis (LDA). Diagnosing the relation of springs to groundwater flow systems can advance sustainable water resources management that considers the ecological water needs of various ecosystem services. It thereby enhances the resilience of springs and groundwater-dependent ecosystems.
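The abstract describes a two-stage workflow. Groups are first defined from frequently sampled springs, and less-monitored springs are then assigned to them by similarity. The sketch below illustrates only that second step. It standardises the three parameters (orifice elevation, temperature, discharge) and assigns a spring to the nearest group centroid. This is a simplified stand-in, not the CCDA procedure or the LDA step used in the study. The group labels "local" and "regional" and all numbers are hypothetical.

```python
import math
import statistics


def fit_standardiser(rows):
    """Per-column mean and sample standard deviation."""
    cols = list(zip(*rows))
    return ([statistics.fmean(c) for c in cols],
            [statistics.stdev(c) for c in cols])


def standardise(row, means, sds):
    """z-scores, so that metres, degrees and L/min become comparable."""
    return [(v - m) / s for v, m, s in zip(row, means, sds)]


def group_centroids(rows, groups, means, sds):
    """Mean standardised vector of each (previously determined) group."""
    acc = {}
    for row, g in zip(rows, groups):
        z = standardise(row, means, sds)
        total, count = acc.get(g, ([0.0] * len(z), 0))
        acc[g] = ([t + v for t, v in zip(total, z)], count + 1)
    return {g: [t / n for t in total] for g, (total, n) in acc.items()}


def assign(row, centroids, means, sds):
    """Assign a sparsely monitored spring to the nearest group centroid."""
    z = standardise(row, means, sds)
    return min(centroids, key=lambda g: math.dist(z, centroids[g]))


if __name__ == "__main__":
    # Hypothetical (elevation m, temperature degC, discharge L/min) values.
    rows = [(450, 9.5, 20), (420, 10.0, 35), (480, 9.0, 15),
            (120, 22.0, 900), (140, 18.5, 600), (110, 25.0, 1200)]
    groups = ["local"] * 3 + ["regional"] * 3
    means, sds = fit_standardiser(rows)
    cents = group_centroids(rows, groups, means, sds)
    print(assign((400, 10.5, 40), cents, means, sds))
```

Standardising first matters because discharge values are numerically much larger than temperatures. Without it, discharge would dominate the distance.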
This study developed and tested a theory-based measure of authentic leadership using five separate samples obtained from China, Kenya, and the United States. Confirmatory factor analyses supported a higher-order, multidimensional model of the authentic leadership construct (the Authentic Leadership Questionnaire, ALQ) comprising leader self-awareness, relational transparency, internalized moral perspective, and balanced processing. Structural equation modeling (SEM) demonstrated the predictive validity of the ALQ measure for important work-related attitudes and behaviors, beyond what ethical and transformational leadership offered. Finally, results revealed a positive relationship between authentic leadership and supervisor-rated performance. Implications for research and practice are discussed.
Apoptosis is associated with a number of human diseases, including cancer, autoimmune disease, neurodegenerative disease and ischemic damage. Information on the subcellular localization of apoptosis proteins is very important for understanding the mechanism of programmed cell death and for drug development. Predicting the subcellular localization of apoptosis proteins nevertheless remains a challenging task.
In this paper, we propose a novel method for predicting apoptosis protein subcellular localization, called PsePSSM-DCCA-LFDA. First, features are extracted from the protein sequences by combining the pseudo-position specific scoring matrix (PsePSSM) and the detrended cross-correlation analysis coefficient (DCCA coefficient). The dimensionality of the extracted features is then reduced by local Fisher discriminant analysis (LFDA). Finally, the optimal feature vectors are input to an SVM classifier to predict the subcellular location of the apoptosis proteins. Under the jackknife test, regarded as the most rigorous, the method achieves overall prediction accuracies of 99.7%, 99.6% and 100% on the three benchmark datasets, respectively. These results are better than those of other state-of-the-art methods.
The experimental results indicate that our method significantly improves the prediction accuracy of subcellular localization of apoptosis proteins. It is accurate enough to be a promising tool for further proteomics studies. The source code and all datasets are available at https://github.com/QUST-BSBRC/PsePSSM-DCCA-LFDA/ .
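The jackknife test mentioned above is leave-one-out evaluation. Each protein is predicted by a model trained on all the other proteins, and accuracy is the fraction predicted correctly. The sketch below shows only this evaluation loop. It uses a nearest-centroid classifier as a placeholder for the PsePSSM-DCCA-LFDA features and SVM, which are not reproduced. The `fit`/`predict` interface and the toy data are hypothetical.

```python
import math


def fit_nearest_centroid(X, y):
    """Placeholder classifier: one mean feature vector per class label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        s = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_nearest_centroid(model, x):
    """Label of the closest class centroid (Euclidean distance)."""
    return min(model, key=lambda c: math.dist(x, model[c]))


def jackknife_accuracy(X, y, fit=fit_nearest_centroid,
                       predict=predict_nearest_centroid):
    """Leave-one-out (jackknife) accuracy: sample i is predicted by a model
    trained on every sample except i. X and y are lists."""
    correct = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
    y = [0, 0, 0, 1, 1, 1]
    print(jackknife_accuracy(X, y))
```

Because every model is refitted without the held-out sample, any data-dependent step must also run inside the loop. In the paper's pipeline that includes dimensionality reduction such as LFDA. Otherwise the held-out sample leaks into training.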
•Machine learning models show improved bankruptcy prediction accuracy over traditional models.•Various models were tested using different accuracy metrics.•Boosting, bagging, and random forest models provide better results.
There has been intensive research by academics and practitioners on models for predicting bankruptcy and default events for credit risk management. Seminal academic research evaluated bankruptcy using traditional statistical techniques (e.g. discriminant analysis and logistic regression) and early artificial intelligence models (e.g. artificial neural networks). In this study, we test machine learning models (support vector machines, bagging, boosting, and random forest) for predicting bankruptcy one year prior to the event. We compare their performance with that of discriminant analysis, logistic regression, and neural networks. We use data from 1985 to 2013 on North American firms, integrating information from the Salomon Center database and Compustat and analysing more than 10,000 firm-year observations. The key insight of the study is a substantial improvement in prediction accuracy with machine learning techniques. The improvement is largest when six complementary financial indicators are included in addition to the original Altman Z-score variables. Based on Carton and Hofer (2006), we use new predictive variables such as the operating margin, change in return-on-equity, change in price-to-book, and growth measures related to assets, sales, and number of employees. Machine learning models show, on average, approximately 10% higher accuracy than traditional models. Comparing the best models with all predictive variables, random forest achieved 87% accuracy in the testing sample, whereas logistic regression and linear discriminant analysis achieved 69% and 50%, respectively. We find that bagging, boosting, and random forest models outperform the other techniques. Prediction accuracy in the testing sample also improves for all models when the additional variables are included.
Our research contributes to the continuing debate about the superiority of computational methods over statistical techniques, as in Tsai, Hsu, and Yen (2014) and Yeh, Chi, and Lin (2014). In particular, among machine learning mechanisms, we do not find that SVM yields higher accuracy than other models. This result contradicts the findings of Danenas and Garsva (2015) and Cleofas-Sanchez, Garcia, Marques, and Sanchez (2016). It corroborates, for instance, Wang, Ma, and Yang (2014), Liang, Lu, Tsai, and Shih (2016), and Cano et al. (2017). Our study supports the applicability of expert systems by practitioners, as in Heo and Yang (2014), Kim, Kang, and Kim (2015) and Xiao, Xiao, and Wang (2016).
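The study's baseline predictors are the original Altman Z-score variables. For context, the sketch below computes Altman's 1968 Z-score, which is itself a linear discriminant function of five ratios: Z = 1.2X1 + 1.4X2 + 3.3X3 + 0.6X4 + 1.0X5. It also applies the conventional cut-offs of 1.81 and 2.99. The study feeds the ratios to its models as inputs rather than using this fixed weighting. The function names and the round-number firm below are hypothetical.

```python
def altman_z(working_capital, retained_earnings, ebit,
             market_value_equity, sales, total_assets, total_liabilities):
    """Original (1968) Altman Z-score for publicly traded manufacturers."""
    x1 = working_capital / total_assets          # liquidity
    x2 = retained_earnings / total_assets        # cumulative profitability
    x3 = ebit / total_assets                     # operating profitability
    x4 = market_value_equity / total_liabilities  # market leverage
    x5 = sales / total_assets                    # asset turnover
    return 1.2 * x1 + 1.4 * x2 + 3.3 * x3 + 0.6 * x4 + 1.0 * x5


def altman_zone(z):
    """Conventional interpretation zones for the original Z-score."""
    if z > 2.99:
        return "safe"
    if z >= 1.81:
        return "grey"
    return "distress"


if __name__ == "__main__":
    # Hypothetical firm with round numbers (same currency unit throughout).
    z = altman_z(working_capital=10, retained_earnings=20, ebit=10,
                 market_value_equity=150, sales=100,
                 total_assets=100, total_liabilities=50)
    print(round(z, 2), altman_zone(z))
```

Here X1 = 0.1, X2 = 0.2, X3 = 0.1, X4 = 3 and X5 = 1, giving Z = 0.12 + 0.28 + 0.33 + 1.8 + 1.0 = 3.53, which falls in the "safe" zone. Learned models such as random forest can instead capture non-linear interactions between these ratios and the additional indicators.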
•Food processing bacteria were studied using SERS and Raman spectroscopy.•SERS was found effective in acquiring spectral features for bacterial identification.•PCA and PLS-DA models were found useful in differentiating bacterial species.
Food processing bacteria play an important role in providing flavors, ingredients and other beneficial characteristics to food, but some bacteria are also responsible for food spoilage. Quick and reliable identification of food processing bacteria at the species level is therefore necessary and may help in developing more useful food processing methodologies. In this study, four bacterial species (Lactobacillus fermentum, Fructobacillus fructosus, Pediococcus pentosaceus and Halalkalicoccus jeotgali) were analyzed with our in-house developed Ag NPs-based surface-enhanced Raman spectroscopy (SERS) method. The SERS spectral data were analyzed with multivariate data analysis techniques, including principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA). Bacterial species were differentiated on the basis of their SERS spectral features, and the potential of SERS was compared with that of Raman spectroscopy (RS). SERS combined with PCA and PLS-DA was found to be an efficient technique for identifying and differentiating food processing bacterial species. Under leave-one-out cross-validation, the PLS-DA model differentiated the species with an accuracy of 99.5% and a sensitivity of 99.7%.
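Several abstracts in this section report confusion-matrix metrics. This one gives accuracy and sensitivity, and the durian study lists recall, precision, specificity, F1-score and kappa. The sketch below shows how these quantities are commonly computed from true and predicted labels. Per-class metrics are taken one-vs-rest, and kappa is Cohen's chance-corrected agreement. The species abbreviations and toy predictions are hypothetical, not results from either study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_metrics(y_true, y_pred, positive):
    """One-vs-rest sensitivity (recall), specificity, precision and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1}


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement,
    where chance agreement uses each label's marginal frequencies."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 1.0


if __name__ == "__main__":
    # Hypothetical labels: Lf, Pp, Ff stand for three species.
    y_true = ["Lf", "Lf", "Pp", "Pp", "Ff", "Ff"]
    y_pred = ["Lf", "Pp", "Pp", "Pp", "Ff", "Lf"]
    print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
    print(binary_metrics(y_true, y_pred, "Pp"))
```

For the toy labels, accuracy is 4/6, and chance agreement is 1/3, so kappa is (2/3 − 1/3)/(1 − 1/3) = 0.5. Kappa is therefore noticeably lower than raw accuracy once chance matches are discounted.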