Fuzzy support vector machines (FSVMs) have been combined with class imbalance learning (CIL) strategies to address the problem of classifying skewed data. However, the existing approaches have several inherent drawbacks that lead to inaccurate estimation of the prior data distribution and, in turn, degrade the quality of the classification model. To solve this problem, we present a more robust method for extracting prior data distribution information, named relative density, and two novel FSVM-CIL algorithms based on it. In the proposed algorithms, a strategy similar to K-nearest neighbors-based probability density estimation (KNN-PDE) is used to calculate the relative density of each training instance. In particular, the relative density is independent of the dimensionality of the data distribution in feature space and reflects only the significance of each instance within its class; hence, it is more robust than absolute distance information. In addition, the relative density captures the prior data distribution information well, whether the distribution is simple or complex. Even for data with small disjuncts or a large class overlap, the relative density information reflects the details well. We evaluated the proposed algorithms on a number of synthetic and real-world imbalanced datasets. The results show that the proposed algorithms clearly outperform some previous work, especially on datasets with sophisticated distributions.
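The idea of a KNN-based relative density can be illustrated with a minimal sketch. It assumes that the density of an instance is the reciprocal of the distance to its k-th nearest neighbour within the same class, rescaled so that the densest instance scores 1. The abstract does not give the exact formula, so the function name `relative_density`, the rescaling, and the sample points are illustrative assumptions only.

```python
import math


def relative_density(points, k):
    """Score each point of ONE class by how densely it is surrounded.

    Assumption (not taken from the paper): the score is the reciprocal of
    the distance to the k-th nearest same-class neighbour, divided by the
    largest such value, so scores lie in (0, 1].
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    raw = []
    for i, p in enumerate(points):
        # Distances from p to every other point of the same class.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kth = dists[k - 1]
        # Guard against duplicate points (zero distance).
        raw.append(1.0 / max(kth, 1e-12))
    top = max(raw)
    return [r / top for r in raw]


# Four clustered points and one isolated point (hypothetical data).
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
weights = relative_density(minority, k=2)
```

In this toy data the isolated point at (5, 5) receives a much smaller score than the four clustered points, which is the kind of per-instance weight a fuzzy SVM could use as a membership value.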
Feature selection in classification is an important task in machine learning. Inspired by the success of the Universum support vector machine proposed by Weston et al. in improving the classification ability of the classical support vector machine, this paper considers a special type of Universum and lets it play a role in both useful feature identification and separating hyperplane construction, aiming to improve both the feature selection ability and the classification performance of the Universum support vector machine. With this special Universum, a redundant feature can be identified by observing whether some Universum sample is useful. In fact, we prove that by observing the dual solution of the optimization problem, useful features can be selected from a set satisfying some properties. Because of the extra Universum samples, the model must cope with a large-scale optimization problem. To improve the training efficiency, we modify the sequential minimal optimization algorithm and combine it with the coordinate descent technique to solve the proposed model. Experimental results on artificial datasets, benchmark datasets, and text classification datasets demonstrate that the proposed method improves on the classification performance of the support vector machine and the Universum support vector machine, and also has good feature selection ability.
•Special Universum samples are introduced to fulfill both feature selection and classification.
•It identifies a feature as useful if its corresponding Universum sample contributes to classification.
•An effective algorithm by combining modified SMO and coordinate descent technique is designed.
Anomaly detection in energy consumption is a crucial step towards developing efficient energy saving systems, diminishing overall energy expenditure and reducing carbon emissions. Therefore, implementing powerful techniques to identify anomalous consumption in buildings and providing this information to end‐users and managers is of significant importance. Accordingly, two novel schemes are proposed in this paper. The first is an unsupervised abnormality detection scheme based on a one‐class support vector machine, namely UAD‐OCSVM, in which abnormalities are extracted without the need for annotated data. The second is a supervised abnormality detection scheme based on micromoments (SAD‐M2), which is implemented in the following steps: (i) normal and abnormal power consumptions are defined and assigned; (ii) a rule‐based algorithm is introduced to extract the micromoments representing the intent‐rich moments in which the end‐users make decisions to consume energy; and (iii) an improved K‐nearest neighbors model is introduced to automatically classify consumption footprints as normal or abnormal. Empirical evaluation on three different data sets demonstrates that SAD‐M2 achieves both the highest abnormality detection performance and real‐time processing capability, with considerably lower computational cost than other machine learning methods. For instance, up to 99.71% accuracy and 99.77% F1 score have been achieved using a real‐world data set collected at the Qatar University energy lab.
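Step (iii) can be illustrated with a plain K-nearest neighbors classifier. This is a minimal sketch of the standard majority-vote rule, not the improved model the paper proposes. The feature layout (average power in kW, duration in hours) and all sample values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Indices of training points ordered by Euclidean distance to the query.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical consumption footprints: (average power in kW, duration in hours).
footprints = [(0.20, 1.0), (0.30, 1.5), (0.25, 0.8),
              (2.50, 6.0), (3.00, 7.5), (2.80, 5.0)]
labels = ["normal", "normal", "normal",
          "abnormal", "abnormal", "abnormal"]

label = knn_predict(footprints, labels, query=(2.6, 6.5), k=3)
```

An odd `k` avoids tied votes in the two-class case. In practice the features would usually be rescaled first so that no single unit dominates the distance.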
The static, clean and movement-free characteristics of solar energy, along with its contribution towards global warming mitigation, enhanced stability and increased efficiency, make solar power systems one of the most feasible energy generation resources. Because stochastic weather conditions influence the output power of photovoltaic (PV) systems, the need for a sophisticated forecasting model is increasing rapidly. A genetic algorithm-based support vector machine (GASVM) model for short-term power forecasting of a residential-scale PV system is proposed in this manuscript. The GASVM model first classifies the historical weather data using an SVM classifier, which is then optimized by the genetic algorithm using an ensemble technique. In this research, a local weather station was installed along with the PV system at Deakin University to monitor the immediate surrounding environment accurately, avoiding the inaccuracy caused by the remote collection of weather parameters (Bureau of Meteorology). The forecasting accuracy of the proposed GASVM model is evaluated based on the root mean square error (RMSE) and mean absolute percentage error (MAPE). Experimental results demonstrate that the proposed GASVM model outperforms the conventional SVM model by a difference of about 669.624 W in RMSE and 98.7648% in MAPE.
•We propose a stochastic GASVM model for short-term forecasting of PV output power.
•We use historical weather data collected from a local weather station for accurately modeling the system.
•We evaluate the forecasting accuracy of the proposed technique based on RMSE and MAPE.
•The results are compared with conventional SVM technique.
•The proposed forecasting technique stands out for features like being robust, accurate, fast and needing less memory.
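The two error measures named in this entry follow their standard definitions and can be computed as in the sketch below. The sample readings are made-up values in watts, not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the inputs."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero (for PV output, e.g. at night),
    so such inputs are rejected here rather than silently skipped.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical measured and forecast PV power in watts.
measured = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 380.0]
error_w = rmse(measured, forecast)      # about 14.14 W
error_pct = mape(measured, forecast)    # about 6.67 %
```

RMSE is expressed in the unit of the signal (here watts) and penalizes large errors more heavily, while MAPE is unit-free but unstable when the measured power is close to zero.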
•A large number of sentiment reviews, blogs and comments are present online.
•These reviews must be classified to obtain meaningful information.
•Four different supervised machine learning algorithms are used for classification.
•Unigram, bigram and trigram models and their combinations are used for classification.
•The classification is done on the IMDb movie review dataset.
With the ever-increasing number of social networking and online marketing sites, the reviews and blogs obtained from them act as an important source for further analysis and improved decision making. These reviews are mostly unstructured by nature and thus need processing, such as classification or clustering, to provide meaningful information for future use. These reviews and blogs may be classified into different polarity groups, such as positive, negative, and neutral, in order to extract information from the input dataset. Supervised machine learning methods help to classify these reviews. In this paper, four different machine learning algorithms, namely Naive Bayes (NB), Maximum Entropy (ME), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM), have been considered for the classification of human sentiments. The different methods are critically examined in order to assess their performance on the basis of parameters such as precision, recall, f-measure, and accuracy.
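The four evaluation parameters can be computed from true and predicted labels as in the following sketch. It assumes a binary setting with "pos" treated as the positive class. The label names and the toy predictions are illustrative, not results from the paper.

```python
def binary_metrics(y_true, y_pred, positive="pos"):
    """Return precision, recall, F-measure (F1) and accuracy as a dict."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical gold labels and classifier output for eight reviews.
gold = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
pred = ["pos", "pos", "pos", "neg", "pos", "pos", "neg", "neg"]
scores = binary_metrics(gold, pred)
```

For these toy labels there are 3 true positives, 2 false positives, 1 false negative and 2 true negatives, giving precision 0.6, recall 0.75 and accuracy 0.625.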
•Electroencephalogram signal classification is performed using universum learning.
•Support vector machine classifier uses prior information from interictal signals.
•Many feature extraction techniques are used for comparing the algorithms.
•Universum support vector machine is used for the first time for seizure classification.
Support vector machine (SVM) has been used widely for classification of electroencephalogram (EEG) signals for the diagnosis of neurological disorders such as epilepsy and sleep disorders. SVM shows good generalization performance for high-dimensional data due to its convex optimization problem. The incorporation of prior knowledge about the data leads to a better optimized classifier. Different types of EEG signals provide information about the distribution of EEG data. To include prior information in the classification of EEG signals, we propose a novel machine learning approach based on the universum support vector machine (USVM). In our approach, the universum data points are selected from the EEG dataset itself, namely the interictal EEG signals. This removes the effect of outliers on the generation of universum data. Further, to reduce the computation time, we use our approach of universum selection with the universum twin support vector machine (UTSVM), which has a lower computational cost than the traditional SVM. To check the validity of the proposed methods, we use various feature extraction techniques on different datasets consisting of healthy and seizure signals. Several numerical experiments are performed on the generated datasets, and the results of the proposed approach are compared with other baseline methods. The proposed USVM and proposed UTSVM show better generalization performance compared to SVM, USVM, Twin SVM (TWSVM) and UTSVM. The proposed UTSVM achieved the highest classification accuracy, 99%, for the healthy and seizure EEG signals.
Imbalanced datasets are prominent in real-world problems. In such problems, the number of data samples in one class is significantly higher than in the other classes, even though the other classes might be more important. Standard classification algorithms may classify all the data into the majority class, which is a significant drawback of most standard learning algorithms, so imbalanced datasets need to be handled carefully. One of the traditional algorithms, the twin support vector machine (TSVM), performs well on balanced data classification but poorly on imbalanced datasets. To improve the TSVM algorithm’s classification ability for imbalanced datasets, a reduced universum twin support vector machine for class imbalance learning (RUTSVM) was recently proposed, driven by the universum twin support vector machine (UTSVM). One of RUTSVM’s key drawbacks is that solving the dual problem and finding the classifiers involve matrix inverse computation. In this paper, we improve the RUTSVM and propose an improved reduced universum twin support vector machine for class imbalance learning (IRUTSVM). In the suggested IRUTSVM approach, we offer alternative Lagrangian functions to tackle the primal problems of RUTSVM by inserting one of the terms in the objective function into the constraints. As a result, we obtain a new dual formulation for each optimization problem, so that we need not compute inverse matrices either in the training process or in finding the classifiers. Moreover, rectangular kernel matrices of smaller size are used to reduce the computational time. Extensive testing is carried out on a variety of synthetic and real-world imbalanced datasets, and the findings show that the IRUTSVM algorithm outperforms the TSVM, UTSVM, and RUTSVM algorithms in terms of generalization performance.
In order to provide an accurate State-Of-Health (SOH) estimation, a novel estimation method is proposed in this paper. In this work, some battery SOH-related features are selected theoretically, proved and then re-screened mathematically. These features can reflect the battery degeneration from different aspects. Also, a new training set design idea is proposed for the Least Squares Support Vector Machine algorithm, so that a model suitable for lithium-ion battery SOH estimation under multi-working conditions can be built. Several lithium-ion battery degeneration testing datasets from the National Aeronautics and Space Administration Ames Prognostics Center of Excellence are used to validate the proposed method. Results demonstrate both the superiority of the proposed method and its potential applicability as an effective SOH estimation method for embedded Battery Management Systems.
•A novel estimation method for State-Of-Health estimation is proposed.
•Some features related to battery SOH are selected and re-screened mathematically.
•A model for lithium-ion Battery SOH estimation under multi-working conditions is built.
Support vector machine (SVM) is a supervised machine learning approach that was recognized as a statistical learning apotheosis for the small-sample database. SVM has shown excellent learning and generalization ability and has been extensively employed in many areas. This paper presents a performance analysis of six types of SVMs for the diagnosis of the classical Wisconsin breast cancer problem from a statistical point of view. The classification performance of the standard SVM (St-SVM) is analyzed and compared with that of the other modified classifiers: proximal support vector machine (PSVM), Lagrangian support vector machine (LSVM), finite Newton method for Lagrangian support vector machine (NSVM), linear programming support vector machine (LPSVM), and smooth support vector machine (SSVM). The experimental results reveal that these SVM classifiers achieve very fast, simple, and efficient breast cancer diagnosis. The training results indicated that LSVM has the lowest accuracy, 95.6107 %, while St-SVM performed better than the other methods on all performance indices (accuracy = 97.71 %) and is closely followed by LPSVM (accuracy = 97.3282 %). However, in the validation phase, the overall accuracy of LPSVM reached 97.1429 %, which was superior to LSVM (95.4286 %), SSVM (96.5714 %), PSVM (96 %), NSVM (96.5714 %), and St-SVM (94.86 %). The ROC and MCC values for LPSVM reached 0.9938 and 0.9369, respectively, outperforming the other classifiers. The results strongly suggest that LPSVM can aid in the diagnosis of breast cancer.
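The Matthews correlation coefficient (MCC) reported above has a standard definition in terms of the four confusion-matrix counts, sketched below. The counts in the example are hypothetical and are not taken from the study.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; ranges from -1 to 1.

    Returns 0.0 when the denominator is zero (a common convention when a
    row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a binary classifier (malignant = positive).
mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)   # about 0.75
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are of different sizes.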
To detect the incipient failure of rolling bearings in a timely manner and find the accurate fault location, a novel rolling bearing fault diagnosis method is proposed based on composite multiscale fuzzy entropy (CMFE) and ensemble support vector machines (ESVMs). Fuzzy entropy (FuzzyEn), an improvement of sample entropy (SampEn), is a new nonlinear method for measuring the complexity of time series. Since FuzzyEn (or SampEn) at a single scale cannot reflect the complexity effectively, multiscale fuzzy entropy (MFE) is developed by defining the FuzzyEns of coarse-grained time series, which represent the system dynamics at different scales. However, the MFE values are affected by the data length, especially when the data are not long enough. By combining the information of multiple coarse-grained time series at the same scale, the CMFE algorithm is proposed in this paper to enhance MFE, as well as FuzzyEn. Compared with MFE, CMFE obtains much more stable and consistent values for a short time series as the scale factor increases. In this paper, CMFE is employed to measure the complexity of vibration signals of rolling bearings and is applied to extract the nonlinear features hidden in the vibration signals. The physical reasons why CMFE is suitable for rolling bearing fault diagnosis are also explored. On this basis, to achieve automatic fault diagnosis, an ensemble SVM-based multi-classifier is constructed for the intelligent classification of fault features. Finally, the proposed fault diagnosis method is applied to experimental data, and the results indicate that it can effectively distinguish different fault categories and severities of rolling bearings.
•Composite multiscale fuzzy entropy is proposed for measuring the complexity of time series.
•CMFE is compared with multiscale fuzzy entropy.
•CMFE is employed to extract the hidden features from vibration signals.
•A novel fault diagnosis method is proposed based on the CMFE and ensemble SVMs.
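The composite coarse-graining step described in this entry can be sketched as follows. At scale s, the series is averaged over non-overlapping windows of length s, once for each of the s possible starting offsets. An entropy estimate is then computed on every resulting series and the estimates are combined. The sketch assumes the combination is a plain average. The entropy function is left as a caller-supplied placeholder (`entropy_fn`) because the abstract does not spell out the FuzzyEn computation, and all function names here are hypothetical.

```python
def coarse_grain(series, scale, start=0):
    """Average non-overlapping windows of length `scale`, beginning at `start`.

    Trailing samples that do not fill a whole window are dropped.
    """
    if scale < 1 or not 0 <= start < scale:
        raise ValueError("need scale >= 1 and 0 <= start < scale")
    n_windows = (len(series) - start) // scale
    return [
        sum(series[start + i * scale: start + (i + 1) * scale]) / scale
        for i in range(n_windows)
    ]


def composite_coarse_grain(series, scale):
    """Return the `scale` coarse-grained series, one per starting offset."""
    return [coarse_grain(series, scale, start=k) for k in range(scale)]


def composite_entropy(series, scale, entropy_fn):
    """Average `entropy_fn` over all coarse-grained series at one scale.

    Assumption: the per-offset entropies are combined by a simple mean.
    """
    values = [entropy_fn(cg) for cg in composite_coarse_grain(series, scale)]
    return sum(values) / len(values)


signal = [1, 2, 3, 4, 5, 6]
grained = composite_coarse_grain(signal, scale=2)
# grained == [[1.5, 3.5, 5.5], [2.5, 4.5]]
```

Using every offset rather than only the first means that each scale draws on several coarse-grained series instead of one, which is consistent with the more stable values the abstract reports for short series.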