A well-configured spare parts supply chain (SC) can reduce costs and increase the competitiveness of spare parts retailers. A structured method for configuring spare parts SCs should be used to determine whether to centralise or decentralise inventory management, also considering hybrid configurations. Moreover, such a method should define whether or not to switch the production of spare parts from Conventional Manufacturing (CM) technologies to Additive Manufacturing (AM) ones. Indeed, AM is considered the next revolution in the field of spare parts, and the adoption of AM technologies strongly affects the characteristics of SCs. However, the choice between centralisation and decentralisation has received little scientific attention, and it is also not clear when AM would be the preferable manufacturing technology for spare parts. This paper aims to assist managers and practitioners in determining how to design their spare parts SCs, thus defining both the spare parts SC configuration and the manufacturing technology to adopt, through the development of a decision support system (DSS). The proposed DSS is a user-friendly decision tree, and, for the first time, it allows comparison of the total costs of SCs characterised by different degrees of centralisation with both AM and CM spare parts.
Data mining information about people is becoming increasingly important in the data-driven society of the 21st century. Unfortunately, sometimes there are real-world considerations that conflict with the goals of data mining; sometimes the privacy of the people being data mined needs to be considered. This necessitates that the output of data mining algorithms be modified to preserve privacy while simultaneously not ruining the predictive power of the output model. Differential privacy is a strong, enforceable definition of privacy that can be used in data mining algorithms, guaranteeing that nothing will be learned about the people in the data that could not already be discovered without their participation. In this survey, we focus on one particular data mining algorithm, decision trees, and how differential privacy interacts with each of the components that constitute decision tree algorithms. We analyze both greedy and random decision trees, and the conflicts that arise when trying to balance privacy requirements with the accuracy of the model.
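Differential privacy is commonly injected into decision tree learning by perturbing the class counts at each node, for example with the Laplace mechanism. Below is a minimal illustrative sketch (the function names and parameters are my own, not taken from any algorithm in the survey) of adding Laplace(1/ε) noise to the class counts at a leaf before choosing its majority label:

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution
    via inverse-transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_majority_label(class_counts, epsilon, rng):
    """Return the majority class after adding Laplace(1/epsilon) noise
    to each count; a single count query has sensitivity 1, so scale
    1/epsilon gives epsilon-differential privacy for this step."""
    noisy = {label: c + laplace_noise(1.0 / epsilon, rng)
             for label, c in class_counts.items()}
    return max(noisy, key=noisy.get)

# Toy leaf: 40 positive vs 12 negative training records.
rng = random.Random(0)
counts = {"yes": 40, "no": 12}
print(noisy_majority_label(counts, epsilon=1.0, rng=rng))
```

The smaller ε is, the larger the noise and the more likely the reported majority label flips on leaves with small or closely balanced counts, which is exactly the privacy/accuracy tension the survey analyzes.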
Preparation of a landslide susceptibility map is the first step for landslide hazard mitigation and risk assessment. The main aim of this study is to explore potential applications of two models, two-class Kernel Logistic Regression (KLR) and the Alternating Decision Tree (ADT), for landslide susceptibility mapping in the Yihuang area (China). The ADT has not previously been used in landslide susceptibility modeling, and this paper attempts a novel application of this technique. For the purpose of comparison, a conventional method, Support Vector Machines (SVM), which has been widely used in the literature, was included and its results were assessed. First, a landslide inventory map with 187 landslide locations for the study area was constructed from various sources. Landslide locations were then spatially randomly split in a 70/30 ratio into datasets for building the landslide models and for model validation. Then a spatial database with a total of fourteen landslide conditioning factors was prepared, including slope, aspect, altitude, topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), plan curvature, landuse, normalized difference vegetation index (NDVI), lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Using the KLR, the SVM, and the ADT, three landslide susceptibility models were constructed from the training dataset. The three resulting models were validated and compared using the receiver operating characteristic (ROC) curve, the Kappa index, and five statistical evaluation measures. In addition, pairwise comparisons of the area under the ROC curve were carried out to assess whether there are significant differences in the overall performance of the three models. The goodness-of-fits are 92.5% (the KLR model), 88.8% (the SVM model), and 95.7% (the ADT model). The prediction capabilities are 81.1%, 84.2%, and 93.3% for the KLR, the SVM, and the ADT models, respectively.
The results show that the ADT model yielded better overall performance and more accurate results than the KLR and SVM models. The KLR model performed slightly better than the SVM model in terms of positive predictive values. ADT and KLR are two promising data mining techniques that might be considered for use in landslide susceptibility mapping. The results from this study may be useful for landuse planning and decision making in landslide-prone areas.
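The AUC comparisons above rest on a simple identity: the area under the ROC curve equals the probability that a randomly chosen positive case is scored above a randomly chosen negative one (the Mann-Whitney statistic, with ties counting one half). A small self-contained sketch, using toy susceptibility scores rather than the study's data:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties contribute 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy susceptibility scores for landslide vs non-landslide cells.
print(auc([0.9, 0.8, 0.6], [0.7, 0.3, 0.2]))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking; the 81-93% prediction capabilities reported above are values of exactly this kind of statistic on the validation cells.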
•Spatial prediction of landslide hazards was carried out by using KLR, ADT, and SVM.
•All landslide models have a good prediction capability.
•The ADT model has better prediction capabilities.
One of the most common and critical destructive attacks on a victim system is the advanced persistent threat (APT) attack. An APT attacker can achieve its hostile goals by obtaining information and gaining financial benefit from the infrastructure of a network. One solution for detecting an APT attack is to analyse network traffic. Because an APT attack remains on the network for a long time, and because the system may crash under high traffic, this type of attack is difficult to detect. Hence, in this study, the machine learning methods of the C5.0 decision tree, Bayesian network, and deep learning are used for the timely detection and classification of APT attacks on the NSL-KDD dataset. Moreover, a 10-fold cross-validation method is used to evaluate these models. As a result, the accuracy (ACC) of the C5.0 decision tree, Bayesian network, and 6-layer deep learning models is 95.64%, 88.37%, and 98.85%, respectively. In terms of the critical criterion of the false positive rate (FPR), the FPR values for the C5.0 decision tree, Bayesian network, and 6-layer deep learning models are 2.56, 10.47, and 1.13, respectively. Other criteria, such as sensitivity, specificity, accuracy, false-negative rate, and F-measure, are also investigated for the models, and the experimental results show that the deep learning model, with automatic multi-layered feature extraction, has the best performance for the timely detection of an APT attack compared with the other classification models.
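The 10-fold cross-validation protocol mentioned above amounts to partitioning the dataset into ten near-equal folds, then repeatedly training on nine and testing on the held-out one. As an illustrative sketch of the index bookkeeping only (the study itself uses the NSL-KDD records and full classifiers, not this toy setup):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, k=10):
    """Yield (train_indices, test_indices) for each of the k folds:
    each fold serves as the test set exactly once."""
    folds = k_fold_indices(n, k)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

for train, test in cross_validate(10, k=5):
    print(len(train), len(test))   # prints "8 2" for each of the 5 folds
```

Metrics such as ACC and FPR are then computed on each held-out fold and averaged, so every record contributes to the reported figures exactly once as a test case.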
Face recognition is the process of identifying people through facial images. It has become vital for security and surveillance applications and is required everywhere, including institutions, organizations, offices, and social places. There are a number of challenges in face recognition, including face pose, age, gender, illumination, and other variable conditions. Another challenge is that the database size for these applications is usually small, so training and recognition become difficult. Face recognition methods can be divided into two major categories: appearance-based methods and feature-based methods. In this paper, the authors present a feature-based method for 2D face images. Speeded-Up Robust Features (SURF) and the Scale-Invariant Feature Transform (SIFT) are used for feature extraction. Five public datasets, namely Yale2B, Face 94, M2VTS, ORL, and FERET, are used for the experimental work. Various combinations of SIFT and SURF features with two classification techniques, namely decision tree and random forest, are experimented with in this work. A maximum recognition accuracy of 99.7% is reported by the authors with a combination of SIFT (64 components) and SURF (32 components).
•The correlation between flood and explanatory factors is investigated.
•PCF is integrated with tree-based methods for the spatial prediction of floods.
•RF proves to be a superior and reliable model for flood susceptibility assessment.
Floods are one of the most devastating types of disasters that cause loss of lives and property worldwide each year. This study aimed to evaluate and compare the prediction capability of the naïve Bayes tree (NBTree), alternating decision tree (ADTree), and random forest (RF) methods for the spatial prediction of flood occurrence in the Quannan area, China. A flood inventory map with 363 flood locations was produced and partitioned into training and validation datasets through random selection with a ratio of 70/30. The spatial flood database was constructed using thirteen flood explanatory factors. The probability certainty factor (PCF) method was used to analyze the correlation between the factors and flood occurrences. Consequently, three flood susceptibility maps were produced using the NBTree, ADTree, and RF methods. Finally, the area under the curve (AUC) and statistical measures were used to validate the flood susceptibility models. The results indicated that the RF method is an efficient and reliable model in flood susceptibility assessment, with the highest AUC values, positive predictive rate, negative predictive rate, sensitivity, specificity, and accuracy for the training (0.951, 0.892, 0.941, 0.945, 0.886, and 0.915, respectively) and validation (0.925, 0.851, 0.938, 0.945, 0.835, and 0.890, respectively) datasets.
The ensemble method random forests has become a popular classification tool in bioinformatics and related fields. The out-of-bag error is an error estimation technique often used to evaluate the accuracy of a random forest and to select appropriate values for tuning parameters, such as the number of candidate predictors that are randomly drawn for a split, referred to as mtry. However, for binary classification problems with metric predictors it has been shown that the out-of-bag error can overestimate the true prediction error depending on the choices of random forests parameters. Based on simulated and real data this paper aims to identify settings for which this overestimation is likely. It is, moreover, questionable whether the out-of-bag error can be used in classification tasks for selecting tuning parameters like mtry, because the overestimation is seen to depend on the parameter mtry. The simulation-based and real-data-based studies with metric predictor variables performed in this paper show that the overestimation is largest in balanced settings and in settings with few observations, a large number of predictor variables, small correlations between predictors and weak effects. There was hardly any impact of the overestimation on tuning parameter selection. However, although the prediction performance of random forests was not substantially affected when using the out-of-bag error for tuning parameter selection in the present studies, one cannot be sure that this applies to all future data. For settings with metric predictor variables it is therefore strongly recommended to use stratified subsampling with sampling fractions that are proportional to the class sizes for both tuning parameter selection and error estimation in random forests. This yielded less biased estimates of the true prediction error.
In unbalanced settings, in which there is a strong interest in predicting observations from the smaller classes well, sampling the same number of observations from each class is a promising alternative.
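The recommended stratified subsampling, with per-class sample counts proportional to the class sizes (i.e., the same sampling fraction applied within each class), can be sketched as follows. This is a simplified stdlib illustration of the idea, not the implementation evaluated in the paper:

```python
import random

def stratified_subsample(labels, fraction, seed=0):
    """Draw a subsample that preserves the class proportions of
    `labels`: each class contributes round(fraction * class_size)
    of its own indices, sampled without replacement."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    sample = []
    for y, idx in by_class.items():
        sample.extend(rng.sample(idx, round(fraction * len(idx))))
    return sorted(sample)

# 80/20 unbalanced labels; a 0.5 subsample keeps the 80/20 ratio.
labels = ["a"] * 80 + ["b"] * 20
print(len(stratified_subsample(labels, 0.5)))   # prints 50
```

Drawing each tree's training set this way, instead of bootstrapping from the pooled data, is what keeps the out-of-bag class composition stable and the error estimate less biased. For the unbalanced case discussed above, sampling the same number of observations per class would replace `round(fraction * len(idx))` with a fixed count.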
Hydraulic brakes in automobiles are important components for the safety of passengers; therefore, brakes are a good subject for condition monitoring. The condition of the brake components can be monitored by using their vibration characteristics. On-line condition monitoring using a machine learning approach is proposed in this paper as a possible solution to such problems. The vibration signals for both good and faulty conditions of the brakes were acquired from a hydraulic brake test setup with the help of a piezoelectric transducer and a data acquisition system. Descriptive statistical features were extracted from the acquired vibration signals, and feature selection was carried out using the C4.5 decision tree algorithm. There is no specific method to find the right number of features required for classification in a given problem; hence, an extensive study is needed to find the optimum number of features. The effect of the number of features was also studied, using both the decision tree and Support Vector Machines (SVM). The selected features were classified using C-SVM and Nu-SVM with different kernel functions. The results are discussed and the conclusions of the study are presented.
•Brake fault diagnosis using a machine learning approach is reported for the first time.
•Statistical parameters were used as features and SVM was used as the classifier.
•Feature selection was performed using a decision tree.
•The effect of the number of features was studied using a decision tree and SVM.
•A classification accuracy of 98% was obtained.
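Descriptive statistical features of the kind used above are typically quantities such as the mean, standard deviation, skewness, kurtosis, RMS, and crest factor computed per vibration signal. A minimal sketch (the exact feature set in the paper may differ; population-style moments are used here for simplicity):

```python
import math

def statistical_features(signal):
    """Compute common descriptive statistics of a vibration signal,
    of the kind fed to a decision tree or SVM as features."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    std = math.sqrt(var)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    # Standardized third and fourth moments; guarded against flat signals.
    skew = sum((x - mean) ** 3 for x in signal) / (n * std ** 3) if std else 0.0
    kurt = sum((x - mean) ** 4 for x in signal) / (n * var ** 2) if var else 0.0
    crest = max(abs(x) for x in signal) / rms if rms else 0.0
    return {"mean": mean, "std": std, "rms": rms,
            "skewness": skew, "kurtosis": kurt, "crest_factor": crest}

print(statistical_features([1.0, -1.0, 1.0, -1.0]))
```

Each acquired signal is reduced to one such feature vector; the C4.5 decision tree then ranks these features by information gain, and only the top-ranked ones are passed to the SVM classifiers.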
Random forest for label ranking. Zhou, Yangming; Qiu, Guoping
Expert Systems with Applications, 12/2018, Volume 112
Journal Article
Peer-reviewed
Open access
•An effective random forest based label ranking method is proposed.
•A novel two-step rank aggregation strategy is proposed.
•The proposed method is evaluated on benchmarks with complete and partial rankings.
•The proposed method is highly competitive compared with state-of-the-art methods.
Label ranking aims to learn a mapping from instances to rankings over a finite number of predefined labels. Random forest is one of the most successful general-purpose machine learning algorithms of modern times. In this paper, we present a random forest label ranking method which uses random decision trees to retrieve nearest neighbors. We develop a novel two-step rank aggregation strategy to effectively aggregate the neighboring rankings discovered by the random forest into a final predicted ranking. Compared with existing methods, the new random forest method has many advantages, including its intrinsically scalable tree data structure, highly parallelizable computational architecture, and superior performance. We present extensive experimental results to demonstrate that our new method achieves highly competitive performance compared with state-of-the-art methods on datasets with complete rankings and datasets with only partial ranking information.
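The paper's two-step rank aggregation strategy is its own contribution and is not reproduced here; as a simpler point of reference, the neighboring rankings retrieved by the trees could be combined with a plain Borda count, sketched below (illustrative only, not the authors' method):

```python
def borda_aggregate(rankings):
    """Aggregate several rankings (each a list of labels from most to
    least preferred, all over the same label set) into one consensus
    ranking via Borda counts."""
    labels = rankings[0]
    m = len(labels)
    score = {lab: 0 for lab in labels}
    for r in rankings:
        for pos, lab in enumerate(r):
            score[lab] += m - 1 - pos   # top position earns m-1 points
    return sorted(labels, key=lambda lab: -score[lab])

# Rankings retrieved from three hypothetical neighboring instances.
neighbours = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
print(borda_aggregate(neighbours))   # → ['a', 'b', 'c']
```

Any positional scoring rule of this kind runs in time linear in the number of neighbors and labels, which is consistent with the scalability the abstract emphasizes.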
In recent years, the variety and volume of data acquired by modern analytical instruments for better authentication of food have dramatically increased. Several pattern recognition tools have been developed to deal with the large volume and complexity of the available data. The most widely used methods are principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), soft independent modelling by class analogy (SIMCA), k-nearest neighbours (kNN), parallel factor analysis (PARAFAC), and multivariate curve resolution-alternating least squares (MCR-ALS). Nevertheless, there are alternative data treatment methods, such as support vector machine (SVM), classification and regression tree (CART), and random forest (RF), that show great potential and offer advantages over conventional ones. In this paper, we explain the background of these methods and review and discuss the reported studies in which these three methods have been applied in the area of food quality and authenticity. In addition, we clarify the technical terminology used in this particular area of research.
•Alternative machine learning methods to perform the authentication of foods are described.
•Chemometric multivariate tools are similar to data mining methods.
•The terms used in different work areas are discussed and defined.
•RF and SVM methods provide better results than traditional chemometrics in the food quality field.