• Explores how binarization permits/improves diversification in deep machines.
• Shows the effectiveness of pre-emphasizing samples for deep classification.
• Combines the above with data augmentation to reach record results.
• Opens further research lines in deep learning.
To aggregate diverse learners and to train deep architectures are the two principal avenues towards increasing the expressive capabilities of neural networks. Therefore, their combinations merit attention. In this contribution, we study how to apply some conventional diversity methods (bagging and label switching) to a general deep machine, the stacked denoising auto-encoding classifier, in order to solve a number of appropriately selected image recognition problems. The main conclusion of our work is that binarizing multi-class problems is the key to obtaining benefit from those diversity methods.
Additionally, we check that adding other kinds of performance improvement procedures, such as pre-emphasizing training samples and elastic distortion mechanisms, further increases the quality of the results. In particular, an appropriate combination of all the above methods leads us to reach a new absolute record in classifying MNIST handwritten digits.
These facts reveal that there are clear opportunities for designing more powerful classifiers by means of combining different improvement techniques.
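As an illustration of the approach described above, the following minimal Python sketch combines one-vs-rest binarization with a label-switching ensemble. A tiny logistic unit stands in for the stacked denoising auto-encoding classifier of the paper, and all function names, rates, and the toy data are illustrative, not the authors' actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def switch_labels(y, rate, rng):
    """Randomly flip a fraction `rate` of binary labels to inject diversity."""
    y = y.copy()
    flip = rng.random(len(y)) < rate
    y[flip] = 1.0 - y[flip]
    return y

def train_logreg(X, y, lr=0.1, epochs=200):
    """Tiny logistic unit standing in for the deep base learner (the SDAE classifier)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def ovr_switching_ensemble(X, y, n_classes, n_learners=11, switch_rate=0.1):
    """Binarize one-vs-rest, then build a label-switching ensemble per binary problem."""
    return [[train_logreg(X, switch_labels((y == c).astype(float), switch_rate, rng))
             for _ in range(n_learners)]
            for c in range(n_classes)]

def predict(ensembles, X):
    """Average each binary ensemble's scores and pick the highest-scoring class."""
    scores = [np.mean([1.0 / (1.0 + np.exp(-(X @ w + b))) for w, b in learners], axis=0)
              for learners in ensembles]
    return np.argmax(np.stack(scores, axis=1), axis=1)

# toy usage: three Gaussian blobs
X = np.vstack([rng.normal(m, 0.5, (50, 2)) for m in ([0, 0], [3, 0], [0, 3])])
y = np.repeat([0, 1, 2], 50)
print((predict(ovr_switching_ensemble(X, y, 3), X) == y).mean())
```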
• Imbalance can be mitigated by rebalancing (costs, population) or ensemble learning.
• Asymmetric label switching creates diversity in ensemble learning.
• Rebalancing and switching can be combined in a principled way.
• Optimum decision thresholds for these combinations are analytically derived.
• A gating network aggregating the learners' contributions improves performance.
Asymmetric label switching is an effective and principled method for creating a diverse ensemble of learners for imbalanced classification problems. This technique can be combined with other rebalancing mechanisms, such as those based on cost policies or class proportion modifications. In this study, working within the Bayesian theory framework, we derive the optimal decision thresholds for the combination of these mechanisms. In addition, we propose using a gating network to aggregate the learners' contributions as a further mechanism to improve the overall performance of the system.
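The following hedged sketch illustrates the two basic ingredients named above. The threshold relation follows from the standard noisy-label identity: if labels are switched 0 -> 1 with probability alpha and 1 -> 0 with probability beta, the posterior of the switched labels is p'(1|x) = alpha + (1 - alpha - beta) * p(1|x), so a threshold eta on the true posterior maps to alpha + (1 - alpha - beta) * eta on the switched one. The paper's actual thresholds additionally account for cost and rebalancing policies; the names and values below are illustrative.

```python
import numpy as np

def asymmetric_switch(y, alpha, beta, rng):
    """Switch binary labels asymmetrically: 0 -> 1 with prob. alpha, 1 -> 0 with
    prob. beta.  Drawing a fresh switched set per learner creates ensemble
    diversity; choosing alpha > beta also rebalances an imbalanced problem."""
    y = y.copy()
    u = rng.random(len(y))
    y[(y == 0) & (u < alpha)] = 1
    y[(y == 1) & (u < beta)] = 0
    return y

def corrected_threshold(eta, alpha, beta):
    """Map a decision threshold eta on the true posterior p(1|x) to the
    equivalent threshold on the posterior of the switched labels,
    p'(1|x) = alpha + (1 - alpha - beta) * p(1|x)."""
    return alpha + (1.0 - alpha - beta) * eta

# example: a symmetric-cost threshold eta = 0.5 under alpha = 0.2, beta = 0.05
print(corrected_threshold(0.5, 0.2, 0.05))   # 0.575
```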
• The main contribution of this manuscript is a new training algorithm for binary classification using neural networks.
• The training algorithm is based on the minimization of an estimate of the Bayes risk.
• The Parzen windows method is used to estimate the conditional distributions needed to compute the error probabilities included in the Bayes risk.
• A new set of training algorithms emerges from this Bayes risk minimization formulation using Parzen windows.
• Some interesting relationships with classical training methods are discovered.
A new training algorithm for neural networks in binary classification problems is presented. It is based on minimizing an estimate of the Bayes risk, obtained by applying Parzen windows to the final one-dimensional nonlinear transformation of the samples in order to estimate the probability of classification error. This leads to a very general approach to error minimization and training, where the risk to be minimized is defined in terms of integrated one-dimensional Parzen windows, and the gradient descent algorithm used to minimize this risk is a function of the window that is used. By relaxing the constraints that are typically applied to Parzen windows when used for probability density function estimation, for example by allowing them to be non-symmetric or possibly infinite in duration, an entirely new set of training algorithms emerges. In particular, different Parzen windows lead to different cost functions, and some interesting relationships with classical training methods are discovered. Experiments with synthetic and real benchmark datasets show that, with the appropriate choice of window fitted to the specific problem, it is possible to improve the performance of neural network classifiers over those trained with classical methods.
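To make the construction concrete, here is a minimal sketch assuming a logistic (sigmoid) window and a linear model standing in for the network's final one-dimensional transformation; the paper's general formulation admits other, possibly non-symmetric or infinite-duration, windows.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def parzen_risk_grad(y_out, t, h=0.5):
    """Empirical Bayes-risk estimate and its gradient w.r.t. the 1-D output.

    A misclassification is the event t * y_out < 0.  Smoothing the 0/1
    indicator with an *integrated* Parzen window Phi gives
        R_hat = mean( Phi(-t * y_out / h) ).
    Here we pick the logistic window (Phi = sigmoid); the window width h
    controls how closely R_hat tracks the raw error count.
    """
    z = -t * y_out / h
    risk = sigmoid(z).mean()
    # d/dy Phi(-t*y/h) = -(t/h) * phi(-t*y/h), with phi = Phi * (1 - Phi)
    grad = -(t / h) * sigmoid(z) * (1.0 - sigmoid(z)) / len(t)
    return risk, grad

def train_linear(X, t, h=0.5, lr=0.5, epochs=500):
    """Gradient descent on the Parzen risk for a linear 'network' y = X w + b."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        _, g = parzen_risk_grad(X @ w + b, t, h)
        w -= lr * X.T @ g
        b -= lr * g.sum()
    return w, b

# toy usage with labels t in {-1, +1}
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(+1, 1, (100, 2))])
t = np.r_[-np.ones(100), np.ones(100)]
w, b = train_linear(X, t)
print("training error:", (np.sign(X @ w + b) != t).mean())
```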
On improving CNNs performance: The case of MNIST. Alvear-Sandoval, Ricardo F.; Sancho-Gómez, José L.; Figueiras-Vidal, Aníbal R. Information Fusion, vol. 52, December 2019. Journal article, peer-reviewed.
• We check that CNNs admit performance improvement techniques on MNIST.
• These techniques reduce the advantage of CNNs over SDAE classifiers.
• Adding an SDAE classifier over an improved CNN ensemble improves results.
• The above approaches provide MNIST classification records.
• We indicate some research lines emerging from this work.
In this note, we follow two directions to improve the performance of CNN classifiers. The first is to apply to CNN units the same improvement techniques that we have successfully used with Stacked Denoising Auto-Encoder classifiers. This leads to a new performance record in classifying MNIST digits. The second consists of applying a Stacked Denoising Auto-Encoder classifier to the output of the best of the previous designs, aiming to overcome the limitations of CNN architectures. An even better classification record is obtained for MNIST.
The above results permit us to conclude that combining improvement techniques and stacking deep machines of different natures can help to better solve other real-world problems.
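A minimal sketch of the second direction, stacking a denoising auto-encoder classifier on the soft outputs of the CNN stage, might look as follows in Keras. A single denoising layer stands in for the full stacked design, and the variable names (cnn_probs_train, y_train) and hyperparameters are assumptions, not the authors' settings.

```python
import tensorflow as tf

def build_sdae_on_cnn_outputs(input_dim=10, hidden=64, noise=0.3):
    """A single denoising layer on the CNN ensemble's soft outputs, plus a
    softmax head on the learned code (a stand-in for the full stacked design)."""
    inp = tf.keras.Input(shape=(input_dim,))
    code = tf.keras.layers.Dense(hidden, activation="relu")(
        tf.keras.layers.GaussianNoise(noise)(inp))      # corrupt, then encode
    recon = tf.keras.layers.Dense(input_dim)(code)      # decode (denoising target)
    dae = tf.keras.Model(inp, recon)
    dae.compile(optimizer="adam", loss="mse")
    out = tf.keras.layers.Dense(10, activation="softmax")(code)
    clf = tf.keras.Model(inp, out)                      # shares the encoder with dae
    clf.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                metrics=["accuracy"])
    return dae, clf

# usage sketch (cnn_probs_train: averaged soft outputs of the improved CNN
# ensemble, shape [n_samples, 10]; y_train: digit labels -- both assumed):
# dae, clf = build_sdae_on_cnn_outputs()
# dae.fit(cnn_probs_train, cnn_probs_train, epochs=20)  # unsupervised pre-training
# clf.fit(cnn_probs_train, y_train, epochs=20)          # supervised fine-tuning
```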
It has been demonstrated that modified stacked denoising auto-encoders (MSDAEs) serve to implement high-performance missing value imputation schemes. On the other hand, complete MSDAE (CMSDAE) classifiers, which extend their inputs with target estimates from an auxiliary classifier and are trained layer by layer to recover both the observation and the target estimates, offer classification results that are better than those provided by MSDAEs. Consequently, investigating whether CMSDAEs can improve MSDAE imputation processes is of obvious practical importance. In this correspondence, two types of imputation mechanisms with CMSDAEs are considered. The first is a direct procedure in which the CMSDAE output is just the target. The second mechanism is suggested by the presence of the targets in the vectors to be auto-encoded, and it applies well-known multi-task learning (MTL) ideas, including the observations as a secondary task. Experimental results show that these CMSDAE structures increase the quality of the missing value imputations, in particular the MTL versions, which give the best result in five out of six missing value problems.
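A hedged Keras sketch of one CMSDAE-style layer for imputation, in its MTL variant, is given below. The extended input, the pre-filling of missing entries, and the task weighting follow the description above; the architecture sizes and names are illustrative assumptions.

```python
import tensorflow as tf

def build_cmsdae_imputer(n_features, n_classes, hidden=128, task_weight=0.8):
    """One CMSDAE-style layer for imputation: the input concatenates the
    observation (missing entries pre-filled, e.g. with column means) and an
    auxiliary classifier's target estimate; the machine reconstructs both.
    In the MTL variant, the target is the main task and the observation
    reconstruction the secondary one, balanced by `task_weight`."""
    x_in = tf.keras.Input(shape=(n_features,), name="observation")
    t_in = tf.keras.Input(shape=(n_classes,), name="target_estimate")
    h = tf.keras.layers.Dense(hidden, activation="relu")(
        tf.keras.layers.Concatenate()([x_in, t_in]))
    x_rec = tf.keras.layers.Dense(n_features, name="x_rec")(h)
    t_rec = tf.keras.layers.Dense(n_classes, activation="softmax", name="t_rec")(h)
    model = tf.keras.Model([x_in, t_in], [x_rec, t_rec])
    model.compile(optimizer="adam",
                  loss={"x_rec": "mse", "t_rec": "categorical_crossentropy"},
                  loss_weights={"x_rec": 1.0 - task_weight, "t_rec": task_weight})
    return model

# imputation sketch: after training, replace each missing entry of the
# observation with the corresponding component of the reconstruction x_rec.
```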
Using functional weights in a conventional linear combination architecture is a way of obtaining expressive power and represents an alternative to classical trainable and implicit nonlinear transformations. In this brief, we explore this way of constructing binary classifiers, taking advantage of the possibility of generating functional weights by means of a gate with fixed radial basis functions. This particular form of the gate permits training the machine directly with maximal margin algorithms. We call the resulting scheme "feature combiners with gate generated weights for classification." Experimental results show that these architectures outperform support vector machines (SVMs) and Real AdaBoost ensembles in most considered benchmark examples. An increase in the computational design effort due to cross-validation demands is the price to be paid to obtain this advantage. Nevertheless, the operational effort is usually lower than that needed by SVMs.
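The key point, that fixed RBF gates make the machine linear in its trainable parameters and hence directly trainable with maximal-margin algorithms, can be sketched as follows; the choice of centers, the width sigma, and the solver are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def gate_features(X, centers, sigma):
    """Expanded features z_{jk}(x) = x_j * g_k(x), with fixed RBF gates
    g_k(x) = exp(-||x - c_k||^2 / (2 sigma^2)).  Because the gates are fixed,
    the functional weights w_j(x) = sum_k a_{jk} g_k(x) make the classifier
    f(x) = sum_j w_j(x) x_j linear in the trainable a_{jk}, so a
    maximal-margin solver can be applied directly to z."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # (n, K)
    G = np.exp(-d2 / (2 * sigma ** 2))
    # outer product per sample: (n, d) x (n, K) -> (n, d*K)
    return (X[:, :, None] * G[:, None, :]).reshape(len(X), -1)

# usage sketch: centers could come from k-means or a grid; sigma by cross-validation
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1, 1, (100, 2)), rng.normal(1, 1, (100, 2))])
y = np.r_[np.zeros(100), np.ones(100)]
centers = X[rng.choice(len(X), 10, replace=False)]
clf = LinearSVC().fit(gate_features(X, centers, sigma=1.0), y)
print(clf.score(gate_features(X, centers, sigma=1.0), y))
```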
• We designed and implemented a novel ensemble based on Class Switching to deal with the imbalanced class problem.
• The SwitchingNED ensemble changes a fraction of instances of the majority class to the minority class following a new selection method based on Nearest Enemy Distance.
• This procedure, in combination with traditional data sampling techniques, achieves equilibrium of the class distributions.
• We compare the resulting SwitchingNED with five distinctive ensemble-based approaches; with better performance, SwitchingNED is established as one of the best approaches in the field.
Imbalanced data classification has been deeply studied by machine learning practitioners over the years and remains one of the most challenging problems in the field. In many real-life situations, the underrepresentation of one class relative to the rest commonly produces a tendency to ignore the minority class, which is normally the target of the problem. Consequently, many different techniques have been proposed. Among them, ensemble approaches have proved to be very reliable. New ways of generating ensembles have also been studied for standard classification. In particular, Class Switching, as a mechanism to produce perturbed training sets, has been shown to perform well in slightly imbalanced scenarios. In this paper, we analyze its potential to deal with highly imbalanced problems, fighting against its major limitations. We introduce a novel ensemble approach based on Switching with a new technique to select the switched examples based on Nearest Enemy Distance. We compare the resulting SwitchingNED with five distinctive ensemble-based approaches, with different combinations of sampling techniques. With its better performance, SwitchingNED is established as one of the best approaches in the field.
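A hedged sketch of the switching step is shown below. The exact Nearest-Enemy-Distance selection rule of SwitchingNED is not specified here, so the code adopts one plausible reading, switching the majority examples closest to the minority class; the class coding and names are illustrative.

```python
import numpy as np

def nearest_enemy_distance(X_maj, X_min):
    """Distance from each majority sample to its nearest minority ('enemy') sample."""
    d = np.linalg.norm(X_maj[:, None, :] - X_min[None, :, :], axis=-1)
    return d.min(axis=1)

def switching_ned(X, y, n_switch):
    """Relabel `n_switch` majority examples as minority.  As one plausible
    reading of nearest-enemy-distance selection, we switch the majority
    samples that lie closest to the minority class (the borderline region);
    each ensemble member would call this on its own perturbed sample."""
    maj, mnr = (y == 0), (y == 1)            # assume 0 = majority, 1 = minority
    ned = nearest_enemy_distance(X[maj], X[mnr])
    idx_maj = np.flatnonzero(maj)
    order = np.argsort(ned)                  # smallest nearest-enemy distance first
    y_new = y.copy()
    y_new[idx_maj[order[:n_switch]]] = 1
    return y_new
```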
► This paper provides a reasonable classification-oriented imputation mechanism.
► Multi-Task Learning offers an interesting approach to solve missing data problems.
► Classification and imputation are combined in only one neural architecture.
► A balance between both classification and imputation tasks is achieved using MTL.
► Extensive experiments show the proposed method has a valuable robustness property.
Datasets with missing values are frequent in real-world classification problems. It seems obvious that imputation of missing values can be considered a series of secondary tasks, while classification is the main purpose of any machine dealing with these datasets. Consequently, Multi-Task Learning (MTL) schemes offer an interesting alternative approach to solving missing data problems. In this paper, we propose an MTL-based method for training and operating a modified Multi-Layer Perceptron (MLP) architecture that works in incomplete data contexts. The proposed approach achieves a balance between classification and imputation by exploiting the advantages of MTL. Extensive experimental comparisons with well-known imputation algorithms show that this approach provides excellent results. The method is never worse than the traditional algorithms (an important robustness property) and also clearly outperforms them in several problems.
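A minimal Keras sketch of the MTL idea, assuming a shared hidden layer with a classification head (main task) and a feature-reconstruction head (the imputation, as secondary tasks), follows; the modified MLP of the paper may differ in its details.

```python
import tensorflow as tf

def build_mtl_imputation_mlp(n_features, n_classes, hidden=64, clf_weight=0.7):
    """Shared-hidden-layer MLP with two heads: classification (main task) and
    reconstruction of the input features (the imputation, secondary tasks).
    `clf_weight` sets the balance between the two tasks."""
    x_in = tf.keras.Input(shape=(n_features,))   # missing entries pre-filled
    h = tf.keras.layers.Dense(hidden, activation="relu")(x_in)
    y_out = tf.keras.layers.Dense(n_classes, activation="softmax", name="cls")(h)
    x_out = tf.keras.layers.Dense(n_features, name="imp")(h)
    model = tf.keras.Model(x_in, [y_out, x_out])
    model.compile(optimizer="adam",
                  loss={"cls": "sparse_categorical_crossentropy", "imp": "mse"},
                  loss_weights={"cls": clf_weight, "imp": 1.0 - clf_weight})
    return model
```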
Complete modified stacked denoising auto-encoder (CMSDAE) machines constitute a version of stacked auto-encoders in which a target estimate is included at the input, and they are trained layer by layer by minimizing a convex combination of the errors corresponding to the input sample and the target. This makes it possible to transform the observation space without forgetting what the target is. Recent publications have shown that this method produces a clear performance advantage in classification tasks. These facts motivate exploring whether CMSDAE machines also offer performance improvements in regression problems, in particular for time series prediction, where conventional discriminative machines encounter difficulties: the layer-by-layer reconstruction of the target (together with the input) can help to reduce these difficulties. This contribution presents CMSDAE regression/prediction machines and their design, showing experimental evidence of their frequently superior (and never inferior) performance with respect to other machine architectures. Some subsequent research directions are indicated together with the conclusions.
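A hedged sketch of one layer-by-layer training step under the convex-combination loss described above could look as follows in Keras; how the target estimate is produced and refreshed between layers is an assumption here, not taken from the paper.

```python
import tensorflow as tf

def train_cmsdae_layer(XT, n_obs, hidden=32, lam=0.5, epochs=50):
    """One CMSDAE layer for regression/prediction.  XT concatenates the
    observation window x (first n_obs columns) and a target estimate t.
    The layer reconstructs both, minimizing the convex combination
    lam * ||x_rec - x||^2 + (1 - lam) * ||t_rec - t||^2, and its code
    (with a refreshed target estimate appended) feeds the next layer."""
    inp = tf.keras.Input(shape=(XT.shape[1],))
    code = tf.keras.layers.Dense(hidden, activation="tanh")(inp)
    x_rec = tf.keras.layers.Dense(n_obs, name="x")(code)
    t_rec = tf.keras.layers.Dense(XT.shape[1] - n_obs, name="t")(code)
    model = tf.keras.Model(inp, [x_rec, t_rec])
    model.compile(optimizer="adam", loss="mse",
                  loss_weights={"x": lam, "t": 1.0 - lam})
    model.fit(XT, [XT[:, :n_obs], XT[:, n_obs:]], epochs=epochs, verbose=0)
    return tf.keras.Model(inp, code)   # encoder that feeds the next layer
```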