•An estimate of the Bayes cost is proposed as the loss to train neural networks for ordinal classification of imbalanced data.•The network parameters, as well as the decision thresholds, are updated during training to minimize the Bayes cost.•The neural network architecture has a single neuron in the output layer (one-dimensional input space).•Both shallow networks and deep networks can be used.•Experiments with real data show the accuracy and flexibility of the proposed method, especially in imbalanced problems.
Ordinal classification of imbalanced data is a challenging problem that appears in many real-world applications. The challenge is to simultaneously consider the order of the classes and the class imbalance, which can notably improve the performance metrics. The Bayesian formulation makes it possible to deal with these two characteristics jointly: it takes into account the prior probability of each class and the decision costs, which can be used to include the imbalance and the ordinal information, respectively. We propose to use the Bayesian formulation to train neural networks, which have shown excellent results in many classification tasks. A loss function is proposed to train networks with a single neuron in the output layer and a threshold-based decision rule. The loss is an estimate of the Bayesian classification cost, based on the Parzen windows estimator, which is fitted for a thresholded decision. Experiments with several real datasets show that the proposed method provides competitive results in different scenarios, due to its high flexibility to specify the relative importance of classification errors across classes, taking the class order into account independently of the probability of each class.
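The threshold-based decision rule described above can be sketched in a few lines. This is a minimal illustration of the decision step only, not the paper's training procedure: the function name, the threshold values, and the example scores are assumptions for illustration; in the method itself both the network weights and the thresholds are learned by minimizing the Bayes-cost estimate.

```python
import numpy as np

def ordinal_decision(scores, thresholds):
    """Map scalar network outputs to ordinal class labels.

    A pattern with score s is assigned to class k (0-indexed) when
    thresholds[k-1] < s <= thresholds[k]; the K-1 thresholds partition
    the one-dimensional output space into K ordered intervals.
    """
    thresholds = np.sort(np.asarray(thresholds))
    # searchsorted counts how many thresholds each score exceeds,
    # which is exactly the index of the interval the score falls in.
    return np.searchsorted(thresholds, np.asarray(scores), side="left")

# Illustrative values: 3 ordered classes separated by thresholds 0.0 and 1.0.
scores = np.array([-1.2, 0.1, 0.8, 2.5])
labels = ordinal_decision(scores, thresholds=[0.0, 1.0])
# scores below 0.0 -> class 0, in (0.0, 1.0] -> class 1, above 1.0 -> class 2
```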
Pattern classification with missing data: a review García-Laencina, Pedro J.; Sancho-Gómez, José-Luis; Figueiras-Vidal, Aníbal R.
Neural Computing & Applications, 03/2010, Volume 19, Issue 2
Journal Article
Peer reviewed
Pattern classification has been successfully applied in many problem domains, such as biometric recognition, document classification or medical diagnosis. Missing or unknown data are a common drawback that pattern recognition techniques need to deal with when solving real-life classification tasks. Machine learning approaches and methods imported from statistical learning theory have been most intensively studied and used in this subject. The aim of this work is to analyze the missing data problem in pattern classification tasks, and to summarize and compare some of the well-known methods used for handling missing values.
Combination approaches provide an interesting way to improve adaptive filter performance. In this paper, we study the mean-square performance of a convex combination of two transversal filters. The individual filters are independently adapted using their own error signals, while the combination is adapted by means of a stochastic gradient algorithm in order to minimize the error of the overall structure. General expressions are derived that show that the method is universal with respect to the component filters, i.e., in steady-state, it performs at least as well as the best component filter. Furthermore, when the correlation between the a priori errors of the components is low enough, their combination is able to outperform both of them. Using energy conservation relations, we specialize the results to a combination of least mean-square filters operating both in stationary and in nonstationary scenarios. We also show how the universality of the scheme can be exploited to design filters with improved tracking performance.
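The scheme in the abstract above can be sketched as follows: two LMS filters with different step sizes adapt independently with their own errors, while a sigmoid-constrained mixing parameter is adapted by stochastic gradient on the combined error. All step sizes, filter length, and the clipping of the mixing variable are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def convex_combination_lms(x, d, L=8, mu_fast=0.1, mu_slow=0.01, mu_a=100.0):
    """Convex combination of two LMS filters with different step sizes.

    Each component adapts with its own error; the mixing weight
    lam = sigmoid(a) is adapted by stochastic gradient descent on the
    squared error of the combined output.
    """
    w1 = np.zeros(L)                      # fast (large step size) filter
    w2 = np.zeros(L)                      # slow (small step size) filter
    a = 0.0                               # unconstrained mixing variable
    y_comb = np.zeros(len(d))
    for n in range(L - 1, len(d)):
        u = x[n - L + 1:n + 1][::-1]      # regressor, most recent sample first
        y1, y2 = w1 @ u, w2 @ u
        lam = 1.0 / (1.0 + np.exp(-a))    # lam in (0, 1) -> convex combination
        y_comb[n] = lam * y1 + (1 - lam) * y2
        e = d[n] - y_comb[n]
        # independent LMS updates driven by each component's own error
        w1 += mu_fast * (d[n] - y1) * u
        w2 += mu_slow * (d[n] - y2) * u
        # gradient step on e^2/2 w.r.t. a, through lam
        a += mu_a * e * (y1 - y2) * lam * (1 - lam)
        a = np.clip(a, -4.0, 4.0)         # keep lam away from hard 0/1
    return y_comb
```

In steady state the sigmoid drives lam toward the better component, which is the mechanism behind the universality result the abstract refers to.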
Adaptively Biasing the Weights of Adaptive Filters Lazaro-Gredilla, Miguel; Azpicueta-Ruiz, Luis A; Figueiras-Vidal, Aníbal R
IEEE Transactions on Signal Processing, 07/2010, Volume 58, Issue 7
Journal Article
Peer reviewed
Open access
It is a well-known result of estimation theory that biased estimators can outperform unbiased ones in terms of expected quadratic error. In steady state, many adaptive filtering algorithms offer an unbiased estimation of both the reference signal and the unknown true parameter vector. In this correspondence, we propose a simple yet effective scheme for adaptively biasing the weights of adaptive filters using an output multiplicative factor. We give theoretical results that show that the proposed configuration is able to provide a convenient bias versus variance tradeoff, leading to reductions in the filter mean-square error, especially in situations with a low signal-to-noise ratio (SNR). After reinterpreting the biased estimator as the combination of the original filter and a filter with constant output equal to 0, we propose practical schemes to adaptively adjust the multiplicative factor. Experiments are carried out for the normalized least-mean-squares (NLMS) adaptive filter, improving its mean-square performance in stationary situations and during the convergence phase.
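The idea above can be sketched as an NLMS filter whose output is scaled by a multiplicative factor beta, itself adapted by stochastic gradient on the biased-output error. The specific step sizes and the clipping of beta to [0, 1] are illustrative assumptions; the paper derives its own adaptation schemes.

```python
import numpy as np

def biased_nlms(x, d, L=8, mu=0.5, eps=1e-6, mu_b=0.01):
    """NLMS with an adaptively biased output y_b = beta * y.

    The factor beta trades a small bias for a variance reduction,
    which is most useful at low SNR: beta shrinks toward 0 when the
    filter output is mostly noise, and toward 1 when it is reliable.
    """
    w = np.zeros(L)
    beta = 0.0
    y_b = np.zeros(len(d))
    for n in range(L - 1, len(d)):
        u = x[n - L + 1:n + 1][::-1]
        y = w @ u
        y_b[n] = beta * y
        e_b = d[n] - y_b[n]              # error of the biased output
        e = d[n] - y                     # error driving the NLMS update
        w += mu * e * u / (eps + u @ u)  # standard NLMS weight update
        beta += mu_b * e_b * y           # gradient step on (d - beta*y)^2 / 2
        beta = np.clip(beta, 0.0, 1.0)   # interpolate between filter and 0
    return y_b, beta
```

The clip makes the "combination with a constant-zero filter" interpretation explicit: beta = 0 reproduces the zero-output filter, beta = 1 the original one.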
A principled methodology for solving imbalanced binary classification problems has been recently introduced. It makes it possible to obtain high-performance designs while avoiding the risks of degradation that other procedures suffer from. The corresponding paper, Benítez-Buenache et al. (2019), shows evidence of these facts by applying direct versions, using just one of the possible rebalancing techniques and applying full rebalancing.
In this contribution, we extend the above study to maximize the performance of the resulting designs. To this end, we combine principled techniques in order to benefit from their different characteristics. The combination weights, as well as the rebalancing degree, are selected by means of a simple (cross-validation) search. A number of experiments with different kinds of databases show significant performance improvements. At the same time, the database characteristics that limit the performance improvements, such as small size and noisy samples, are detected.
•Combining different (neutral) principled rebalancing techniques is proposed.•The combination degree and the rebalancing intensity are found by cross-validation.•Extensive experiments support the effectiveness of the proposal.•Shallow and deep neural networks and ensembles are used in the experiments.•The database characteristics that reduce combination performance are detected.
•We introduce a generalized form of emphasis weights for building boosting ensembles.•The main novelty of this emphasis is the inclusion of an intensity regulation term.•The algorithm is simple and offers results never worse than standard boosting.•In a significant number of problems, the new algorithm outperforms Real Adaboost.•The algorithm is competitive with other computationally more intensive forms.
Boosting ensembles have deserved much attention because of their high performance. But they are also sensitive to adverse conditions, such as noisy environments or the presence of outliers. A way to fight against their degradation is to modify the form of the emphasis weighting that is applied to train each new learner. In this paper, we propose to use a general form for that emphasis function, which includes not only an error-dependent term and a proximity-to-the-classification-boundary-dependent term, but also a constant value which serves to control how much emphasis is applied. Two convex combinations are used to consider these terms, and this makes it possible to control their relative influence. Experimental results support the effectiveness of this general form of boosting emphasis.
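The structure of such an emphasis function can be sketched as below. The particular exponential forms of the error and proximity terms are assumptions chosen for illustration (Adaboost-like), not the paper's exact expressions; only the two-convex-combination structure with a constant intensity-regulation term follows the abstract.

```python
import numpy as np

def emphasis_weights(margins, outputs, alpha=0.5, beta=0.5):
    """Generalized boosting emphasis (illustrative sketch).

    Combines, via two convex combinations, an error-dependent term
    (large for small or negative ensemble margins), a term emphasizing
    samples close to the classification boundary f(x) = 0, and a
    constant term that regulates the overall emphasis intensity.
    alpha mixes {error, proximity}; beta mixes that pair with the constant.
    """
    err_term = np.exp(-margins)            # error emphasis (Adaboost-like form)
    prox_term = np.exp(-outputs ** 2)      # emphasis near the boundary f(x)=0
    const_term = np.ones_like(margins)     # uniform component: no emphasis
    w = beta * (alpha * err_term + (1 - alpha) * prox_term) \
        + (1 - beta) * const_term
    return w / w.sum()                     # normalize to a sampling distribution
```

Setting beta below 1 damps the emphasis toward uniform weighting, which is the mechanism that protects the ensemble in noisy environments or with outliers.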
Example-dependent cost classification is a special case of pattern classification where the costs are specific for each individual pattern. Most of the practical applications related to this kind of classification problem exhibit class imbalance in the available data, thus adding a further difficulty to the classification task. This problem has high practical importance because it appears intrinsically in relevant application fields, such as Finance or Health. We propose to use a 2-step Bayesian methodology to solve this problem because its formulation allows the inclusion of the individual example costs in the classification and takes into account the class probabilities. In particular, the main contribution is to apply principled rebalancing classification algorithms in the first step: We propose 3 Neural Network based learning machines, WR-MLP, WSR-MLPE and WSR-DNN, to provide the estimates of the required conditional probabilities for the Bayesian test. Unlike some similar approaches in the literature that use heuristic methods in the first step, which in most cases require calibration mechanisms to compensate for the estimation biases, the consistency of the proposed estimates is theoretically supported, thus providing a clear potential advantage. Experiments with seven real-world datasets show that the proposed methods are competitive against eleven state-of-the-art benchmarks, and provide an advantage in the less favourable situations: cases with a strong imbalance and highly nonlinear classification borders.
•Example-dependent cost classification problems are usually imbalanced.•The state-of-the-art methods are not principled and their performance is not robust.•We propose to apply principled rebalancing classification algorithms.•We propose principled mixed rebalancing techniques to combat imbalance.•The proposed methods improve the state of the art in performance and robustness.
•A principled method for imbalanced classification is presented.•The iff conditions for principled re-balancing are established.•Informed two-step re-balancing techniques are introduced.•Extensive examples support the analysis.
This contribution proves that neutral re-balancing mechanisms, which do not alter the likelihood ratio, and training discriminative machines using Bregman divergences as surrogate costs are necessary and sufficient conditions to estimate the likelihood ratio of imbalanced binary classification problems in a consistent manner. These two conditions permit the estimation of the theoretical Neyman–Pearson operating characteristic corresponding to the problem under study. In practice, a classifier operates at a certain working point corresponding to, for example, a given false positive rate. This perspective allows the introduction of an additional principled procedure to improve classification performance by means of a second design step in which more weight is assigned to the appropriate training samples. The paper includes a number of examples that demonstrate the performance capabilities of the methods presented, and concludes with a discussion of relevant research directions and open problems in the area.
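The role of the likelihood ratio in the abstract above can be made concrete with standard Bayes decision theory: a consistent posterior estimate (as obtained with a Bregman-divergence surrogate such as cross-entropy) converts into a likelihood ratio via Bayes' rule, and the working point is then set by thresholding that ratio. This sketch shows only those two textbook steps; the function names and cost values are illustrative.

```python
import numpy as np

def likelihood_ratio_from_posterior(p1_post, prior1):
    """Convert a consistent posterior estimate P(y=1|x) into a likelihood ratio.

    By Bayes' rule: LR(x) = p(x|1)/p(x|0) = [P(1|x)/P(0|x)] * [P(0)/P(1)],
    so a consistent posterior estimate yields a consistent LR estimate
    regardless of the class priors (and hence of the imbalance).
    """
    prior0 = 1.0 - prior1
    return (p1_post / (1.0 - p1_post)) * (prior0 / prior1)

def bayes_decision(lr, prior1, c_fp=1.0, c_fn=1.0):
    """Decide class 1 when LR exceeds the Bayes threshold (c_fp*P0)/(c_fn*P1).

    Sweeping this threshold traces the operating characteristic; fixing
    it selects one working point, e.g. a target false positive rate.
    """
    prior0 = 1.0 - prior1
    eta = (c_fp * prior0) / (c_fn * prior1)
    return (np.asarray(lr) > eta).astype(int)
```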
The standard Gaussian Process regression (GP) is usually formulated under stationary hypotheses: The noise power is considered constant throughout the input space and the covariance of the prior distribution is typically modeled as depending only on the difference between input samples. These assumptions can be too restrictive and unrealistic for many real-world problems. Although nonstationarity can be achieved using specific covariance functions, they require a prior knowledge of the kind of nonstationarity, not available for most applications. In this paper we propose to use the Laplace approximation to make inference in a divisive GP model to perform nonstationary regression, including heteroscedastic noise cases. The log-concavity of the likelihood ensures a unimodal posterior and guarantees that the Laplace approximation converges to a unique maximum. The characteristics of the likelihood also allow obtaining accurate posterior approximations when compared to the Expectation Propagation (EP) approximations and the asymptotically exact posterior provided by a Markov Chain Monte Carlo implementation with Elliptical Slice Sampling (ESS), but at a reduced computational load with respect to both EP and ESS.
Proportionate adaptive filters, such as those based on the improved proportionate normalized least-mean-square (IPNLMS) algorithm, have been proposed for echo cancellation as an interesting alternative to the normalized least-mean-square (NLMS) filter. Proportionate schemes offer improved performance when the echo path is sparse, but are still subject to some compromises regarding their convergence properties and steady-state error. In this paper, we study how combination schemes, where the outputs of two independent adaptive filters are adaptively mixed together, can be used to increase IPNLMS robustness to channels with different degrees of sparsity, as well as to alleviate the rate of convergence versus steady-state misadjustment tradeoff imposed by the selection of the step size. We also introduce a new block-based combination scheme which is specifically designed to further exploit the characteristics of the IPNLMS filter. The advantages of these combined filters are justified theoretically and illustrated in several echo cancellation scenarios.
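For reference, a single update of the standard IPNLMS component filter can be sketched as below (the widely used Benesty–Gay form; step size and regularization constants are illustrative). The combination machinery discussed in the abstract would mix the outputs of two such filters, e.g. with different step sizes or different alpha.

```python
import numpy as np

def ipnlms_step(w, u, d, mu=0.5, alpha=0.0, eps=1e-6, delta=1e-4):
    """One IPNLMS coefficient update.

    Each coefficient receives a gain mixing a uniform NLMS-like term with
    a term proportional to its current magnitude, so large (active) taps
    of a sparse echo path adapt faster. alpha = -1 recovers plain NLMS;
    alpha -> 1 approaches fully proportionate adaptation.
    """
    L = len(w)
    # per-coefficient proportionate gains (sum roughly to 1)
    k = (1 - alpha) / (2 * L) \
        + (1 + alpha) * np.abs(w) / (2 * np.sum(np.abs(w)) + eps)
    e = d - w @ u                              # a priori error
    w = w + mu * e * k * u / (u @ (k * u) + delta)
    return w, e
```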