Peer-reviewed, Open access
  • Non-removal strategy for ou...
    Castejón-Limas, Manuel; Alaiz-Moreton, Hector; Fernández-Robles, Laura; Alfonso-Cendón, Javier; Fernández-Llamas, Camino; Sánchez-González, Lidia; Pérez, Hilde

    Logic Journal of the IGPL, 08/2020, Volume: 28, Issue: 4
    Journal Article

    Abstract: This paper reports the experience of using the PAELLA algorithm as a helper tool in robust regression, rather than for outlier identification and removal as originally intended. This novel usage takes advantage of the occurrence vector calculated by the algorithm to strengthen the effect of the more reliable samples and lessen the impact of those that would otherwise be considered outliers. To that end, a series of experiments is conducted to learn how best to use the information contained in the occurrence vector. In the first experiment, a reference predictive model is fit to the whole raw data set, a deliberately difficult artificial one. The second experiment reports the results of fitting a similar predictive model after discarding the samples marked as outliers by PAELLA. The third experiment uses the occurrence vector provided by PAELLA to classify the observations into multiple bins and fits every possible model, varying which bins are used for fitting and which are discarded in each model. The fourth experiment introduces a sampling process before fitting, in which the occurrence vector represents the likelihood of a sample being included in the training data set. The fifth experiment treats the sampling process as an internal step interleaved between the training epochs. The last experiment compares our approach, which uses weighted neural networks, to a state-of-the-art method.
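    The sampling and weighting ideas described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the occurrence values below are invented for illustration rather than produced by PAELLA, the function names are hypothetical, and a weighted straight-line fit stands in for the weighted neural networks used in the paper.

```python
import random


def sample_by_occurrence(occurrence, n_samples, seed=0):
    """Draw training indices with probability proportional to occurrence.

    Samples with a high occurrence score are drawn often; samples with a
    score of zero are never drawn.
    """
    rng = random.Random(seed)
    return rng.choices(range(len(occurrence)), weights=occurrence, k=n_samples)


def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for the model y = a + b * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Toy data following y = 2x + 1, except for a gross outlier in the last sample.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 30.0]

# Hypothetical occurrence vector (assumed values, not PAELLA output):
# the last sample is judged unreliable.
occurrence = [10, 10, 10, 10, 0]

# Option 1: resample the training set according to the occurrence vector.
train_idx = sample_by_occurrence(occurrence, n_samples=100)

# Option 2: keep every sample but weight it by its occurrence score.
a_raw, b_raw = weighted_line_fit(x, y, [1.0] * len(x))
a_w, b_w = weighted_line_fit(x, y, occurrence)
```

    In this toy case the unweighted fit is pulled toward the outlier, while the occurrence-weighted fit follows the four reliable samples. No sample is deleted from the data set; its influence is only scaled by its score.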