Peer reviewed · Open access
  • Handling incomplete heterogeneous data using VAEs
    Nazábal, Alfredo; Olmos, Pablo M.; Ghahramani, Zoubin; Valera, Isabel

    Pattern Recognition, November 2020, Volume 107
    Journal Article

    • Evidence lower bound for incomplete datasets, computed only on the observed data, regardless of the pattern of missing data.
    • Generative model that handles mixed numerical and nominal likelihood models, parametrized using deep neural networks (DNNs).
    • Stable recognition model that handles incomplete datasets without increasing its complexity or promoting overfitting.
    • Data-normalization input/output layer that prevents a few dimensions of the data from dominating the training of the VAE, improving training convergence.
    • Comparison with state-of-the-art methods on six datasets for both missing-data imputation and predictive tasks.

    Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate at capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with missing data at random), which is common in real-world applications. In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data.
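
    The central idea highlighted above, an evidence lower bound whose reconstruction term is evaluated only on observed entries, can be illustrated with a short sketch. The code below is a hypothetical, simplified PyTorch illustration and not the authors' HI-VAE: it uses a plain Gaussian likelihood instead of the paper's mixed likelihood models, and the class name, layer sizes, and missing-rate in the usage example are invented for this sketch. It shows how a binary mask restricts the per-dimension reconstruction log-likelihood to observed values while the KL term is unchanged.

    import math
    import torch
    import torch.nn as nn

    class MaskedGaussianVAE(nn.Module):
        # Hypothetical toy model, NOT the authors' HI-VAE: a plain Gaussian VAE
        # whose reconstruction term is summed only over observed entries.
        def __init__(self, x_dim, z_dim=10, h_dim=64):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(x_dim, h_dim), nn.Tanh())
            self.to_mu = nn.Linear(h_dim, z_dim)
            self.to_logvar = nn.Linear(h_dim, z_dim)
            self.decoder = nn.Sequential(nn.Linear(z_dim, h_dim), nn.Tanh(),
                                         nn.Linear(h_dim, x_dim))

        def elbo(self, x, mask):
            # mask is 1.0 for observed entries and 0.0 for missing ones.
            # Zero out missing entries so they do not leak into the encoder.
            h = self.encoder(x * mask)
            mu, logvar = self.to_mu(h), self.to_logvar(h)
            # Reparameterization trick: z = mu + sigma * eps.
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            x_hat = self.decoder(z)
            # Unit-variance Gaussian log-likelihood per dimension,
            # kept only where the mask marks the entry as observed.
            log_px = -0.5 * ((x - x_hat) ** 2 + math.log(2 * math.pi))
            rec = (log_px * mask).sum(dim=1)
            # Analytic KL between q(z|x) and the standard normal prior.
            kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)
            return (rec - kl).mean()

    # Usage sketch: maximize the ELBO (minimize its negative) on masked data.
    model = MaskedGaussianVAE(x_dim=8)
    x = torch.randn(32, 8)                      # toy batch
    mask = (torch.rand(32, 8) > 0.3).float()    # ~30% of entries missing at random
    loss = -model.elbo(x, mask)
    loss.backward()

    Because the reconstruction sum skips masked entries, the bound is well defined for any missing-data pattern; after training, a decoded sample evaluated at the masked positions can serve as an imputation, which is the role the paper's full model plays with heterogeneous likelihoods.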