• Evidence Lower Bound on incomplete datasets, computed only on the observed data, regardless of the pattern of missing data.
• Generative model that handles mixed numerical and nominal likelihood models, parametrized using deep neural networks (DNNs).
• Stable recognition model that handles incomplete datasets without increasing its complexity or promoting overfitting.
• Data-normalization input/output layer prevents a few dimensions of the data dominating the training of the VAE, improving the training convergence.
• Comparison with state-of-the-art methods on six datasets for both missing data imputation and predictive tasks.
Variational autoencoders (VAEs), as well as other generative models, have been shown to be efficient and accurate for capturing the latent structure of vast amounts of complex high-dimensional data. However, existing VAEs still cannot directly handle data that are heterogeneous (mixed continuous and discrete) or incomplete (with data missing at random), both of which are common in real-world applications.
In this paper, we propose a general framework to design VAEs suitable for fitting incomplete heterogeneous data. The proposed HI-VAE includes likelihood models for real-valued, positive real-valued, interval, categorical, ordinal and count data, and allows accurate estimation (and potentially imputation) of missing data. Furthermore, HI-VAE presents competitive predictive performance in supervised tasks, outperforming supervised models when trained on incomplete data.
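As a rough illustration of an evidence lower bound "computed only on the observed data", the sketch below evaluates a Gaussian reconstruction term while skipping missing entries. The function name `observed_gaussian_loglik`, the use of `None` as the missing-value marker, and the fixed shared variance are assumptions made for this example; it is not the HI-VAE implementation and covers only the real-valued likelihood.

```python
import math

def observed_gaussian_loglik(x, mean, var=1.0):
    """Sum the Gaussian log-density over observed entries only.

    x: list of floats, with None marking a missing entry (assumed encoding).
    mean: decoder means, one per dimension.
    var: shared variance, kept fixed here for simplicity (an assumption).
    """
    total = 0.0
    for x_d, mu_d in zip(x, mean):
        if x_d is None:
            continue  # missing dimensions contribute nothing to the bound
        total += -0.5 * (math.log(2 * math.pi * var) + (x_d - mu_d) ** 2 / var)
    return total

row = [0.5, None, -1.0]            # second attribute is missing
decoder_mean = [0.4, 3.0, -0.8]    # hypothetical decoder output
ll = observed_gaussian_loglik(row, decoder_mean)
```

Because the missing dimension is skipped, whatever the decoder predicts for it has no effect on the training signal, while the prediction itself remains available for imputation.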
Machine learning is increasingly used to inform decision making in sensitive situations where decisions have consequential effects on individuals’ lives. In these settings, in addition to requiring models to be accurate and robust, socially relevant values such as fairness, privacy, accountability, and explainability play an important role in the adoption and impact of said technologies. In this work, we focus on algorithmic recourse, which is concerned with providing explanations and recommendations to individuals who are unfavorably treated by automated decision-making systems. We first perform an extensive literature review, and align the efforts of many authors by presenting unified definitions, formulations, and solutions to recourse. Then, we provide an overview of the prospective research directions toward which the community may engage, challenging existing assumptions and making explicit connections to other ethical challenges such as security, privacy, and fairness.
Neural population responses in sensory systems are driven by external physical stimuli. This stimulus-response relationship is typically characterized by receptive fields, which have been estimated by neural system identification approaches. Such models usually require a large amount of training data, yet the recording time for animal experiments is limited, giving rise to epistemic uncertainty for the learned neural transfer functions. While deep neural network models have demonstrated excellent power on neural prediction, they usually do not provide the uncertainty of the resulting neural representations and derived statistics, such as most exciting inputs (MEIs), from in silico experiments. Here, we present a Bayesian system identification approach to predict neural responses to visual stimuli, and explore whether explicitly modeling network weight variability can be beneficial for identifying neural response properties. To this end, we use variational inference to estimate the posterior distribution of each model weight given the training data. Tests with different neural datasets demonstrate that this method can achieve higher or comparable performance on neural prediction, with a much higher data efficiency compared to Monte Carlo dropout methods and traditional models using point estimates of the model parameters. At the same time, our variational method provides us with an effectively infinite ensemble, avoiding the idiosyncrasy of any single model, to generate MEIs. It also allows us to estimate the uncertainty of the stimulus-response function, which we have found to be negatively correlated with the predictive performance at the model level and may serve to evaluate models. Furthermore, our approach enables us to identify response properties with credible intervals and to determine whether the inferred features are meaningful by performing statistical tests on MEIs.
Finally, in silico experiments show that our model generates stimuli driving neuronal activity significantly better than traditional models in the limited-data regime.
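To make the idea of a posterior over each model weight more concrete, the following sketch draws weights from a mean-field Gaussian via the reparameterization w = mu + softplus(rho) * eps and summarizes the spread of the resulting predictions. The linear readout, the parameter names `mu` and `rho`, and the sample count are assumptions chosen for illustration; the abstract does not specify the network or the variational family in this detail.

```python
import math
import random

def softplus(rho):
    # Maps an unconstrained parameter to a positive standard deviation.
    return math.log1p(math.exp(rho))

def sample_weights(mu, rho, rng):
    # Reparameterized draw: w = mu + sigma * eps, with eps ~ N(0, 1).
    return [m + softplus(r) * rng.gauss(0.0, 1.0) for m, r in zip(mu, rho)]

def predictive_stats(x, mu, rho, n_samples=2000, seed=0):
    """Monte Carlo mean and standard deviation of a linear readout w . x."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_samples):
        w = sample_weights(mu, rho, rng)
        preds.append(sum(w_i * x_i for w_i, x_i in zip(w, x)))
    mean = sum(preds) / n_samples
    var = sum((p - mean) ** 2 for p in preds) / (n_samples - 1)
    return mean, math.sqrt(var)

mu = [1.0, -2.0]     # hypothetical posterior means
rho = [-3.0, -3.0]   # hypothetical unconstrained scale parameters
mean, std = predictive_stats([0.5, 0.25], mu, rho)
```

Each draw of the weights acts as one member of an ensemble, so the standard deviation across draws gives a simple per-stimulus uncertainty estimate.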
Infinite Factorial Unbounded-State Hidden Markov Model
Valera, Isabel; Ruiz, Francisco J. R.; Perez-Cruz, Fernando
IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 38, Issue 9, 2016-09-01
Journal Article
Peer reviewed
There are many scenarios in artificial intelligence, signal processing or medicine, in which a temporal sequence consists of several unknown overlapping independent causes, and we are interested in accurately recovering those canonical causes. Factorial hidden Markov models (FHMMs) present the versatility to provide a good fit to these scenarios. However, in some scenarios, the number of causes or the number of states of the FHMM cannot be known or limited a priori. In this paper, we propose an infinite factorial unbounded-state hidden Markov model (IFUHMM), in which the number of parallel hidden Markov models (HMMs) and states in each HMM are potentially unbounded. We rely on a Bayesian nonparametric (BNP) prior over integer-valued matrices, in which the columns represent the Markov chains, the rows the time indexes, and the integers the state for each chain and time instant. First, we extend the existing infinite factorial binary-state HMM to allow for any number of states. Then, we modify this model to allow for an unbounded number of states and derive an MCMC-based inference algorithm that properly deals with the trade-off between the unbounded number of states and chains. We illustrate the performance of our proposed models in the power disaggregation problem.
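The integer-valued matrix described above (rows as time indexes, columns as chains, entries as states) can be pictured with a small finite factorial HMM. The sketch below simulates a fixed number of independent chains and an aggregate signal equal to the sum of per-chain levels, loosely in the spirit of power disaggregation. The additive observation model, the device levels, and the function names are assumptions for illustration; the nonparametric prior and the MCMC inference of the paper are not shown.

```python
import random

def simulate_chain(trans, n_steps, rng, state=0):
    """Sample a state sequence from one Markov chain with transition matrix trans."""
    states = []
    for _ in range(n_steps):
        states.append(state)
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
    return states

def simulate_fhmm(transitions, levels, n_steps, noise_std=0.0, seed=0):
    """Simulate parallel chains and an additive aggregate observation.

    transitions[m]: transition matrix of chain m (chains may differ in size).
    levels[m][k]: contribution of chain m to the signal when in state k.
    """
    rng = random.Random(seed)
    chains = [simulate_chain(tr, n_steps, rng) for tr in transitions]
    observed = [
        sum(levels[m][chains[m][t]] for m in range(len(chains)))
        + rng.gauss(0.0, noise_std)
        for t in range(n_steps)
    ]
    return chains, observed

# Two hypothetical devices: one with 2 states, one with 3 states.
transitions = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
]
levels = [[0.0, 100.0], [0.0, 500.0, 1000.0]]
chains, observed = simulate_fhmm(transitions, levels, n_steps=50)
```

Transposing `chains` gives the time-by-chain integer matrix; inference has to recover it from `observed` alone, without knowing how many columns or state values it has.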
The emergence and widespread use of online social networks has led to a dramatic increase in the availability of social activity data. Importantly, this data can be exploited to investigate, at a microscopic level, some of the problems that have captured the attention of economists, marketers and sociologists for decades, such as product adoption, usage and competition. In this paper, we propose a continuous-time probabilistic model, based on temporal point processes, for the adoption and frequency of use of competing products, where the frequency of use of one product can be modulated by those of others. This model allows us to efficiently simulate the adoption and recurrent usages of competing products, and generate traces in which we can easily recognize the effect of social influence, recency and competition. We then develop an inference method to efficiently fit the model parameters by solving a convex program. The problem decouples into a collection of smaller subproblems, thus scaling easily to networks with hundreds of thousands of nodes. We validate our model over synthetic and real diffusion data gathered from Twitter, and show that the proposed model not only provides a good fit to the data and more accurate predictions than alternatives but also provides interpretable model parameters, which allow us to gain insights into some of the factors driving product adoption and frequency of use.
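For readers unfamiliar with temporal point processes, the sketch below simulates a generic univariate self-exciting (Hawkes) process with an exponential kernel using Ogata's thinning method. It is a stand-in to show how past events can raise the rate of future ones; the parameter names `mu`, `alpha`, `beta` and the single-product setting are assumptions, and the model in the paper, with competing products and social influence, is richer than this.

```python
import math
import random

def intensity(t, events, mu, alpha, beta):
    """Conditional intensity: base rate plus decaying excitation from past events."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)

def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Ogata thinning for a univariate Hawkes process on [0, horizon)."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # The intensity only decays until the next event, so its value
        # just after t (including any event at t) is a valid upper bound.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(lam_bar)
        if t >= horizon:
            return events
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)

poisson_like = simulate_hawkes(mu=1.0, alpha=0.0, beta=1.0, horizon=200.0)
excited = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=200.0)
```

With `alpha=0` the process reduces to a homogeneous Poisson process; a positive `alpha` produces bursts of activity, a simple analogue of the recency effect mentioned in the abstract.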
We aim at finding the comorbidity patterns of substance abuse, mood and personality disorders using the diagnoses from the National Epidemiologic Survey on Alcohol and Related Conditions database. To this end, we propose a novel Bayesian nonparametric latent feature model for categorical observations, based on the Indian buffet process, in which the latent variables can take values between 0 and 1. The proposed model has several interesting features for modeling psychiatric disorders. First, the latent features might be off, which allows distinguishing between the subjects who suffer a condition and those who do not. Second, the active latent features take positive values, which allows modeling the extent to which the patient has that condition. We also develop a new Markov chain Monte Carlo inference algorithm for our model that makes use of a nested expectation propagation procedure.
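The Indian buffet process underlying this model can be sampled with the standard sequential "customers and dishes" construction, sketched below for the ordinary binary case. The function names and the concentration value are assumptions for illustration; the extension to latent values between 0 and 1 and the nested expectation propagation inference from the abstract are not reproduced here.

```python
import math
import random

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small rates."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def sample_ibp(n_rows, alpha, seed=0):
    """Draw a binary feature matrix from the Indian buffet process."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of earlier rows that use feature k
    rows = []
    for i in range(1, n_rows + 1):
        # Reuse an existing feature with probability proportional to its popularity.
        active = {k for k, m_k in enumerate(counts) if rng.random() < m_k / i}
        # Then open Poisson(alpha / i) brand-new features.
        for _ in range(sample_poisson(alpha / i, rng)):
            counts.append(0)
            active.add(len(counts) - 1)
        for k in active:
            counts[k] += 1
        rows.append(active)
    n_features = len(counts)
    return [[1 if k in row else 0 for k in range(n_features)] for row in rows]

Z = sample_ibp(n_rows=20, alpha=2.0)
```

Each row of `Z` corresponds to a subject and each column to a latent feature; the number of columns is not fixed in advance and grows slowly with the number of rows.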
Multitask learning is being increasingly adopted in application domains like computer vision and reinforcement learning. However, optimally exploiting its advantages remains a major challenge due to the effect of negative transfer. Previous works have traced this issue to the disparities in gradient magnitudes and directions across tasks when optimizing the shared network parameters. While recent work has acknowledged that negative transfer is a two-fold problem, existing approaches fall short, as they focus only on either homogenizing the gradient magnitude across tasks or greedily changing the gradient directions, overlooking future conflicts. In this work, we introduce RotoGrad, an algorithm that tackles negative transfer as a whole: it jointly homogenizes gradient magnitudes and directions, while ensuring training convergence. We show that RotoGrad outperforms competing methods in complex problems, including multi-label classification in CelebA and computer vision tasks in the NYUv2 dataset. A PyTorch implementation can be found at https://github.com/adrianjav/rotograd.
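The two ingredients of negative transfer named above, mismatched gradient magnitudes and conflicting gradient directions, can be measured with a few lines of code. The sketch below computes the cosine similarity between two task gradients and rescales them to a common norm. It is not RotoGrad and does not use the linked repository's API; the helper names and the choice of the mean norm as the target are assumptions for illustration, and the direction alignment that RotoGrad performs is not shown.

```python
import math

def norm(g):
    return math.sqrt(sum(v * v for v in g))

def cosine(g1, g2):
    """Cosine similarity; a negative value indicates conflicting directions."""
    return sum(a * b for a, b in zip(g1, g2)) / (norm(g1) * norm(g2))

def homogenize_magnitudes(grads):
    """Rescale every task gradient to the mean norm across tasks."""
    target = sum(norm(g) for g in grads) / len(grads)
    return [[v * target / norm(g) for v in g] for g in grads]

# Hypothetical gradients of two task losses w.r.t. shared parameters.
g_task_a = [3.0, 4.0]    # norm 5.0, would dominate the update
g_task_b = [-0.6, 0.0]   # norm 0.6, partially opposed to task A

conflict = cosine(g_task_a, g_task_b)
scaled_a, scaled_b = homogenize_magnitudes([g_task_a, g_task_b])
```

Rescaling equalizes the influence of the two tasks on the shared parameters but leaves the cosine similarity unchanged, which is why magnitude homogenization alone does not resolve directional conflicts.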
New communication standards need to deal with machine-to-machine communications, in which users may start or stop transmitting at any time in an asynchronous manner. Thus, the number of users is an unknown and time-varying parameter that needs to be accurately estimated in order to properly recover the symbols transmitted by all users in the system. In this paper, we address the problem of joint channel parameter and data estimation in a multiuser communication channel in which the number of transmitters is not known. For that purpose, we develop the infinite factorial finite state machine model, a Bayesian nonparametric model based on the Markov Indian buffet that allows for an unbounded number of transmitters with arbitrary channel length. We propose an inference algorithm that makes use of slice sampling and particle Gibbs with ancestor sampling. Our approach is fully blind as it does not require a prior channel estimation step, prior knowledge of the number of transmitters, or any signaling information. Our experimental results, loosely based on the LTE random access channel, show that the proposed approach can effectively recover the data-generating process for a wide range of scenarios, with varying number of transmitters, number of receivers, constellation order, channel length, and signal-to-noise ratio.