Recommendations as treatments
Joachims, Thorsten; London, Ben; Su, Yi ...
The AI Magazine, 09/2021, Volume 42, Issue 3
Journal Article
Peer reviewed
Open access
In recent years, a new line of research has taken an interventional view of recommender systems, where recommendations are viewed as actions that the system takes to have a desired effect. This interventional view has led to the development of counterfactual inference techniques for evaluating and optimizing recommendation policies. This article explains how these techniques enable unbiased offline evaluation and learning despite biased data, and how they can inform considerations of fairness and equity in recommender systems.
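The workhorse of this kind of unbiased offline evaluation is inverse propensity scoring (IPS). A minimal sketch, where the function name and log format are illustrative rather than taken from the article:

```python
def ips_estimate(logs, target_policy):
    """Unbiased offline estimate of a target policy's expected reward.

    logs: list of (context, action, reward, logging_propensity) tuples,
          collected by a stochastic logging policy.
    target_policy(context, action): probability the target policy
          assigns to `action` in `context`.
    """
    total = 0.0
    for context, action, reward, prop in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        total += (target_policy(context, action) / prop) * reward
    return total / len(logs)
```

With a uniform logging policy over two actions (propensity 0.5 each), a deterministic target policy recovers exactly the mean reward of the action it always picks, even though the log also contains the other action.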
Collective Graph Identification
Namata, Galileo Mark; London, Ben; Getoor, Lise
ACM Transactions on Knowledge Discovery from Data, 02/2016, Volume 10, Issue 3
Journal Article
Peer reviewed
Data describing networks—such as communication networks, transaction networks, disease transmission networks, collaboration networks, etc.—are becoming increasingly available. While observational data can be useful, it often only hints at the actual underlying process that governs interactions and attributes. For example, an email communication network provides insight into its users and their relationships, but is not the same as the "real" underlying social network. In this article, we introduce the problem of graph identification, i.e., discovering the latent graph structure underlying an observed network. We cast the problem as a probabilistic inference task, in which we must infer the nodes, edges, and node labels of a hidden graph, based on evidence. This entails solving several canonical problems in network analysis: entity resolution (determining when two observations correspond to the same entity), link prediction (inferring the existence of links), and node labeling (inferring hidden attributes). While each of these subproblems has been well studied in isolation, here we consider them as a single, collective task. We present a simple, yet novel, approach to address all three subproblems simultaneously. Our approach, which we refer to as C3, consists of a collection of Coupled Collective Classifiers that are applied iteratively to propagate inferred information among the subproblems. We consider variants of C3 using different learning and inference techniques and empirically demonstrate that C3 is superior, both in terms of predictive accuracy and running time, to state-of-the-art probabilistic approaches on four real problems.
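To give a flavor of the iterative-propagation idea, here is a drastically simplified, hypothetical stand-in covering only the node-labeling subproblem: each node repeatedly adopts the majority label among its already-labeled neighbors, so seed labels propagate through the graph round by round (C3 couples classifiers across all three subproblems; this sketch does not):

```python
def iterative_classification(adj, seed_labels, rounds=10):
    """Toy iterative collective classification on a graph.

    adj: dict mapping each node to its list of neighbors.
    seed_labels: dict of initially known node labels.
    Each round, every node takes the majority label among its
    neighbors that were labeled in the previous round.
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        updated = dict(labels)
        for node, nbrs in adj.items():
            votes = [labels[n] for n in nbrs if n in labels]
            if votes:
                updated[node] = max(set(votes), key=votes.count)
        labels = updated
    return labels
```

On a chain graph with a single seeded node, the label spreads one hop per round until the whole chain is labeled.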
Off-Policy Estimation (OPE) methods allow us to learn and evaluate decision-making policies from logged data. This makes them an attractive choice for the offline evaluation of recommender systems, and several recent works have reported successful adoption of OPE methods to this end. An important assumption that makes this work is the absence of unobserved confounders: random variables that influence both actions and rewards at data collection time. Because the data collection policy is typically under the practitioner's control, the unconfoundedness assumption is often left implicit, and its violations are rarely dealt with in the existing literature. This work aims to highlight the problems that arise when performing off-policy estimation in the presence of unobserved confounders, specifically focusing on a recommendation use-case. We focus on policy-based estimators, where the logging propensities are learned from logged data. We characterise the statistical bias that arises due to confounding, and show how existing diagnostics are unable to uncover such cases. Because the bias depends directly on the true and unobserved logging propensities, it is non-identifiable. As the unconfoundedness assumption is famously untestable, this becomes especially problematic. This paper emphasises this common, yet often overlooked issue. Through synthetic data, we empirically show how naïve propensity estimation under confounding can lead to severely biased metric estimates that are allowed to fly under the radar. We aim to cultivate an awareness among researchers and practitioners of this important problem, and touch upon potential research directions towards mitigating its effects.
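The failure mode is easy to reproduce with a few lines of synthetic data: a hidden variable drives both the logged action and the reward, so a propensity model fit on the log alone (here, just the marginal action frequency) yields a badly biased IPS estimate. All names and numbers below are illustrative:

```python
import random

def simulate_confounded_logs(n, seed=0):
    """Logs where an unobserved confounder u drives both the action
    and the reward; the analyst sees only (action, reward)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        u = rng.random() < 0.5           # unobserved confounder
        p_a1 = 0.9 if u else 0.1         # true (confounded) logging propensity
        a = 1 if rng.random() < p_a1 else 0
        r = 1.0 if (a == 1 and u) else 0.0
        logs.append((a, r, p_a1 if a == 1 else 1.0 - p_a1))
    return logs

def ips_always_a1(logs, prop_fn):
    """IPS value of the policy that always plays action 1."""
    return sum(r / prop_fn(a, true_p) for a, r, true_p in logs if a == 1) / len(logs)

logs = simulate_confounded_logs(20000)
p_hat = sum(a for a, _, _ in logs) / len(logs)    # naive marginal propensity, ~0.5
naive = ips_always_a1(logs, lambda a, tp: p_hat)  # confounded estimate, ~0.9
oracle = ips_always_a1(logs, lambda a, tp: tp)    # with true propensities, ~0.5
```

The policy that always plays action 1 truly earns reward 0.5 (reward occurs only when the confounder holds), and the oracle estimator recovers it; the naive estimator, using a perfectly plausible-looking marginal propensity, reports roughly 0.9 with nothing in the log to flag the error.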
We study the generalization error of randomized learning algorithms, focusing on stochastic gradient descent (SGD), using a novel combination of PAC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all posterior distributions on an algorithm's random hyperparameters, including distributions that depend on the training data. This inspires an adaptive sampling algorithm for SGD that optimizes the posterior at runtime. We analyze this algorithm in the context of our generalization bounds and evaluate it on a benchmark dataset. Our experiments demonstrate that adaptive sampling can reduce empirical risk faster than uniform sampling while also improving out-of-sample accuracy.
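The general flavor of data-dependent, non-uniform example sampling can be sketched as follows. This toy least-squares version, with loss-proportional sampling and importance-weighted gradients, is a hypothetical illustration only, not the paper's algorithm or its posterior update:

```python
import random

def sgd_loss_proportional(xs, ys, steps=2000, lr=0.02, seed=0):
    """Fit y ~ w*x by SGD, sampling each example with probability
    proportional to (a smoothed mix of) its current squared error.
    Gradients are importance-weighted so each update remains an
    unbiased estimate of the full gradient."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    for _ in range(steps):
        losses = [(w * x - y) ** 2 + 1e-9 for x, y in zip(xs, ys)]
        z = sum(losses)
        # Mix with uniform so no example's probability collapses to zero.
        probs = [0.5 / n + 0.5 * l / z for l in losses]
        i = rng.choices(range(n), weights=probs)[0]
        grad = 2.0 * (w * xs[i] - ys[i]) * xs[i]
        w -= lr * grad / (n * probs[i])  # importance weight keeps E[step] unbiased
    return w
```

On noise-free data with y = 2x, the iterate contracts toward the true slope regardless of which examples get sampled.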
We present a Bayesian view of counterfactual risk minimization (CRM) for offline learning from logged bandit feedback. Using PAC-Bayesian analysis, we derive a new generalization bound for the truncated inverse propensity score estimator. We apply the bound to a class of Bayesian policies, which motivates a novel, potentially data-dependent, regularization technique for CRM. Experimental results indicate that this technique outperforms standard L2 regularization, and that it is competitive with variance regularization while being both simpler to implement and more computationally efficient.
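The truncated IPS estimator analyzed here simply caps the importance weights at a threshold; a minimal sketch (names and log format are illustrative):

```python
def truncated_ips(logs, target_policy, m=10.0):
    """IPS with importance weights capped at m: lower variance than
    plain IPS, at the cost of a downward bias when rewards are
    non-negative.

    logs: list of (context, action, reward, logging_propensity).
    target_policy(context, action): target action probability.
    """
    total = 0.0
    for context, action, reward, prop in logs:
        w = min(target_policy(context, action) / prop, m)  # truncation
        total += w * reward
    return total / len(logs)
```

A single log entry with propensity 0.05 under a deterministic target policy has importance weight 20; truncating at m = 10 halves its contribution, which is exactly where the variance reduction (and the bias) comes from.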
We propose the first boosting algorithm for off-policy learning from logged bandit feedback. Unlike existing boosting methods for supervised learning, our algorithm directly optimizes an estimate of the policy's expected reward. We analyze this algorithm and prove that the excess empirical risk decreases (possibly exponentially fast) with each round of boosting, provided a "weak" learning condition is satisfied by the base learner. We further show how to reduce the base learner to supervised learning, which opens up a broad range of readily available base learners with practical benefits, such as decision trees. Experiments indicate that our algorithm inherits many desirable properties of tree-based boosting algorithms (e.g., robustness to feature scaling and hyperparameter tuning), and that it can outperform off-policy learning with deep neural networks as well as methods that simply regress on the observed rewards.
Practical Bandits: An Industry Perspective
van den Akker, Bram; Jeunen, Olivier; Li, Ying ...
Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 03/2024
Conference Proceeding
Open access
The bandit paradigm provides a unified modeling framework for problems that require decision-making under uncertainty. Because many business metrics can be viewed as rewards (a.k.a. utilities) that result from actions, bandit algorithms have seen a large and growing interest from industrial applications, such as search, recommendation and advertising. Indeed, with the bandit lens comes the promise of direct optimisation for the metrics we care about.
Nevertheless, the road to successfully applying bandits in production is not an easy one. Even when the action space and rewards are well-defined, practitioners still need to make decisions regarding multi-armed or contextual approaches, on- or off-policy setups, delayed or immediate feedback, myopic or long-term optimisation, etc. To make matters worse, industrial platforms typically give rise to large action spaces in which existing approaches tend to break down. The research literature on these topics is broad and vast, but this can overwhelm practitioners, whose primary aim is to solve practical problems and who therefore need to decide on a specific instantiation or approach for each project. This tutorial will take a step towards filling that gap between the theory and practice of bandits. Our goal is to present a unified overview of the field and its existing terminology, concepts and algorithms, with a focus on problems relevant to industry. We hope our industrial perspective will help future practitioners who wish to leverage the bandit paradigm for their application.
Sustainability at scale
Tomkins, Sabina; Isley, Steven; London, Ben ...
Proceedings of the 12th ACM Conference on Recommender Systems, 09/2018
Conference Proceeding
Finding sustainable products and evaluating their claims is a significant barrier facing sustainability-minded customers. Tools that reduce both these burdens are likely to boost the sale of sustainable products. However, it is difficult to determine the sustainability characteristics of these products: there are a variety of certifications and definitions of sustainability, and quality labeling requires input from domain experts. In this paper, we propose a flexible probabilistic framework that uses domain knowledge to identify sustainable products and customers, and uses these labels to predict customer purchases. We evaluate our approach on grocery items from the Amazon catalog. Our proposed approach outperforms established recommender system models in predicting future purchases while jointly inferring sustainability scores for customers and products.
"Clipping" (a.k.a. importance weight truncation) is a widely used variance-reduction technique for counterfactual off-policy estimators. Like other variance-reduction techniques, clipping reduces ...variance at the cost of increased bias. However, unlike other techniques, the bias introduced by clipping is always a downward bias (assuming non-negative rewards), yielding a lower bound on the true expected reward. In this work we propose a simple extension, called \(\textit{double clipping}\), which aims to compensate this downward bias and thus reduce the overall bias, while maintaining the variance reduction properties of the original estimator.