Spurious features interfere with the goal of obtaining robust models that perform well across many groups within the population. A natural remedy is to remove such features from the model. However, in this work we show that removing spurious features can surprisingly decrease accuracy due to the inductive biases of overparameterized models. In noiseless overparameterized linear regression, we completely characterize how the removal of spurious features affects accuracy across different groups (more generally, test distributions). In addition, we show that removal of spurious features can decrease accuracy even on balanced datasets (where each target co-occurs equally with each spurious feature), and that it can inadvertently make the model more susceptible to other spurious features. Finally, we show that robust self-training produces models that no longer depend on spurious features without affecting their overall accuracy. Empirical results on the Toxic-Comment-Detection and CelebA datasets show that our results hold for non-linear models.
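To make the setting concrete, below is a small simulation of noiseless overparameterized linear regression with minimum-norm interpolation. This is a sketch under our own assumptions: the data-generating process, the sign-based spurious feature, and the test group are illustrative choices, not the paper's construction. It fits the minimum-norm interpolant with and without a spurious feature and compares test error on a group where the spurious feature is uninformative; which error is larger depends on the geometry of the data, which is what the paper's characterization pins down.

```python
# Illustrative sketch (not the paper's construction): minimum-norm interpolation
# in noiseless overparameterized linear regression, with/without a spurious feature.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                           # n < d: overparameterized regime

beta = rng.normal(size=d) / np.sqrt(d)   # true weights on core features
X = rng.normal(size=(n, d))              # core training features
y = X @ beta                             # noiseless targets

# A spurious feature that co-occurs with the target at training time.
s = np.sign(y).reshape(-1, 1)
X_sp = np.hstack([X, s])

def min_norm(X, y):
    # Minimum-norm interpolant: w = X^T (X X^T)^{-1} y.
    return X.T @ np.linalg.solve(X @ X.T, y)

w_with = min_norm(X_sp, y)      # trained with the spurious feature
w_without = min_norm(X, y)      # trained after removing it

# Test group where the spurious feature carries no signal.
Xt = rng.normal(size=(5000, d))
yt = Xt @ beta
st = rng.choice([-1.0, 1.0], size=(5000, 1))

err_with = np.mean((np.hstack([Xt, st]) @ w_with - yt) ** 2)
err_without = np.mean((Xt @ w_without - yt) ** 2)
print(f"test MSE with spurious feature:    {err_with:.4f}")
print(f"test MSE without spurious feature: {err_without:.4f}")
```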
Machine learning models influence people's lives profoundly. Despite their strong performance, it has been observed that their predictions are often discriminatory against protected groups (e.g., women). One way to analyze discrimination in machine learning models is to measure the difference in model performance across groups or individuals, known as loss discrepancy. There are two flavors of loss discrepancy: statistical and counterfactual. Statistical loss discrepancy measures how much protected groups are impacted differently. Counterfactual loss discrepancy measures how much similar individuals are treated differently because of their group membership. Recent studies usually attribute loss discrepancy to an information deficiency for one group (e.g., one group has less data). In this thesis, I show that: 1) even when there is no information deficiency specific to one group (e.g., both groups have infinite data), adding the same amount of feature noise to all individuals leads to loss discrepancy; 2) even when features are noiseless and perfectly determine the prediction target, the inductive bias in the overparameterized regime leads to reliance on spurious features and to loss discrepancy. Understanding the source of loss discrepancies helps in designing mitigation methods, and I describe two. First, I show how to leverage unlabeled data to reduce counterfactual loss discrepancy without affecting accuracy. Next, I propose the unanimity principle: only predict when all models consistent with the training data predict the same output. I operationalize this principle for semantic parsing, the task of mapping utterances to logical forms, and develop a simple, efficient method that reasons over the infinite set of consistent models by checking only two of them. I prove that this method obtains 100% precision and thus incurs no loss discrepancy. Finally, I investigate methods for measuring loss discrepancy when there is no information about protected attributes, or when there are exponentially many protected groups. I introduce and study a notion I call maximum weighted loss discrepancy (MWLD): the maximum (weighted) difference between the loss of a group and the loss of the population. I show that it is statistically impossible to estimate MWLD when all groups have equal weights, but for a particular family of weighting functions I show how to estimate it efficiently. Lastly, I draw a connection between MWLD and loss variance, a quantity that arises in generalization bounds.
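As a minimal illustration of the unanimity principle above (the stand-in parsers and function names here are hypothetical, not the thesis's algorithm), a predictor abstains unless every model under consideration agrees. The thesis's contribution is showing that, for semantic parsing, agreement of the infinite set of consistent models can be certified by checking just two well-chosen models; the sketch below only shows the abstain-unless-unanimous rule itself.

```python
# Hypothetical sketch of the unanimity rule: predict only when all models agree.
from typing import Callable, List, Optional

def unanimous_predict(models: List[Callable[[str], str]], x: str) -> Optional[str]:
    """Return the common prediction if all models agree on x, else None (abstain)."""
    preds = {m(x) for m in models}
    return preds.pop() if len(preds) == 1 else None

# Two stand-in parsers (hypothetical); the thesis shows two suffice to certify
# unanimity of the whole consistent set in the semantic-parsing setting.
parser_a = lambda utterance: "answer(capital(USA))"
parser_b = lambda utterance: "answer(capital(USA))"

print(unanimous_predict([parser_a, parser_b], "what is the capital of the US?"))
```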
We study sequential language games in which two players, each with private information, communicate to achieve a common goal. In such games, a successful player must (i) infer the partner's private information from the partner's messages, (ii) generate messages that are most likely to help with the goal, and (iii) reason pragmatically about the partner's strategy. We propose a model that captures all three characteristics and demonstrate their importance in capturing human behavior on a new goal-oriented dataset we collected using crowdsourcing.
Despite substantial advancements, Natural Language Processing (NLP) models often require post-training adjustments to enforce business rules, rectify undesired behavior, and align with user values. These adjustments involve operationalizing "concepts"--dictating desired model responses to certain inputs. However, it is difficult for a single entity to enumerate and define all possible concepts, indicating a need for a multi-user, collaborative model alignment framework. Moreover, exhaustively delineating a concept is challenging, and an improper approach can create shortcuts or interfere with the original data or with other concepts. To address these challenges, we introduce CoDev, a framework that enables multi-user interaction with the model, thereby mitigating individual limitations. CoDev helps users operationalize their concepts using large language models, relying on the principle that NLP models exhibit simpler behaviors in local regions. Our main insight is to learn a \emph{local} model for each concept and a \emph{global} model that integrates the original data with all concepts. We then steer a large language model to generate instances within concept boundaries on which the local and global models disagree. Our experiments show that CoDev is effective at helping multiple users operationalize concepts and avoid interference across a variety of scenarios, tasks, and models.
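A hedged sketch of the CoDev loop follows; the function names and keyword-based stand-ins are ours, not the paper's API. The idea it illustrates: keep a simple local model per concept and a global model, steer a generator toward the concept's region, and surface instances on which the two disagree for the user to adjudicate.

```python
# Hedged sketch of the CoDev disagreement loop (names and stand-ins are ours).

def local_model(text: str) -> str:
    # Per-concept model: within the concept's local region, behavior is simple.
    return "toxic" if "stupid" in text else "non-toxic"   # hypothetical concept rule

def global_model(text: str) -> str:
    # Global model trained on the original data plus all concepts (stand-in).
    return "toxic" if "hate" in text else "non-toxic"

def llm_generate_in_concept(concept: str, k: int) -> list:
    # Stand-in for steering an LLM to produce instances inside the concept boundary.
    return [f"you are stupid, example {i}" for i in range(k)]

# Surface instances where the local and global models disagree for user labeling.
for x in llm_generate_in_concept("insults", 5):
    if local_model(x) != global_model(x):
        print("needs user label:", x)   # the user's labels feed back into training
```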
The performance of standard learning procedures has been observed to differ widely across groups. Recent studies usually attribute this loss discrepancy to an information deficiency for one group (e.g., one group has less data). In this work, we point to a more subtle source of loss discrepancy---feature noise. Our main result is that even when there is no information deficiency specific to one group (e.g., both groups have infinite data), adding the same amount of feature noise to all individuals leads to loss discrepancy. For linear regression, we thoroughly characterize the effect of feature noise on loss discrepancy in terms of the amount of noise, the difference between moments of the two groups, and whether group information is used or not. We then show this loss discrepancy does not vanish immediately if a shift in distribution causes the groups to have similar moments. On three real-world datasets, we show feature noise increases the loss discrepancy if groups have different distributions, while it does not affect the loss discrepancy on datasets where groups have similar distributions.
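The mechanism is easy to see in a toy simulation (our construction, consistent with the abstract but not taken from the paper): two groups share the same noiseless linear relation but have different feature variances; adding identical noise to everyone's feature biases the pooled least-squares fit toward zero (attenuation), and the resulting loss differs across the groups.

```python
# Toy simulation (our construction): identical feature noise for all individuals
# still yields different losses across groups with different feature moments.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two groups, same true relation y = 2x, different feature variances.
x_a = rng.normal(0.0, 1.0, n)            # group A: Var(x) = 1
x_b = rng.normal(0.0, 3.0, n)            # group B: Var(x) = 9
x = np.concatenate([x_a, x_b])
y = 2.0 * x                              # noiseless target

x_noisy = x + rng.normal(0.0, 1.0, 2 * n)    # same feature noise for everyone

# Least-squares slope on the pooled noisy data; attenuated below the true 2.
slope = np.dot(x_noisy, y) / np.dot(x_noisy, x_noisy)

mse = (y - slope * x_noisy) ** 2
print(f"fitted slope: {slope:.3f} (true slope is 2)")
print(f"group A loss: {mse[:n].mean():.3f}")   # smaller-variance group
print(f"group B loss: {mse[n:].mean():.3f}")   # higher loss, despite identical noise
```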
Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data, resulting in unfair outcomes and eroding user trust. Additional data collection may not help in addressing these weaknesses, as such challenging subgroups may be unknown to users and underrepresented in both existing and new data. We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups and generates new data for them using large language models (LLMs) with a human in the loop. TDG estimates the expected benefit and potential harm of data augmentation for each subgroup, and selects the ones most likely to improve within-group performance without hurting overall performance. In our experiments, TDG significantly improves accuracy on challenging subgroups for state-of-the-art sentiment analysis and natural language inference models, while also improving overall test accuracy.
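One way the selection step could be operationalized is sketched below; the `train_and_eval` interface and the thresholding rule are our assumptions, not TDG's exact procedure. For each candidate subgroup, estimate the within-group gain and the overall-accuracy change from augmentation, and keep only subgroups that help without hurting.

```python
# Hedged sketch of a TDG-style selection step (our formulation).
def select_subgroups(candidates, train_and_eval, base_overall, base_in_group, eps=0.0):
    """train_and_eval(subgroup) -> (in_group_acc, overall_acc) after augmentation."""
    selected = []
    for g in candidates:
        in_group, overall = train_and_eval(g)
        benefit = in_group - base_in_group[g]   # expected within-group improvement
        harm = base_overall - overall           # potential overall degradation
        if benefit > 0 and harm <= eps:
            selected.append(g)
    return selected

# Toy usage with a stub evaluator (all numbers hypothetical).
base_in_group = {"negation": 0.62, "sarcasm": 0.55}
stub_eval = lambda g: (0.70, 0.91) if g == "negation" else (0.58, 0.88)
print(select_subgroups(["negation", "sarcasm"], stub_eval,
                       base_overall=0.90, base_in_group=base_in_group))
# -> ['negation']: it improves within-group accuracy without hurting overall accuracy.
```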
Prompt engineering is a challenging yet crucial task for optimizing the performance of large language models on customized tasks. It requires complex reasoning to examine the model's errors, hypothesize what is missing or misleading in the current prompt, and communicate the task with clarity. While recent works indicate that large language models can be meta-prompted to perform automatic prompt engineering, we argue that their potential is limited due to insufficient guidance for complex reasoning in the meta-prompt. We fill this gap by infusing into the meta-prompt three key components: detailed descriptions, context specification, and a step-by-step reasoning template. The resulting method, named PE2, exhibits remarkable versatility across diverse language tasks. It finds prompts that outperform "let's think step by step" by 6.3% on MultiArith and 3.1% on GSM8K, and outperforms competitive baselines on counterfactual tasks by 6.9%. Further, we show that PE2 can make targeted and highly specific prompt edits, rectify erroneous prompts, and induce multi-step plans for complex tasks.
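For intuition, a PE2-style meta-prompt might be assembled as below; the wording is ours and the paper's actual template differs. The three sections mirror the components named in the abstract: a detailed task description, context specification, and a step-by-step reasoning template.

```python
# Hypothetical PE2-style meta-prompt template (our wording, for illustration only).
META_PROMPT = """\
You are refining a prompt for the task below.

# Task description (detailed)
{task_description}

# Context specification
The prompt is prepended to each input; the model sees: prompt + input.

# Current prompt
{current_prompt}

# Examples the current prompt gets wrong
{failed_examples}

# Reasoning template (follow step by step)
1. For each failed example, explain why the current prompt misleads the model.
2. Hypothesize what is missing or misleading in the current prompt.
3. Propose an edited prompt that fixes these issues. Output only the new prompt.
"""

print(META_PROMPT.format(
    task_description="Solve grade-school math word problems; answer with a number.",
    current_prompt="Let's think step by step.",
    failed_examples="Q: ... | Model output: ... | Expected: ...",
))
```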
Though machine learning algorithms excel at minimizing the average loss over a population, this can lead to large discrepancies between the losses of groups within the population. To capture this inequality, we introduce and study a notion we call maximum weighted loss discrepancy (MWLD): the maximum (weighted) difference between the loss of a group and the loss of the population. We relate MWLD to group fairness notions and to robustness under demographic shifts. We then show that MWLD satisfies the following three properties: 1) it is statistically impossible to estimate MWLD when all groups have equal weights; 2) for a particular family of weighting functions, MWLD can be estimated efficiently; 3) MWLD is related to loss variance, a quantity that arises in generalization bounds. We estimate MWLD with different weighting functions on four common datasets from the fairness literature. Finally, we show that loss variance regularization can halve the loss variance of a classifier, and hence reduce MWLD, without a significant drop in accuracy.
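For concreteness, MWLD can be written as follows; the notation is ours, and the paper's formal definition may differ in details such as absolute values and the admissible family of groups.

```latex
% h: predictor; \mathcal{G}: family of groups; p_g: population share of group g;
% w: weighting function; L(h; g): average loss on group g; L(h): population loss.
\mathrm{MWLD}_w(h) \;=\; \max_{g \in \mathcal{G}} \; w(p_g)\,\bigl( L(h;\, g) - L(h) \bigr)
```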