Differential privacy is at a turning point. Implementations have been successfully leveraged in private industry, the public sector, and academia in a wide variety of applications, allowing scientists, engineers, and researchers to learn about populations of interest without learning about the specific individuals within them. Because differential privacy allows us to quantify cumulative privacy loss, these differentially private systems will, for the first time, allow us to measure and compare the total privacy loss due to such personal data-intensive activities. Appropriately leveraged, this could be a watershed moment for privacy.
Like other technologies and techniques that allow for a range of instantiations, implementation details matter. When meaningfully implemented, differential privacy supports deep data-driven insights with minimal worst-case privacy loss. When not, it delivers privacy mostly in name. Using differential privacy to maximize learning while providing a meaningful degree of privacy requires judicious choices with respect to the privacy parameter epsilon, among other factors. However, there is little understanding of what the optimal value of epsilon is for a given system, class of systems, purpose, or type of data, or of how to go about determining it.
To understand current differential privacy implementations and how organizations make these key choices in practice, we conducted interviews with practitioners to learn from their experiences of implementing differential privacy. We found no clear consensus on how to choose epsilon, nor agreement on how to approach this and other key implementation decisions. Given the importance of these implementation details, there is a need for shared learning among the differential privacy community. To serve these needs, we propose the creation of the Epsilon Registry: a publicly available communal body of knowledge about differential privacy implementations that can be used by various stakeholders to drive the identification and adoption of judicious differentially private implementations.
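To make the epsilon tradeoff concrete, the following is a minimal sketch of the standard Laplace mechanism in Python; the function names and parameter values are illustrative, not drawn from any registry entry or deployment. Smaller values of epsilon buy stronger privacy at the cost of noisier answers, which is exactly the tradeoff a registry of deployed epsilons would make comparable across systems.

    import numpy as np

    def laplace_mechanism(true_value, sensitivity, epsilon, rng):
        """Release a query answer with epsilon-differential privacy by
        adding Laplace noise with scale sensitivity / epsilon."""
        return true_value + rng.laplace(scale=sensitivity / epsilon)

    rng = np.random.default_rng(0)
    true_count = 1000   # e.g., number of records matching a query
    sensitivity = 1     # adding or removing one person changes a count by at most 1
    for epsilon in (0.1, 1.0, 10.0):
        noisy = laplace_mechanism(true_count, sensitivity, epsilon, rng)
        print(f"epsilon={epsilon:>4}: noisy count = {noisy:.1f}")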
We formalize a notion of a privacy wrapper, defined as an algorithm that can take an arbitrary and untrusted script and produce an output with differential privacy guarantees. Our novel privacy wrapper, named TAHOE, incorporates two design ideas: a type of stability under subsetting, and randomization over subset size. We show that TAHOE imposes differential privacy for every possible script. When the data alphabet is finite and small enough, TAHOE can be practically run on a single computer. Performance simulations show that TAHOE has greater accuracy than a benchmark algorithm based on a subsample-and-aggregate approach for certain scenarios and parameter values.
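For context, the benchmark named above can be sketched as follows. This is a generic, textbook-style subsample-and-aggregate wrapper, assuming numeric script outputs clipped to a known range; it is not TAHOE and not necessarily the paper's exact benchmark.

    import numpy as np

    def subsample_and_aggregate(data, script, epsilon, k, output_range, rng):
        """Run an untrusted script on k disjoint subsets and release a
        differentially private mean of the k outputs."""
        lo, hi = output_range
        parts = np.array_split(rng.permutation(data), k)
        outputs = np.clip([script(p) for p in parts], lo, hi)
        # One record influences only one subset, so the clipped mean
        # moves by at most (hi - lo) / k when one record changes.
        sensitivity = (hi - lo) / k
        return float(np.mean(outputs) + rng.laplace(scale=sensitivity / epsilon))

    rng = np.random.default_rng(1)
    data = rng.normal(50.0, 5.0, size=10_000)
    untrusted_median = lambda chunk: float(np.median(chunk))  # arbitrary script
    print(subsample_and_aggregate(data, untrusted_median, epsilon=1.0,
                                  k=100, output_range=(0.0, 100.0), rng=rng))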
This Thing Called Fairness. Mulligan, Deirdre K.; Kroll, Joshua A.; Kohli, Nitin. Proceedings of the ACM on Human-Computer Interaction, Volume 3, Issue CSCW, November 2019.
The explosion in the use of software in important sociotechnical systems has renewed focus on the study of the way technical constructs reflect policies, norms, and human values. This effort requires the engagement of scholars and practitioners from many disciplines. And yet these disciplines often conceptualize the operative values very differently while referring to them with the same vocabulary, and the resulting conflation of ideas confuses discussions about values in technology at disciplinary boundaries. To improve this situation, this paper examines the value of shared vocabularies, analytics, and other tools that facilitate conversations about values in light of these discipline-specific conceptualizations and the role such tools play in furthering research and practice; outlines different conceptions of "fairness" deployed in discussions about computer systems; and provides an analytic tool for interdisciplinary discussions and collaborations around the concept of fairness. We use a case study of risk assessments in criminal justice applications both to motivate our effort, describing how conflation of different concepts under the banner of "fairness" led to unproductive confusion, and to illustrate the value of the fairness analytic: the rigorous analysis it enables can help identify key areas of theoretical, political, and practical misunderstanding or disagreement, and, where desired, support alignment or collaboration in the absence of consensus.
We use decision theory to compare variants of differential privacy from the perspective of prospective study participants. We posit the existence of a preference ordering on the set of potential consequences that study participants can incur, which enables the analysis of individual utility functions. Drawing upon the theory of measurement, we argue that changes in expected utilities should be measured via the classic Euclidean metric. We then consider the question of which privacy guarantees would be more appealing for individuals under different decision settings. Through our analysis, we found that the nature of the potential participant's utility function, along with the specific values of \(\epsilon\) and \(\delta\), can greatly alter which privacy guarantees are preferable.
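As a rough illustration of why epsilon and delta matter to a prospective participant, the sketch below computes a coarse worst-case bound on how much joining a study can shift an individual's expected utility, assuming utilities normalized to [0, u_max]. It follows directly from the (epsilon, delta)-DP inequality P[M(D) in S] <= exp(epsilon) * P[M(D') in S] + delta; it is our simplification for illustration, not the paper's Euclidean-metric analysis.

    import math

    def max_expected_utility_shift(epsilon, delta, u_max=1.0):
        """Coarse worst-case change in expected utility (bounded in
        [0, u_max]) between participating and not participating, under
        (epsilon, delta)-differential privacy."""
        return (math.exp(epsilon) - 1.0 + delta) * u_max

    for eps, delta in [(0.1, 0.0), (1.0, 0.0), (1.0, 1e-6), (5.0, 1e-6)]:
        print(f"eps={eps}, delta={delta}: shift <= "
              f"{max_expected_utility_shift(eps, delta):.4f}")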
The data revolution in low- and middle-income countries is quickly transforming how companies approach emerging markets. As mobile phones and mobile money proliferate, they generate new streams of data that enable innovation in consumer finance, credit, and insurance. Already, this new generation of products is being used by hundreds of millions of consumers, often as their first access to financial services. However, the collection, analysis, and use of these data, particularly from economically disadvantaged populations, raise serious privacy concerns. This white paper describes a research agenda to advance our understanding of the problem and solution space of data privacy in emerging market fintech and financial services. We highlight five priority areas for research: conducting comprehensive landscape analyses; understanding local definitions of "data privacy"; documenting key sources of risk and potential technical solutions (such as differential privacy and homomorphic encryption); improving non-technical approaches to data privacy (such as policies and practices); and understanding the tradeoffs involved in deploying privacy-enhancing solutions. Taken together, we hope this research agenda will focus attention on the multi-faceted nature of privacy in emerging markets and catalyze efforts to develop responsible and consumer-oriented approaches to data-intensive applications.
The behavior of a differentially private system is governed by a parameter epsilon, which sets a balance between protecting the privacy of individuals and returning accurate results. While a system owner may use a number of heuristics to select epsilon, existing techniques may be unresponsive to the needs of the users whose data is at risk. A promising alternative is to allow users to express their preferences for epsilon. In a system we call epsilon voting, users report the parameter values they want to a chooser mechanism, which aggregates them into a single value. We apply techniques from mechanism design to ask whether such a chooser mechanism can itself be truthful, private, anonymous, and also responsive to users. Without imposing restrictions on user preferences, the only feasible mechanisms belong to a class we call randomized dictatorships with phantoms. This is a restrictive class in which at most one user has any effect on the chosen epsilon. On the other hand, when users exhibit single-peaked preferences, a broader class of mechanisms, ones that generalize the median and other order statistics, becomes possible.
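A minimal sketch of one mechanism in that broader class: a median over user reports plus fixed "phantom" votes, in the spirit of Moulin's generalized medians. Reporting truthfully is optimal under single-peaked preferences, and the designer's choice of phantoms biases the outcome; the values here are illustrative, not taken from the paper.

    import statistics

    def phantom_median_chooser(reported_epsilons, phantoms):
        """Aggregate user-reported epsilons by taking the median of the
        reports together with fixed phantom votes. For single-peaked
        preferences this generalized median is truthful and anonymous."""
        return statistics.median(list(reported_epsilons) + list(phantoms))

    reports = [0.1, 0.5, 0.5, 2.0, 8.0]   # users' preferred epsilons
    # Four conservative phantoms pull the choice toward small epsilon
    # (and keep the total count odd, so the median is a single value).
    print(phantom_median_chooser(reports, phantoms=[0.25] * 4))  # -> 0.25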
Personal mobility data from mobile phones and other sensors are increasingly used to inform policymaking during pandemics, natural disasters, and other humanitarian crises. However, even aggregated mobility traces can reveal private information about individual movements to potentially malicious actors. This paper develops and tests an approach for releasing private mobility data, which provides formal guarantees over the privacy of the underlying subjects. Specifically, we (1) introduce an algorithm for constructing differentially private mobility matrices, and derive privacy and accuracy bounds on this algorithm; (2) use real-world data from mobile phone operators in Afghanistan and Rwanda to show how this algorithm can enable the use of private mobility data in two high-stakes policy decisions: pandemic response and the distribution of humanitarian aid; and (3) discuss practical decisions that need to be made when implementing this approach, such as how to optimally balance privacy and accuracy. Taken together, these results can help enable the responsible use of private mobility data in humanitarian response.
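A simple baseline for the first step, releasing a private origin-destination matrix, can be sketched as follows. This is a generic Laplace-noise approach under the assumption that each user contributes at most trips_per_user trips; it is not necessarily the algorithm introduced in the paper.

    import numpy as np

    def dp_mobility_matrix(od_counts, epsilon, trips_per_user, rng):
        """Release an origin-destination trip-count matrix with
        epsilon-differential privacy. If each user contributes at most
        trips_per_user trips, the matrix's L1 sensitivity is
        trips_per_user, so Laplace noise at that scale suffices."""
        noisy = od_counts + rng.laplace(scale=trips_per_user / epsilon,
                                        size=od_counts.shape)
        return np.clip(np.rint(noisy), 0, None)  # post-processing is free

    rng = np.random.default_rng(2)
    od = rng.integers(0, 500, size=(4, 4)).astype(float)  # toy 4-region matrix
    print(dp_mobility_matrix(od, epsilon=1.0, trips_per_user=5, rng=rng))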
Machine learning (ML) is increasingly deployed in real world contexts, supplying actionable insights and forming the basis of automated decision-making systems. While issues resulting from biases pre-existing in training data have been at the center of the fairness debate, these systems are also affected by technical and emergent biases, which often arise as context-specific artifacts of implementation. This position paper interprets technical bias as an epistemological problem and emergent bias as a dynamical feedback phenomenon. In order to stimulate debate on how to change machine learning practice to effectively address these issues, we explore this broader view on bias, stress the need to reflect on epistemology, and point to value-sensitive design methodologies to revisit the design and implementation process of automated decision-making systems.