Location privacy-preserving mechanisms (LPPMs) have been extensively studied for protecting users' location privacy by releasing a perturbed location to third parties such as location-based service providers. However, when a user's perturbed locations are released continuously, existing LPPMs may not protect sensitive information about the user's real-world activities, such as "visited hospital in the last week" or "regularly commuting between location A and location B every weekday" (it is easy to infer that locations A and B may be home and office), which we call spatiotemporal events. In this paper, we first formally define a spatiotemporal event as a Boolean expression over location and time predicates, and then we define ε-spatiotemporal event privacy by extending the notion of differential privacy. Second, to understand how much spatiotemporal event privacy existing LPPMs can provide, we design computationally efficient algorithms to quantify the spatiotemporal event privacy leakage of state-of-the-art LPPMs. It turns out that existing LPPMs may not adequately protect spatiotemporal event privacy. Third, we propose a framework, PriSTE, to transform an existing LPPM into one protecting spatiotemporal event privacy by calibrating the LPPM's privacy budgets. Our experiments on real-life and synthetic data verify that the proposed method is effective and efficient.
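The notion can be sketched in differential-privacy style as follows. This is a paraphrase, not the paper's exact formulation; here M denotes the LPPM, E the Boolean spatiotemporal event, and o a released output:

```latex
% Sketch of \epsilon-spatiotemporal event privacy (notation assumed):
% the released outputs must be nearly indistinguishable between the
% two truth values of the event E.
e^{-\epsilon} \le
\frac{\Pr\left[\, M(\cdot) = o \mid E = \text{true} \,\right]}
     {\Pr\left[\, M(\cdot) = o \mid E = \text{false} \,\right]}
\le e^{\epsilon}
\qquad \text{for every possible output } o.
```

Intuitively, an adversary observing the perturbed locations should gain at most a bounded (e^ε) advantage in deciding whether the event occurred.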
The Shapley value (SV) is a fair and principled metric for contribution evaluation in cross-silo federated learning (cross-silo FL), wherein organizations, i.e., clients, collaboratively train prediction models with the coordination of a parameter server. However, existing SV calculation methods for FL assume that the server can access the raw FL models and public test data. This may not be a valid assumption in practice, considering the emerging privacy attacks on FL models and the fact that test data might be clients' private assets. Hence, we investigate the problem of secure SV calculation for cross-silo FL. We first propose HESV, a one-server solution based solely on homomorphic encryption (HE) for privacy protection, which has limitations in efficiency. To overcome these limitations, we propose SecSV, an efficient two-server protocol with the following novel features. First, SecSV utilizes a hybrid privacy protection scheme to avoid ciphertext-ciphertext multiplications between test data and models, which are extremely expensive under HE. Second, an efficient secure matrix multiplication method is proposed for SecSV. Third, SecSV strategically identifies and skips some test samples without significantly affecting the evaluation accuracy. Our experiments demonstrate that SecSV is 7.2--36.6× as fast as HESV, with a limited loss in the accuracy of calculated SVs.
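For reference, the exact Shapley value averages each client's marginal contribution over all coalitions. The sketch below is a plain (non-secure) computation; the `utility` function is a hypothetical stand-in for evaluating an FL model trained by a coalition on test data:

```python
from itertools import combinations
from math import factorial

def shapley_values(clients, utility):
    """Exact Shapley value: weighted average of each client's marginal
    contribution over all coalitions of the remaining clients."""
    n = len(clients)
    sv = {c: 0.0 for c in clients}
    for c in clients:
        others = [x for x in clients if x != c]
        for k in range(n):
            for coal in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                sv[c] += weight * (utility(set(coal) | {c}) - utility(set(coal)))
    return sv

# Toy additive utility: for such games the SV equals each client's own score.
scores = {"A": 0.5, "B": 0.3, "C": 0.2}
u = lambda S: sum(scores[c] for c in S)
print(shapley_values(["A", "B", "C"], u))
```

The secure protocols in the paper compute the same quantity while keeping models and test data encrypted; the exponential coalition enumeration is why skipping test samples matters for efficiency.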
In recent years, concerns about location privacy have been increasing with the spread of location-based services (LBSs). Many methods to protect location privacy have been proposed in the past decades. In particular, perturbation methods based on Geo-Indistinguishability (GeoI), which randomly perturb a true location to a pseudolocation, are attracting attention due to their strong privacy guarantee inherited from differential privacy. However, GeoI is based on the Euclidean plane even though many LBSs are based on road networks (e.g., ride-sharing services). This causes unnecessary noise and thus an insufficient tradeoff between utility and privacy for LBSs on road networks. To address this issue, we propose a new privacy notion, Geo-Graph-Indistinguishability (GeoGI), for locations on a road network to achieve a better tradeoff. We propose the Graph-Exponential Mechanism (GEM), which satisfies GeoGI. Moreover, we formalize the optimization problem of finding the optimal GEM in terms of the tradeoff. However, the computational complexity of a naive method to find the optimal solution is prohibitive, so we propose a greedy algorithm that finds an approximate solution in an acceptable amount of time. Finally, our experiments show that our proposed mechanism outperforms GeoI mechanisms, including the optimal GeoI mechanism, with respect to the tradeoff.
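A graph-based exponential mechanism of this flavor can be sketched as follows. The function names, the BFS hop distance, and the exp(-ε·d/2) weighting are illustrative assumptions, not the paper's exact GEM:

```python
import math
import random
from collections import deque

def shortest_path_lengths(adj, src):
    """BFS hop distances from src on an unweighted road-network graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                q.append(w)
    return dist

def gem_perturb(adj, true_node, epsilon):
    """Sample a pseudolocation with probability proportional to
    exp(-epsilon * d_G(true_node, v) / 2), where d_G is the
    shortest-path distance on the graph rather than Euclidean distance."""
    dist = shortest_path_lengths(adj, true_node)
    nodes = list(dist)
    weights = [math.exp(-epsilon * dist[v] / 2) for v in nodes]
    return random.choices(nodes, weights=weights)[0]

# Tiny road network: a path graph 0-1-2-3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(gem_perturb(adj, true_node=1, epsilon=1.0))
```

Using graph distance instead of Euclidean distance is exactly what avoids wasting the privacy budget on off-road locations.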
The significance of individuals' location information has been increasing recently, and the utilization of such data has become indispensable for businesses and society. Possible uses of location information include personalized services (maps, restaurant searches and weather forecast services) and business decisions (deciding where to open a store). However, considering that the data could be exploited, users should add random noise using their terminals before providing location data to collectors. In numerous instances, the level of privacy protection a user requires depends on their location. Therefore, in our framework, we assume that users can specify different privacy protection requirements for each location using the adversarial error (AE), and the system computes a mechanism that satisfies these requirements. To guarantee some utility for data analysis, the maximum error in outputting the location should also be output. In most privacy frameworks, the mechanism for adding random noise is public; however, in this problem setting, the privacy protection requirements and the mechanism must be confidential because this information includes sensitive information. We propose two mechanisms to address privacy personalization. The first is the individual exponential mechanism, which uses the exponential mechanism in the differential privacy framework. However, under the individual exponential mechanism, the maximum error for each output can be used to narrow down candidates for the actual location by observing outputs from the same location multiple times. The second mechanism, called the donut mechanism, improves on this deficiency by uniformly outputting a random location near the ring whose distance from the user's actual location equals the user-specified AE. Considering potential attacks on the donut mechanism that exploit the maximum error, we extend the mechanism to counter them. We compare these two mechanisms through experiments using maps constructed from artificial and real-world data.
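The core sampling step can be sketched as below. For simplicity this sketch samples exactly on the circle of radius AE, whereas the paper's donut mechanism outputs locations near that ring; the function name is illustrative:

```python
import math
import random

def donut_mechanism(x, y, ae):
    """Output a location uniformly at random on the circle of radius
    `ae` (the user-specified adversarial error) around the true
    location (x, y), so the reported point is always `ae` away."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    return (x + ae * math.cos(theta), y + ae * math.sin(theta))

ox, oy = donut_mechanism(0.0, 0.0, ae=100.0)
print(math.hypot(ox, oy))
```

Because every output is at the specified AE distance, repeated observations from the same true location trace out the ring, which is why the extended mechanism in the paper must counter attacks that exploit this structure.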
Peer assessment in education has pedagogical benefits and is a promising method for grading a large number of submissions. At the same time, student reliability has been regarded as a problem; consequently, various methods of estimating highly reliable grades from scores given by multiple students have been proposed. Most of the existing methods assume a nonadaptive allocation pattern, in which allocation is performed in advance. In this study, we analyze the effect of student-submission allocation on score estimation in peer assessment under a nonadaptive allocation setting. We examine three types of nonadaptive allocation methods, namely random allocation, circular allocation and group allocation, which are the approaches commonly used among existing nonadaptive peer assessment methods. Through simulation experiments, we show that circular allocation and group allocation tend to yield lower accuracy than random allocation. We then use this result to improve an existing adaptive allocation method, which performs allocation and assessment in parallel and tends to produce allocations similar to circular allocation. We propose a method that replaces part of the allocation with random allocation, and show through experiments that it is effective.
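The two allocation patterns compared above can be sketched as follows. The function names are illustrative, and this simple random variant does not balance how many reviews each submission receives, which a real peer-assessment system would also enforce:

```python
import random

def circular_allocation(n_students, k):
    """Each student i reviews the k submissions following i cyclically,
    so nearby students share almost the same reviewer set."""
    return {i: [(i + j) % n_students for j in range(1, k + 1)]
            for i in range(n_students)}

def random_allocation(n_students, k, seed=0):
    """Each student reviews k submissions drawn at random (never their
    own), spreading reviewers more evenly across submissions."""
    rng = random.Random(seed)
    return {i: rng.sample([j for j in range(n_students) if j != i], k)
            for i in range(n_students)}

print(circular_allocation(5, 2))  # {0: [1, 2], 1: [2, 3], ...}
```

The structural overlap in circular allocation is one intuition for why it can hurt score estimation: errors of neighboring reviewers are harder to disentangle.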
How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach to this problem is to build a generative model under differential privacy, which offers a rigorous privacy guarantee. However, existing methods cannot adequately handle high-dimensional data. In particular, when the input dataset contains a large number of features, existing techniques require injecting a prohibitive amount of noise to satisfy differential privacy, which renders the outsourced data analysis meaningless. To address this issue, this paper proposes the privacy-preserving phased generative model (P3GM), a differentially private generative model for releasing such sensitive data. P3GM employs a two-phase learning process to make it robust against the noise and to increase learning efficiency (e.g., ease of convergence). We give theoretical analyses of the learning complexity and privacy loss in P3GM. We further evaluate our proposed method experimentally and demonstrate that P3GM significantly outperforms existing solutions. Compared with the state-of-the-art methods, our generated samples contain less noise and are closer to the original data in terms of data diversity. Moreover, in several data mining tasks with synthesized data, our model outperforms the competitors in terms of accuracy.
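The two-phase idea can be illustrated schematically: first learn a compact representation from noisy statistics, then fit a generative model in the reduced space. This sketch uses noisy PCA plus a noisy Gaussian fit purely for illustration; P3GM's actual architecture (and its calibrated noise scales) differs:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))       # stand-in for sensitive data
sigma = 0.1                           # illustrative noise scale

# Phase 1: dimensionality reduction from a noise-perturbed covariance.
cov = X.T @ X / len(X)
noisy_cov = cov + rng.normal(scale=sigma, size=cov.shape)
noisy_cov = (noisy_cov + noisy_cov.T) / 2          # re-symmetrize
_, vecs = np.linalg.eigh(noisy_cov)
proj = vecs[:, -5:]                                 # top-5 components
Z = X @ proj

# Phase 2: fit noisy low-dimensional statistics, then synthesize data.
mu = Z.mean(axis=0) + rng.normal(scale=sigma, size=5)
cov_z = np.cov(Z, rowvar=False) + rng.normal(scale=sigma, size=(5, 5))
cov_z = (cov_z + cov_z.T) / 2 + 1e-3 * np.eye(5)    # keep well-conditioned
synthetic = rng.multivariate_normal(mu, cov_z, size=1000) @ proj.T
print(synthetic.shape)
```

Learning in the reduced space is what keeps the noise needed for privacy from overwhelming the signal in high dimensions, which is the failure mode the abstract describes for single-phase methods.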
To classify and search various kinds of scientific data, it is useful to annotate those data with keywords from a controlled vocabulary. Data providers, such as researchers, annotate their own data with keywords from the provided vocabulary. However, selecting suitable keywords requires extensive knowledge of both the research domain and the controlled vocabulary. Therefore, the annotation of scientific data with keywords from a controlled vocabulary is a time-consuming task for data providers. In this paper, we discuss methods for recommending relevant keywords from a controlled vocabulary for the annotation of scientific data through their metadata. Many previous studies have proposed approaches based on keywords in similar existing metadata; we call this the indirect method. However, when the quality of the existing metadata set is insufficient, the indirect method tends to be ineffective. Because the controlled vocabularies for scientific data usually provide definition sentences for each keyword, it is also possible to recommend keywords based on the target metadata and the keyword definitions; we call this the direct method. The direct method does not utilize the existing metadata set and is therefore independent of its quality. In addition, for the evaluation of keyword recommendation methods, we propose evaluation metrics based on a hierarchical vocabulary structure, which is a distinctive feature of most controlled vocabularies. Using our proposed evaluation metrics, we can evaluate keyword recommendation methods with an emphasis on keywords that are more difficult for data providers to select. In experiments using real earth science datasets, we compare the direct and indirect methods to verify their effectiveness, and observe how the indirect method depends on the quality of the existing metadata set. The results show the importance of metadata quality in recommending keywords.
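A minimal bag-of-words sketch of the direct method: rank keywords by the similarity between the target metadata text and each keyword's definition sentence. The function names and the cosine scoring are illustrative assumptions, not the paper's exact method:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def direct_recommend(metadata_text, definitions, top_k=3):
    """Rank controlled-vocabulary keywords by the similarity between
    the target metadata and each keyword's definition sentence."""
    meta = Counter(metadata_text.lower().split())
    scored = [(kw, cosine(meta, Counter(d.lower().split())))
              for kw, d in definitions.items()]
    return [kw for kw, _ in sorted(scored, key=lambda p: -p[1])[:top_k]]

defs = {
    "sea surface temperature": "temperature of the ocean surface water",
    "precipitation": "rain snow and other falling water",
}
print(direct_recommend("monthly ocean surface water temperature grids", defs, top_k=1))
```

Because the scoring uses only the keyword definitions, this approach needs no existing metadata set, which is precisely its advantage when that set is of low quality.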
Recent emerging mobile and wearable technologies make it easy to collect personal spatiotemporal data such as activity trajectories in daily life. Releasing real-time statistics over trajectory streams produced by crowds of people is expected to be valuable for both academia and business, answering questions such as "How many people are in Central Station now?" However, analyzing these raw data entails risks of compromising individual privacy. ϵ-Differential privacy has emerged as a de facto standard for private statistics publishing because of its rigorous and mathematically provable guarantee. Since user trajectories are generated indefinitely, it is difficult to protect every trajectory under ϵ-differential privacy. To this end, we propose a flexible privacy model, ℓ-trajectory privacy, which ensures that any trajectory of length ℓ is protected under ϵ-differential privacy. We then hierarchically design algorithms to satisfy ℓ-trajectory privacy. Experiments using four real-life datasets show that our proposed algorithms are effective and efficient.
Recent emerging mobile and wearable technologies make it easy to collect personal spatiotemporal data such as activity trajectories in daily life. Publishing real-time statistics over trajectory streams produced by crowds of people is expected to be valuable for both academia and business, answering questions such as “How many people are in Kyoto Station now?” However, analyzing these raw data entails risks of compromising individual privacy. ε-Differential privacy has emerged as a well-known standard for private statistics publishing because of its rigorous and mathematically provable guarantee. However, since user trajectories are generated indefinitely, it is difficult to protect every trajectory under ε-differential privacy. Moreover, in real life, not all users require the same level of privacy. To this end, we propose a flexible privacy model, l-trajectory privacy, which ensures that any desired length l of a trajectory is protected under ε-differential privacy. We also design an algorithmic framework to publish l-trajectory private data in real time. Experiments using four real-life datasets show that our proposed algorithms are effective and efficient.
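A baseline for this kind of sliding-window guarantee can be sketched as uniform budget division: give each timestamp ε/ℓ so that any ℓ consecutive releases together consume at most ε. The papers' algorithms allocate the budget adaptively; this sketch (with illustrative function names and sensitivity 1 per timestamp) shows only the baseline:

```python
import math
import random

def laplace(scale, rng):
    """Sample from Laplace(0, scale) by inverse transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def publish_stream(counts, epsilon, window, seed=0):
    """Publish noisy counts so that any `window` consecutive releases
    together consume at most `epsilon` (uniform budget division)."""
    rng = random.Random(seed)
    eps_t = epsilon / window          # per-timestamp budget
    return [c + laplace(1.0 / eps_t, rng) for c in counts]

print(publish_stream([120, 135, 140, 150], epsilon=1.0, window=2))
```

Uniform division is simple but wasteful when counts change slowly, which is the gap that adaptive budget allocation in the proposed framework is designed to close.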