Federated learning (FL) enables data owners to train a joint global model without sharing private data. However, it is vulnerable to Byzantine attackers who can launch poisoning attacks to disrupt model training. Existing defense strategies rely on additional datasets to train trustworthy server models, or on trusted execution environments, to mitigate attacks. Moreover, these strategies can only tolerate a small number of malicious users or resist a few types of poisoning attacks. To address these challenges, we design a novel federated learning method, TDFL (Truth Discovery based Federated Learning), which can defend against multiple poisoning attacks without additional datasets even when the proportion of Byzantine users is <inline-formula><tex-math notation="LaTeX">\geq 50\%</tex-math></inline-formula>. Specifically, TDFL considers scenarios with different malicious proportions. For the honest-majority setting (Byzantine <inline-formula><tex-math notation="LaTeX">< 50\%</tex-math></inline-formula>), we design a robust truth discovery aggregation scheme that removes malicious model updates and assigns weights according to users' contributions; for the Byzantine-majority setting (Byzantine <inline-formula><tex-math notation="LaTeX">\geq 50\%</tex-math></inline-formula>), we use a maximum-clique-based filter to guarantee global model quality. To the best of our knowledge, this is the first study that uses truth discovery to defend against poisoning attacks, and the first scheme that achieves strong robustness against multiple kinds of attacks launched by a high proportion of attackers without root datasets. Extensive comparative experiments are conducted with five state-of-the-art aggregation rules under five types of classical poisoning attacks on different datasets. The experimental results demonstrate that TDFL is practical and achieves reasonable Byzantine robustness.
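The abstract does not detail the maximum-clique filter for the Byzantine-majority setting; the following is a minimal plaintext sketch of the general idea, keeping the largest set of client updates that are pairwise within a distance threshold `tau` (an assumed parameter) and averaging them. The brute-force clique search is only workable for small client counts.

```python
from itertools import combinations
import math

def clique_filter(updates, tau):
    """Average the largest set of updates that are pairwise within distance tau.

    Brute-force maximum clique: acceptable only for a handful of clients.
    """
    n, d = len(updates), len(updates[0])

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # edges between clients whose updates are mutually close
    close = {(i, j) for i, j in combinations(range(n), 2)
             if dist(updates[i], updates[j]) <= tau}
    # search cliques from largest to smallest; the first hit is a maximum clique
    for size in range(n, 0, -1):
        for group in combinations(range(n), size):
            if all(pair in close for pair in combinations(group, 2)):
                kept = [updates[i] for i in group]
                return [sum(u[j] for u in kept) / len(kept) for j in range(d)]
```

Even with more Byzantine than honest clients, honest updates tend to agree with each other while attackers' updates scatter, so the honest set forms the largest mutually-close clique.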
Nowadays, an increasing number of applications exploit users who act as intelligent sensors and can quickly provide high-level information. These users generate valuable data that, if mishandled, could reveal sensitive information. Protecting user privacy is thus of paramount importance for crowdsensing systems. In this paper, we propose BLIND, an innovative open-source truth discovery system designed to improve the quality of information (QoI) through the use of privacy-preserving computation techniques in mobile crowdsensing scenarios. The uniqueness of BLIND lies in its ability to preserve user privacy by ensuring that none of the parties involved can identify the source of the information provided. The system uses homomorphic encryption to implement a novel privacy-preserving version of the well-known K-Means clustering algorithm, which directly groups encrypted user data. Outliers are then removed privately without revealing any useful information to the parties involved. We extensively evaluate the proposed system for both server-side and client-side scalability, as well as truth discovery accuracy, using a real-world dataset and a synthetic one to test the system under challenging conditions. Comparisons with four state-of-the-art approaches show that BLIND optimizes QoI by effectively mitigating the impact of four different security attacks, with higher accuracy and lower communication overhead than its competitors. With the optimizations proposed in this paper, BLIND is up to three times faster than the baseline system, and the obtained Root Mean Squared Error (RMSE) values are up to 42% lower than those of other state-of-the-art approaches.
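BLIND's contribution is running K-Means over homomorphically encrypted data, which cannot be reproduced in a few lines; the sketch below only illustrates the underlying plaintext logic of clustering 1-D readings and discarding the outlier cluster (function names, `k=2`, and the largest-cluster heuristic are illustrative assumptions, not BLIND's protocol).

```python
def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D K-Means (assumes k >= 2); returns the final clusters."""
    lo, hi = min(values), max(values)
    # spread initial centers evenly over the value range
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            # assign each value to its nearest center
            clusters[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        # recompute centers; keep old center if a cluster emptied
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

def filter_outliers(values, k=2):
    """Keep the largest cluster (assumed to hold honest readings), average it."""
    main = max(kmeans_1d(values, k), key=len)
    return sum(main) / len(main)
```

In BLIND the same cluster-then-filter logic runs over ciphertexts, so no party sees individual readings.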
Crowdsourcing has proven to be a useful tool for tasks that are hard for computers. Unfortunately, workers with uneven expertise are likely to provide low-quality or even deliberately wrong data. A reliability model that precisely describes workers' performance on tasks can benefit the development of both task assignment mechanisms and truth discovery methods. However, existing methods cannot accurately model workers' fine-grained reliability levels. In this paper, we divide tasks into clusters (i.e., topics) based on workers' behaviors and propose a novel latent topic model to describe the topic structure and workers' topic-level expertise. We then develop two online task assignment mechanisms that dynamically assign each incoming worker a set of tasks on which they can achieve the Maximum Expected Gain (MEG) or Maximum Expected and Potential Gain (MEPG). The experimental results demonstrate that our methods significantly decrease the number of task assignments and achieve higher accuracy and macro-averaged F1-score than state-of-the-art approaches.
Truth discovery in mobile crowdsensing has recently received wide attention. It refers to the procedure of estimating unknown user reliability from collected sensory data and inferring truthful information via reliability-aware data aggregation. Though widely studied in the plaintext domain, truth discovery remains largely under-explored in privacy-aware mobile crowdsensing. Existing works either do not consider the user reliability issue or fall short of practical cost efficiency, due to iterative transmission and computation over large ciphertexts from homomorphic cryptosystems. In this paper, we propose two new privacy-aware crowdsensing designs with truth discovery that significantly improve the bandwidth and computation performance on individual users. Our insight is to identify the core atomic operation in the iterative truth discovery procedure and carefully craft security designs accordingly to enable efficient truth discovery in the ciphertext domain. Our first design is highly customized for the single-server setting, while our second design, under the two-server model, further shifts most of the user workload to the cloud server side. Both designs protect individual sensory data and reliability degrees throughout the truth discovery procedure. Experiments show that, compared with the prior result, our designs gain at least 30x and 10x savings in user communication and computation, respectively.
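In the plaintext setting, the iterative procedure this paper secures is the standard CRH-style alternation between a truth update (a weighted sum, the "core atomic operation") and a reliability-weight update. A minimal sketch of that plaintext loop, with parameter names assumed:

```python
import math

def crh_truth_discovery(readings, iters=5, eps=1e-8):
    """CRH-style iterative truth discovery (plaintext sketch).

    readings[u][t]: user u's reading for task t.
    Returns (estimated truths per task, reliability weight per user).
    """
    n_users, n_tasks = len(readings), len(readings[0])
    weights = [1.0] * n_users
    truths = [0.0] * n_tasks
    for _ in range(iters):
        # Truth update: per-task weighted average (the core atomic operation).
        wsum = sum(weights)
        truths = [sum(weights[u] * readings[u][t] for u in range(n_users)) / wsum
                  for t in range(n_tasks)]
        # Weight update: users whose readings deviate more get lower weight.
        errs = [sum((readings[u][t] - truths[t]) ** 2 for t in range(n_tasks)) + eps
                for u in range(n_users)]
        total = sum(errs)
        weights = [math.log(total / e) for e in errs]
    return truths, weights
```

The paper's designs perform the weighted-sum step over protected data so that neither readings nor the resulting reliability weights are revealed.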
•This paper finds the truth Z of events in social sensing with information flows.
•A truth discovery model is presented that considers individuals’ reliability R.
•The model also considers the dependency D between individuals’ observations.
•An iterative expectation maximization algorithm is presented to jointly infer Z, R, and D.
•Extensive experiments demonstrate the effectiveness of the presented algorithm.
Social sensing relies on a large number of observations reported by different, possibly unreliable, agents to determine whether an event has occurred. In this paper, we consider the truth discovery problem in social sensing, in which an agent may receive another agent’s observation (known as an information flow) and may change its own observation to match the one it receives. If an agent’s observation is influenced by another agent, we say the former is a dependent agent. We propose an Iterative Expectation Maximization algorithm for Truth Discovery (IEMTD) in social sensing with dependent agents. Unlike other popular truth discovery approaches, which assume either that agents’ observations are independent or that their dependencies are known a priori, IEMTD jointly infers each agent’s reliability, the observations’ dependencies, and the events’ truths. Simulation results on synthetic data and three real-world datasets demonstrate that, in almost all our experiments, IEMTD achieves higher truth discovery accuracy than existing algorithms when dependencies exist between agents’ observations.
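To make the EM skeleton concrete, here is a minimal sketch of the independent-agent baseline that IEMTD generalizes: binary events, an E-step computing each event's posterior truth given current reliabilities, and an M-step re-estimating reliabilities. The dependency inference that distinguishes IEMTD is omitted, and all names and the clamping constants are assumptions.

```python
def em_truth_discovery(obs, iters=10):
    """EM for binary truth discovery with independent agents (a simplification;
    IEMTD additionally infers inter-agent dependencies).

    obs[a][e]: agent a's 0/1 report for event e.
    Returns (posterior probability each event is true, reliability per agent).
    """
    n_agents, n_events = len(obs), len(obs[0])
    rel = [0.8] * n_agents        # initial reliability guess
    prior = 0.5                   # prior probability that an event is true
    probs = [prior] * n_events
    for _ in range(iters):
        # E-step: posterior truth of each event given current reliabilities.
        for e in range(n_events):
            p_true, p_false = prior, 1.0 - prior
            for a in range(n_agents):
                if obs[a][e] == 1:
                    p_true *= rel[a]
                    p_false *= 1.0 - rel[a]
                else:
                    p_true *= 1.0 - rel[a]
                    p_false *= rel[a]
            probs[e] = p_true / (p_true + p_false)
        # M-step: reliability = expected fraction of correct reports,
        # clamped away from 0/1 to keep the E-step numerically stable.
        for a in range(n_agents):
            correct = sum(probs[e] if obs[a][e] == 1 else 1.0 - probs[e]
                          for e in range(n_events))
            rel[a] = min(max(correct / n_events, 0.01), 0.99)
    return probs, rel
```

Note that an agent whose reliability falls below 0.5 effectively contributes inverted evidence, so even a consistently wrong agent helps the estimate once its reliability is learned.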
Vehicle-based mobile crowdsensing has gained widespread attention due to its low cost and efficient data collection mode. One common method to improve the accuracy of sensing data in this context is truth discovery. However, privacy leakage and data misuse have reduced users' motivation to participate in sensing tasks. Meanwhile, existing solutions for privacy-preserving truth discovery generally suffer from low computational efficiency and frequent interactions between users and servers. Hence, this paper proposes a novel privacy-preserving truth discovery scheme based on secure multi-party computation. To achieve high efficiency and strong privacy protection, we use secret sharing to securely decompose data and construct a secure multi-party computation protocol to compute the ground truth. In addition, the weights generated by truth discovery are employed as a quantitative data-quality indicator that dynamically adjusts users' rewards, forming a data-quality-driven incentive mechanism. Finally, we demonstrate the high performance of our method through a detailed analysis, showing its effectiveness even in scenarios with numerous users.
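As background for the secret sharing building block, here is a minimal sketch of additive secret sharing used for a secure sum across two servers. The modulus, fixed-point scale, and two-server layout are illustrative assumptions; the paper's full protocol also covers the weight and truth computations.

```python
import random

MOD = 2**61 - 1      # large prime modulus (illustrative choice)
SCALE = 1000         # fixed-point scale for fractional sensor readings

def share(value, n_servers=2):
    """Split a reading into additive shares; any subset of n-1 shares
    is uniformly random and reveals nothing about the reading."""
    x = int(round(value * SCALE)) % MOD
    shares = [random.randrange(MOD) for _ in range(n_servers - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recombine shares into the original fixed-point value."""
    x = sum(shares) % MOD
    if x > MOD // 2:          # map large residues back to negative values
        x -= MOD
    return x / SCALE

# Secure sum: each user shares a reading; each server only ever sees and
# adds its own shares, yet the reconstructed total equals the plaintext sum.
readings = [3.5, 4.25, 5.0]
user_shares = [share(r) for r in readings]
server_totals = [sum(s[j] for s in user_shares) % MOD for j in range(2)]
total = reconstruct(server_totals)
```

Because addition commutes with the sharing, servers can aggregate locally and only the final sum is ever reconstructed.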
Mobile crowdsensing has emerged as a popular platform for solving many challenging problems by utilizing users' wisdom and resources. Due to user diversity, the data provided by different individuals may vary significantly, and thus it is important to analyze data quality during data aggregation. Truth discovery is effective in capturing data quality and obtaining accurate mobile crowdsensing results. Existing works on truth discovery either cannot protect both task privacy and data privacy, or introduce tremendous computational costs. In this paper, we propose an efficient and strong privacy-preserving truth discovery scheme, named EPTD, to protect users' task privacy and data privacy simultaneously during the truth discovery procedure. In EPTD, we first exploit a randomizable matrix to express users' tasks and sensory data. Then, based on the matrix computation properties, we design key derivation and (re-)encryption mechanisms that enable truth discovery to be performed in an efficient and privacy-preserving manner. Through a detailed security analysis, we demonstrate that data privacy and task privacy are well preserved. Extensive experiments based on real-world and simulated mobile crowdsensing applications show that EPTD has practical efficiency in terms of computational cost and communication overhead.
In urban environments, Mobile Wireless Sensor Networks (MWSNs) have become ubiquitous. Accurate GPS positioning of sensors is a fundamental problem for MWSNs. To solve this problem, this paper proposes a Crowdsourcing-Aided Positioning scheme that takes both an ideal situation and a more realistic situation into account. In the ideal situation, all participants are considered accurate. Two optimization objectives are then addressed for the efficient Crowdsourcing-Aided Positioning task; their utility functions are proven submodular, and a greedy algorithm is given to solve them. In the more realistic situation, randomly selected participants cannot guarantee the accuracy of the data. We propose a data-accuracy-calibration-based participant selection framework to solve this dilemma. Through data accuracy calibration, participants obtain their data accuracy and reliability with the help of wireless sensor networks. First, we design three kinds of data accuracy calibration methods based on probabilistic models. Then, we propose a Truthful-Data-Driven Participant Selection problem, which aims to raise data accuracy and reliability. This optimization problem is proven NP-hard, and its objective function is submodular. We give a greedy algorithm with a <inline-formula> <tex-math notation="LaTeX">1-\frac {1}{e} </tex-math></inline-formula> approximation ratio to solve this problem. Finally, simulation experiments validate the effectiveness of the algorithms.
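The greedy algorithm with the 1 - 1/e ratio is the classic one for maximizing a monotone submodular function under a cardinality constraint: repeatedly pick the element with the largest marginal gain. A minimal sketch using set coverage as a stand-in for the paper's utility function (participant names and regions are illustrative):

```python
def greedy_max_coverage(candidates, k):
    """Greedy participant selection for a monotone submodular objective
    (set coverage here), attaining the classic 1 - 1/e guarantee
    under a cardinality constraint of k participants.

    candidates: dict mapping participant -> set of covered regions.
    """
    chosen, covered = [], set()
    remaining = dict(candidates)
    for _ in range(min(k, len(candidates))):
        # pick the participant with the largest marginal coverage gain
        best = max(remaining, key=lambda p: len(remaining[p] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds new coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered
```

The 1 - 1/e bound holds because each greedy step closes at least a 1/k fraction of the gap to the optimal coverage.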
The problem of estimating event truths from conflicting agent opinions in a social network is investigated. An autoencoder learns the complex relationships between event truths, agent reliabilities, and agent observations. A Bayesian network model is proposed to guide the learning process by modeling the relationship between the autoencoder's outputs and different variables; it also models the social relationships between agents in the network. The proposed approach is unsupervised and applicable when ground-truth labels of events are unavailable. A variational inference method jointly estimates the hidden variables in the Bayesian network and the parameters in the autoencoder. Experiments on three real datasets demonstrate that our approach is competitive with, and in most cases better than, several state-of-the-art benchmark methods.
Two goals of network science are to (i) uncover fundamental properties of phenomena modeled as networks, and (ii) explore novel uses of networks as models for a diverse range of systems and phenomena in order to improve our understanding of them. This paper advances the latter direction by casting credibility estimation in social sensing applications as a network science problem, and by presenting a network model that helps understand the fundamental accuracy trade-offs of a credibility estimator. Social sensing refers to data collection scenarios where observations are collected from (possibly unvetted) human sources. We call such observations claims to emphasize that we do not know whether they are factually correct. Predictable, scalable, and robust estimation of both source reliability and claim correctness, given neither in advance, is a key challenge given the unvetted nature of sources and the lack of means to verify their claims. In a previous conference publication, we proposed a maximum likelihood approach to jointly estimate both source reliability and claim correctness, and derived confidence bounds to quantify the accuracy of this estimation. In this paper, we cast credibility estimation as a network science problem and offer a systematic sensitivity analysis of the optimal estimator to understand its fundamental accuracy trade-offs as a function of an underlying network topology that describes key problem-space parameters. This enables assured social sensing, where not only are source reliability and claim correctness estimated, but the accuracy of these estimates is also correctly predicted for the problem at hand.