The recent proliferation of human-carried mobile devices has given rise to mobile crowd sensing (MCS) systems that outsource sensory data collection to the public crowd. In order to identify truthful values from (crowd) workers' noisy or even conflicting sensory data, truth discovery algorithms, which jointly estimate workers' data quality and the underlying truths through quality-aware data aggregation, have drawn significant attention. However, the power of these algorithms cannot be fully unleashed in MCS systems unless workers' strategic reduction of their sensing effort is properly tackled. To address this issue, in this paper, we propose a payment mechanism, named Theseus, that deals with such strategic behavior by workers and incentivizes high-effort sensing from them. We ensure that, at the Bayesian Nash Equilibrium of the non-cooperative game induced by Theseus, all participating workers will spend their maximum possible effort on sensing, which improves their data quality. As a result, the aggregated results subsequently calculated by truth discovery algorithms from workers' data will be highly accurate. Additionally, Theseus has other desirable properties, including individual rationality and budget feasibility. We validate the desirable properties of Theseus through theoretical analysis, as well as extensive simulations.
This article proposes unsupervised truth-finding algorithms that combine consideration of multi-modal content features with analysis of propagation patterns to evaluate the veracity of observations in social sensing applications. A key social sensing challenge is to develop effective algorithms for estimating both the reliability of sources and the veracity of their observations without prior knowledge. In contrast to prior solutions that use labeled examples to learn content features that are correlated with veracity, our approach is entirely unsupervised. Hence, given no prior training data, we jointly learn the importance of different content features together with the veracity of observations using propagation patterns as an indicator of perceived content reliability. A novel penalized expectation maximization (PEM) algorithm is proposed to improve the quality of estimation results for observations bolstered by multiple features. In addition, we develop a constrained expectation maximum likelihood with multiple features (CEM-MultiF) that introduces a novel constraint to boost the probability of correctness of some claims. Finally, we evaluate the performance of the proposed algorithms, called EM-Multi, CEM-Multi and PEM-MultiF, respectively, on real-world data sets collected from Twitter. The evaluation results demonstrate that the proposed algorithms outperform the existing fact-finding approaches, and offer tunable knobs for controlling robustness/performance trade-offs in the presence of malicious sources.
Truth Discovery in Crowdsourced Detection of Spatial Events Ouyang, Robin Wentao; Srivastava, Mani; Toniolo, Alice ...
IEEE Transactions on Knowledge and Data Engineering, 2016-04-01, Volume 28, Issue 4
Journal Article
Peer reviewed
Open access
The ubiquity of smartphones has led to the emergence of mobile crowdsourcing tasks such as the detection of spatial events when smartphone users move around in their daily lives. However, the credibility of those detected events can be negatively impacted by unreliable participants with low-quality data. Consequently, a major challenge in mobile crowdsourcing is truth discovery, i.e., to discover true events from diverse and noisy participants' reports. This problem is distinct from its online counterpart in that it involves uncertainties in both participants' mobility and reliability. Decoupling these two types of uncertainties through location tracking will raise severe privacy and energy issues, whereas simply ignoring missing reports or treating them as negative reports will significantly degrade the accuracy of truth discovery. In this paper, we propose two new unsupervised models, i.e., Truth finder for Spatial Events (TSE) and Personalized Truth finder for Spatial Events (PTSE), to tackle this problem. In TSE, we model location popularity, location visit indicators, truths of events, and three-way participant reliability in a unified framework. In PTSE, we further model personal location visit tendencies. These proposed models are capable of effectively handling various types of uncertainties and automatically discovering truths without any supervision or location tracking. Experimental results on both real-world and synthetic datasets demonstrate that our proposed models outperform existing state-of-the-art truth discovery approaches in the mobile crowdsourcing environment.
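To make the missing-report problem concrete, the following minimal sketch contrasts the two naive baselines the abstract warns against. It is not TSE or PTSE; the function names and the encoding of reports (`1` for "event seen", `0` for "event not seen", `None` for "no report") are assumptions made purely for illustration.

```python
def vote_ignoring_missing(reports):
    """Majority vote over the reports that were actually submitted.

    Returns None when nobody reported at all.
    """
    cast = [r for r in reports if r is not None]
    if not cast:
        return None
    return sum(cast) * 2 > len(cast)


def vote_missing_as_negative(reports):
    """Majority vote after counting every missing report as a negative one."""
    filled = [0 if r is None else r for r in reports]
    return sum(filled) * 2 > len(filled)


if __name__ == "__main__":
    # Hypothetical reports from six participants about one location:
    # two saw the event, one did not, three never reported (perhaps
    # because they never visited the location).
    reports = [1, 1, None, None, None, 0]
    print(vote_ignoring_missing(reports))
    print(vote_missing_as_negative(reports))
```

The two baselines disagree on the same input because neither can tell whether a silent participant visited the location and saw nothing or simply never went there, which is the mobility uncertainty the abstract describes.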
In many applications, one can obtain descriptions about the same objects or events from a variety of sources. As a result, this will inevitably lead to data or information conflicts. One important problem is to identify the true information (i.e., the truths) among conflicting sources of data. It is intuitive to trust reliable sources more when deriving the truths, but it is usually unknown a priori which sources are more reliable. Moreover, each source possesses a variety of properties with different data types. An accurate estimation of source reliability has to be made by modeling multiple properties in a unified model. Existing conflict resolution work either does not conduct source reliability estimation, or models multiple properties separately. In this paper, we propose to resolve conflicts among multiple sources of heterogeneous data types. We model the problem using an optimization framework where truths and source reliability are defined as two sets of unknown variables. The objective is to minimize the overall weighted deviation between the truths and the multi-source observations where each source is weighted by its reliability. Different loss functions can be incorporated into this framework to recognize the characteristics of various data types, and efficient computation approaches are developed. The proposed framework is further adapted to deal with streaming data in an incremental fashion and large-scale data in the MapReduce model. Experiments on real-world weather, stock, and flight data as well as simulated multi-source data demonstrate the advantage of jointly modeling different data types in the proposed framework.
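The optimization view described above (truths and source weights as two sets of unknowns, updated to reduce the reliability-weighted deviation) can be sketched as a simple alternating procedure. This is an illustrative sketch only, assuming continuous data, a squared-error loss, and a log-ratio weight update; the abstract does not specify these choices, and the function name `discover_truths` and its parameters are hypothetical.

```python
import math


def discover_truths(observations, iterations=20, eps=1e-9):
    """Alternate between estimating truths and source weights.

    observations[s][o] is the numeric claim of source s about object o.
    Assumes at least two sources and that every source reports on every
    object. Returns (truths, weights).
    """
    num_sources = len(observations)
    num_objects = len(observations[0])
    weights = [1.0] * num_sources  # start by trusting all sources equally
    truths = [0.0] * num_objects

    for _ in range(iterations):
        # Truth update: the weighted mean minimizes the weighted
        # squared deviation for fixed weights.
        total_weight = sum(weights)
        truths = [
            sum(weights[s] * observations[s][o] for s in range(num_sources))
            / total_weight
            for o in range(num_objects)
        ]

        # Weight update (assumed form): a source whose claims deviate
        # more from the current truths receives a smaller weight.
        # eps keeps every loss strictly positive.
        losses = [
            sum((observations[s][o] - truths[o]) ** 2 for o in range(num_objects))
            + eps
            for s in range(num_sources)
        ]
        total_loss = sum(losses)
        weights = [math.log(total_loss / loss) for loss in losses]

    return truths, weights


if __name__ == "__main__":
    # Hypothetical readings: two sources roughly agree, one is far off.
    readings = [
        [10.0, 20.0, 30.0],
        [10.2, 19.8, 30.1],
        [15.0, 12.0, 40.0],
    ]
    truths, weights = discover_truths(readings)
    print(truths)
    print(weights)
```

With other data types, the squared-error term would be swapped for a loss suited to that type (for example a 0/1 loss with weighted voting for categorical claims), which is the kind of flexibility the abstract attributes to the framework.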
Truth discovery is a reliable and effective technique to resolve conflicts of heterogeneous data and estimate user reliability in mobile crowdsensing systems. Despite its effectiveness, the widespread adoption of truth discovery requires solid privacy preservation for users’ sensory data and reliability information. Existing works on private truth discovery are primarily based on conventional cryptographic primitives, which impose heavy workloads on the system. In this work, we first propose an efficient and privacy-preserving truth discovery framework (EPTD-I) by adopting a novel data perturbation mechanism. EPTD-I not only protects users’ privacy but also introduces little overhead on the user side. Moreover, for high mobility environments, we improve the design with a user non-interactive scheme named EPTD-II to shift all encrypted truth discovery operations to cloud platforms. In EPTD-II, each user’s sensitive information is also kept private during the complete truth discovery procedure. Thorough security analysis demonstrates that our proposed schemes are secure and offer a high level of privacy preservation. Extensive experiments conducted on practical and simulated crowdsensing applications demonstrate the effectiveness and efficiency of the proposed schemes.
With the proliferation of mobile devices, mobile crowd sensing (MCS) has emerged as a new data collection paradigm, which allows the crowd to act as sensors and contribute their observations about entities. Unfortunately, users with varied skills and motivations may provide conflicting information for the same entity. Existing work solves this problem by estimating user reliability and inferring the correct observations (i.e., truths). However, these methods assume that users' expertise degrees are dependent on the truths, but ignore the finer clusters that exist even in the entities with the same truths. To capture users' fine-grained reliability on different entity clusters, we propose a novel Bayesian co-clustering truth discovery model for the task of observation aggregation. This model enables us to produce a more precise estimation while taking into account the entity clusters and the user clusters. Experiments on four real-world datasets reveal that our method outperforms the state-of-the-art approaches in terms of accuracy and F1-score.
Crowdsourcing has become an effective tool to utilize human intelligence to perform tasks that are challenging for machines. Many truth discovery methods and incentive mechanisms for crowdsourcing have been proposed. However, most of them cannot deal with crowdsourcing in the presence of copiers, who copy a part (or all) of their data from other workers. This article aims at designing a crowdsourcing incentive mechanism for truth discovery of textual answers with copiers. We formulate the problem of maximizing the social welfare such that all tasks can be completed with the least confidence for truth discovery, and design a three-stage incentive mechanism. In the contextual embedding and clustering stage, we construct and cluster the content vector representations of textual crowdsourced answers at the semantic level. In the truth discovery stage, we estimate the truth for each task based on the dependence and accuracy of workers. In the reverse auction stage, we design a greedy algorithm to select the winners and determine the payment. Through both rigorous theoretical analysis and extensive simulations, we demonstrate that the proposed mechanisms achieve computational efficiency, individual rationality, truthfulness, and guaranteed approximation. Moreover, our truth discovery methods show a prominent advantage in terms of precision when there are copiers in the crowdsourcing systems.
Benefiting from the rapid development of communication technology and Internet of Things (IoT) devices, crowdsensing is on the rise. Sensor data from IoT devices can be requested for data analysis and utilization; however, the data collected about an object from multiple devices usually differ. Therefore, how to extract the most reliable data from numerous data has become an important topic, and truth discovery has received great attention. The collected data often contain sensitive personal information. If users' privacy cannot be protected, many users will be unwilling to contribute their data, and the usability of the published data will be greatly reduced. In this article, a robust privacy-preserving truth discovery scheme is proposed to simultaneously achieve the reliability and privacy of data. Specifically, the data are collected and encrypted before they are sent from the user. Compared with existing works, there are two additional benefits: a trusted third party and noncolluding platforms are no longer necessary, hence the robustness is improved and single-point failure bottlenecks are eliminated. Besides, the proposed RPPTD is secure against many known attacks in open wireless networks, as well as the human-factor-aware differential aggregation attack. Finally, the performance evaluation indicates that our scheme is efficient and suitable for practical environments.