Human intelligence tasks (HITs) are widely utilized for crowdsourcing human knowledge, such as labeling images for machine learning. Centralized crowdsourcing platforms face challenges of a single point of failure and a lack of service transparency. Existing blockchain-based crowdsourcing approaches overlook the low scalability problem of permissionless blockchains or inconveniently rely on existing ground-truth data as the root of trust to evaluate the quality of workers' answers. We propose a blockchain-based crowdsourcing scheme for ensuring dual fairness (i.e., preventing false-reporting and free-riding) and improving on-chain efficiency concerning on-chain storage and smart contract computation. The proposed scheme does not rely on trusted authorities but rather depends on a public blockchain to guarantee the dual fairness. An efficient and publicly verifiable truth discovery scheme is designed based on majority voting and cryptographic accumulators. This truth discovery scheme aims at inferring ground truth from workers' answers. The ground truth is further utilized to estimate the quality of workers' answers. Additionally, a novel blockchain-based protocol is designed to further reduce on-chain costs while ensuring truthfulness. The scheme has O(n) complexity for both on-chain storage and smart contract computation, regardless of the number of questions, where n denotes the number of workers. Formal security analysis is provided, and extensive experiments are conducted to evaluate effectiveness and performance.
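The majority-voting step mentioned in this abstract can be illustrated with a minimal sketch. It assumes categorical answers and a hypothetical `answers` mapping from worker to one label per question; the cryptographic accumulators and all on-chain logic of the proposed scheme are left out, so this is not the authors' protocol, only the voting idea.

```python
from collections import Counter

def majority_vote(answers):
    """Infer one label per question as the most common answer.

    `answers` maps a worker id to a list of labels, one per question.
    Ties are broken in favour of the label that was seen first.
    """
    num_questions = len(next(iter(answers.values())))
    truth = []
    for q in range(num_questions):
        votes = Counter(worker_answers[q] for worker_answers in answers.values())
        truth.append(votes.most_common(1)[0][0])
    return truth

def worker_quality(answers, truth):
    """Fraction of each worker's answers that match the inferred truth."""
    return {
        worker: sum(a == t for a, t in zip(worker_answers, truth)) / len(truth)
        for worker, worker_answers in answers.items()
    }

# Hypothetical data: three workers labeling three images.
answers = {
    "w1": ["cat", "dog", "cat"],
    "w2": ["cat", "dog", "dog"],
    "w3": ["dog", "dog", "cat"],
}
truth = majority_vote(answers)
quality = worker_quality(answers, truth)
```

The inferred truth then plays the role the abstract assigns to it: a reference against which each worker's answers are scored, without any pre-existing ground-truth data.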
With the proliferation of social sensing, large amounts of observations are contributed by people or devices. However, these observations contain disinformation. Disinformation can propagate across online social networks at a relatively low cost, but results in a series of major problems in our society. In this survey, we provide a comprehensive overview of disinformation and truth discovery in social sensing under a unified perspective, including basic concepts and the taxonomy of existing methodologies. Furthermore, we summarize the mechanism of disinformation from four different perspectives (i.e., text only, text with image/multi-modal, text with propagation, and fusion models). In addition, we review existing solutions based on these requirements, compare their pros and cons, and give a usage guide based on detailed lessons learned. To facilitate future studies in this field, we summarize related publicly accessible real-world data sets and open source codes. Last but most important, we emphasize potential future research topics and challenges in this domain through a deep analysis of the most recent methods.
• A recognition and inference scheme for workers' reputations to distinguish the attributes of workers.
• A novel scheme to calibrate workers' systematic bias, expanding the workers' selection group and improving the accuracy.
• A learning-based approach to derive workers' preferences, considering the heterogeneity of tasks in MCS.
• A new MAB-based framework for worker recruitment.
Accurate data collection from workers is crucial for the success of Mobile Crowd Sensing (MCS) applications. However, current studies exhibit several drawbacks. Firstly, the workers' sensing qualities remain unknown even after the platform acquires the data submitted by the workers, known as the Post Unknown Worker Selection (PUWS) problem. Secondly, systematic deviations between worker data and the Ground Truth Data (GTD) reduce the quality of MCS applications. Thirdly, the data collected by workers for different tasks may vary in accuracy, resulting in low-quality data collection. To address these challenges, we propose a novel multi-armed bandit-based worker selection scheme with reputation and preference (MAB-RP). The proposed scheme aims to select credible workers for high-quality data collection through trust identification, thus addressing the PUWS issue after worker recruitment. Additionally, the scheme employs a learning-based approach to identify and correct the gaps between the sensed data and the GTD, ultimately improving the accuracy of data collection. Lastly, a matching-based approach is used to identify workers' sensing qualities for different tasks, further enhancing the accuracy of data collection in MCS. Extensive simulations on real-world datasets demonstrate that the proposed MAB-RP scheme outperforms previous strategies in terms of both data quality and cost.
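For readers unfamiliar with bandit-based worker selection, the following is a minimal sketch of plain UCB1, a standard multi-armed bandit rule, applied to choosing one worker per round. It is not the MAB-RP scheme: reputation, bias calibration, and task preferences are omitted, and the `true_quality` values and Bernoulli reward model are assumptions made purely for illustration.

```python
import math
import random

def ucb1_select(counts, means, t):
    """Return the worker with the highest UCB1 index; untried workers go first."""
    for worker, n in enumerate(counts):
        if n == 0:
            return worker
    return max(
        range(len(counts)),
        key=lambda w: means[w] + math.sqrt(2.0 * math.log(t) / counts[w]),
    )

def run(true_quality, rounds, rng):
    """Simulate `rounds` selections with Bernoulli rewards (assumed model)."""
    counts = [0] * len(true_quality)
    means = [0.0] * len(true_quality)
    for t in range(1, rounds + 1):
        worker = ucb1_select(counts, means, t)
        # Reward 1.0 means the submitted data was judged good in this round.
        reward = 1.0 if rng.random() < true_quality[worker] else 0.0
        counts[worker] += 1
        # Incremental update of the empirical mean quality.
        means[worker] += (reward - means[worker]) / counts[worker]
    return counts, means

# Hypothetical hidden qualities of three workers.
counts, means = run([0.3, 0.5, 0.9], rounds=2000, rng=random.Random(42))
```

The exploration bonus shrinks as a worker is selected more often, so the platform gradually concentrates on workers whose observed quality is high while still occasionally sampling the others.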
Truth discovery is an effective way to eliminate data inconsistency by integrating different worker-provided values. Although directly conducting non-private truth discovery approaches based on uploaded noisy values after adding Laplace noise for continuous inputs guarantees rigorous local differential privacy (LDP), it may result in poor performance due to the large amount of noise they contain. First, the injected noise for privacy protection randomly sampled from Laplace distribution may be excessive even with a large privacy budget, as the above distribution is unbounded and drops sharply with respect to the x-axis. Built-in Gaussian noise also usually exists within these uploaded noisy values, which may also have a negative effect on the aggregated truths under LDP and makes the problem investigated in this paper far more challenging. In this paper, we focus on obtaining accurate truths in the above cases under rigorous LDP for continuous inputs, and present a novel solution TESLA. The key idea of this solution is that we let injected noise for privacy protection and inherent Gaussian noise only weakly negatively affect the weight estimation and truth aggregation. In particular, we design a runtime filtering mechanism (RFM) to obtain the supremum and infimum for the values after adding Laplace noise by considering these two types of noise together. Moreover, we develop a probabilistic fusion mechanism (PFM) to get the fused values by adaptively using the obtained supremum and infimum. Furthermore, we devise a probabilistic weight mechanism (PWM) to obtain a more accurate weight for each worker. Therefore, truth discovery can be conducted based on the new weight of each worker and the filtered values. We provide theoretical analyses of TESLA’s utility, privacy and complexity. Experimental results demonstrate the effectiveness and efficiency of TESLA.
We also extend and verify TESLA over typical mean estimation as well as standard deviation calculation, and various machine learning tasks (e.g., logistic regression, support vector machine (SVM) and neural network). Experimental results also demonstrate its superiority.
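The baseline this abstract starts from, adding Laplace noise to each worker's continuous value before upload, can be sketched as follows. This is only the standard Laplace mechanism under the assumption that inputs are clipped to a known range `[lower, upper]`; the function names are hypothetical, and TESLA's RFM, PFM, and PWM components are not implemented here.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb(value, lower, upper, epsilon, rng):
    """Clip `value` to [lower, upper], then add Laplace noise for epsilon-LDP.

    The sensitivity of a single clipped value is (upper - lower),
    so the noise scale is (upper - lower) / epsilon.
    """
    clipped = min(max(value, lower), upper)
    scale = (upper - lower) / epsilon
    return clipped + laplace_noise(scale, rng)

# Assumed setup: 20000 workers all holding the value 0.5 in [0, 1], epsilon = 1.
rng = random.Random(0)
true_values = [0.5] * 20000
noisy_values = [perturb(v, 0.0, 1.0, 1.0, rng) for v in true_values]

true_mean = sum(true_values) / len(true_values)
noisy_mean = sum(noisy_values) / len(noisy_values)
# Largest single perturbation; the Laplace distribution is unbounded.
max_abs_noise = max(abs(n - t) for n, t in zip(noisy_values, true_values))
```

Averaging over many workers recovers the mean reasonably well, but individual uploads can land far outside the valid input range, which is the kind of excessive noise the abstract says degrades weight estimation in truth discovery.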
Benefiting from the fast development of human-carried mobile devices, crowd sensing has become an emerging paradigm to sense and collect data. However, the reliability of sensory data provided by participating users is still a major concern. To address this reliability challenge, truth discovery is an effective technology to improve data accuracy, and has garnered significant attention. Nevertheless, many state-of-the-art works in truth discovery either fail to protect participants’ privacy or incur tremendous overhead on the user side. In this paper, we first propose a privacy-preserving truth discovery scheme, named PPTDS-I, which is implemented on two non-colluding cloud platforms. By capitalizing on properties of modular arithmetic, this scheme is able to protect both users’ sensory data and reliability information, and simultaneously achieve high efficiency and fault-tolerance. Additionally, for scenarios with resource-constrained devices, an efficient truth discovery scheme, named PPTDS-II, is presented. It not only protects users’ sensory data but also avoids user participation in the iterative truth discovery procedure. Detailed security analysis shows that the proposed schemes are secure under a comprehensive threat model. Furthermore, extensive experimental analysis has been conducted, which demonstrates the efficiency of the proposed schemes.
Geographic information system-based mineral prospectivity mapping (MPM) aims to generate targets by combining multiple proxy layers containing geology, geochemistry, and geophysics information based on an available understanding of geological processes and translating it into critical targeting criteria. However, factors such as an imperfect geological understanding with numerous heuristic natures and intrinsic biases, the inaccuracy and sparsity of datasets, and multiple selections of predictive methods adversely affect the results of MPM and jeopardize the reliability of decision-making in exploration. Thus, a series of knowledge-driven and data-driven MPM approaches to counterbalance these disadvantages have been proposed. Uncertainty is defined as a metric of the various scales of the likelihood and consequences, which is helpful to quantitatively represent the above risks. In this paper, the uncertainty in the final three-dimensional prospectivity map is analyzed and interpreted in terms of quantification, visualization, and comparison of different predictive approaches, and a novel technology is proposed based on a (1 + ε) approximate global optimum strategy derived from the truth discovery society. It outputs the globally optimal truth (p*) as the overall mathematical expectation and a set of weights w_i as the representation of reliability. Here, a previous study of the Haoyaoerhudong gold deposit, which is one of the largest black-rock-series-type gold mines in China, was reevaluated. The method demonstrated the following advantages: (i) sorted the reliability of potential models built by multiple predictive variables and different mathematical methods, (ii) provided a “best-guess-decision” prospectivity result by combining the best reliability model from w_i and the signal-to-noise ratio (SNR) of the risk–return model, and (iii) provided a statistically final uncertainty model by combining p* and risk information.
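To make the roles of a truth estimate and per-source weights concrete, here is a minimal sketch of a generic iterative truth discovery loop for continuous values. It is a common weighted-averaging heuristic with log-ratio weights, not the (1 + ε) approximate global optimum strategy of this paper; the update rule, the small floor constants, and the sample data are all assumptions made for illustration.

```python
import math

def truth_discovery(observations, iterations=20):
    """Alternate between estimating truths and re-weighting sources.

    `observations[i][j]` is the value source i reports for item j.
    Returns (truths, weights); a larger weight means a more reliable source.
    """
    n_items = len(observations[0])
    weights = [1.0] * len(observations)
    truths = [0.0] * n_items
    for _ in range(iterations):
        # Truth step: weighted mean of the sources for each item.
        total_w = sum(weights)
        truths = [
            sum(w * obs[j] for w, obs in zip(weights, observations)) / total_w
            for j in range(n_items)
        ]
        # Weight step: sources far from the current truths get small weights.
        errors = [
            sum((obs[j] - truths[j]) ** 2 for j in range(n_items)) + 1e-12
            for obs in observations
        ]
        total_err = sum(errors)
        weights = [max(-math.log(e / total_err), 1e-9) for e in errors]
    return truths, weights

# Hypothetical predictions from three models for three locations;
# the third model deviates strongly from the other two.
observations = [
    [10.0, 20.0, 30.0],
    [10.2, 19.8, 30.1],
    [14.0, 25.0, 24.0],
]
truths, weights = truth_discovery(observations)
```

In this toy run the deviating third source ends up with the smallest weight, mirroring how the abstract uses the weights to rank the reliability of candidate models.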
Many Multi-Armed Bandit (MAB) based worker selection schemes have been proposed to select high-quality workers and enhance the quality of tasks. However, in Mobile Crowd Sensing (MCS), a complex mutual effect exists among task requestors, the MCS platforms, and workers. Considering only the interaction between two sides does not make MCS a balanced ecosystem. Therefore, it is urgent to establish a tripartite mutual incentive mechanism to make the MCS system a balanced ecosystem. In this paper, a truth-based Three-tier Combinatorial Multi-Armed Bandit (TCMAB) incentive mechanism is proposed, in which the three parties select each other to maximize their revenues in MCS. In TCMAB, there exists a three-tier and two-way MAB-based incentive scheme. For the mutual interaction between the platform and the worker, a truth-based CMAB scheme is established for the platform to select high-quality workers, and a CMAB scheme is also proposed for workers to select a “good” platform to report data to in order to maximize their revenue. Besides, for the mutual interaction between the task requestor and the platform, the platforms adopt the CMAB-based scheme to select task requestors that offer high payment. A platform selection scheme based on CMAB is also established for task requestors to select the platforms that have lower fees and higher quality. Moreover, we do not adopt the assumption that platforms learn the data quality as soon as they receive data, but propose a data quality acquisition scheme based on truth data discovery and cooperation frequency, which serves as the basis for guiding the three-tier interaction and thus establishes a truth-based MCS interaction ecosystem. Simulation results show that the proposed TCMAB provides an effective solution for the problem of information elicitation without verification (IEWV) in MCS, and can significantly improve the utilities, data quality, and application quality of MCS, which previous studies did not achieve.
With the development of communications, networking, and information technology, Crowdsensed Data Trading (CDT) has become a novel data trading paradigm. In CDT, the data requesters publish crowdsensing tasks with specific data requirements, and then workers complete these tasks, upload the data and obtain corresponding rewards. To efficiently deal with data trading, most of the existing CDT systems assume a trusted centralized platform. However, we argue that the platform may collude with workers or requesters to trick others in order to gain more benefits. For example, according to the workers' uploaded data, the platform can modify the reward functions by colluding with the requester. Similarly, the platform might collude with workers to let them know the reward function, so that workers could forge data. Meanwhile, requesters and workers may also be malicious. For example, requesters may post tasks but fail to pay, and workers can upload wrong data to mislead the system. To solve the above problems, we combine the Crowdsensed Data Trading system with intelligent Blockchain (CDT-B), which contains a smart contract called CDToken. As a credible third party, the CDToken is used to record the requesters' reward function and workers' data uploading function to avoid targeted tricks. At the same time, we not only design a Data Uploading and Preprocessing (DUP) mechanism in CDToken to collect and process the workers' sensed data, but also propose a Grouping Truth Discovery (GTD) to evaluate their data quality for determining the payments. Moreover, to hold a large number of requesters and workers in CDT-B, we propose a Layered Sharding blockchain based on Membership Degree (LSMD) to solve the blockchain inefficiency problem. Finally, we deploy CDToken to an experimental environment based on Ethereum and demonstrate its efficient performance and practicability.
Crowdsensing systems collect various types of data from sensors embedded on mobile devices owned by individuals. These individuals are commonly referred to as workers that complete tasks published by crowdsensing systems. Because of the relative lack of control over worker identities, crowdsensing systems are susceptible to data poisoning attacks, which interfere with data analysis results by injecting fake data that conflicts with the ground truth. Frameworks like TruthFinder can resolve data conflicts by evaluating the trustworthiness of the data providers. These frameworks make crowdsensing systems somewhat more robust since they can limit the impact of dirty data by reducing the value of unreliable workers. However, previous work has shown that TruthFinder may also be affected by the data poisoning attack when the malicious workers have access to global information. In this article, we focus on partially observable data poisoning attacks in crowdsensing systems. We show that even if the malicious workers only have access to local information, they can find effective data poisoning attack strategies to interfere with crowdsensing systems that use TruthFinder. First, we formally model the problem of partially observable data poisoning attack against crowdsensing systems. Then, we propose a data poisoning attack method based on deep reinforcement learning, which helps malicious workers compromise TruthFinder while hiding themselves. Based on the method, the malicious workers can learn from their attack attempts and evolve the poisoning strategies continuously. Finally, we conduct experiments on real-life data sets to verify the effectiveness of the proposed method.
Constrained Truth Discovery
Ye, Chen; Wang, Hongzhi; Zheng, Kangjie ...
IEEE Transactions on Knowledge and Data Engineering, 2022-01-01, Volume 34, Issue 1
Journal Article, Peer reviewed
To aggregate useful information among diversified sources, a hot research topic called truth discovery has emerged in recent years. Existing truth discovery methods attempt to infer the true attribute values for the entities by identifying and trusting reliable data sources. That is, the values provided by reliable sources are more likely to be the true values. However, all these methods neglect the relations among different entities, which play important roles in the truth discovery task. When reliable data sources cannot provide sufficient information about entities, the true attribute values of these entities can still be inferred by propagating trustworthy information from related entities. Motivated by this, in this paper, we introduce the constrained truth discovery problem. We incorporate denial constraints, a universally quantified first-order logic formalism which can express a large number of effective and widely existing relations among entities, into the process of truth discovery. We formulate it as a constrained optimization problem and analyze its hardness. To address the problem, we propose algorithms to partition the entities into disjoint groups, and generate arithmetic constraints for each disjoint group separately. Then, the true attribute values of the entities in each disjoint group are derived by minimizing the objective function under the corresponding arithmetic constraints. Experimental results on both real-world and synthetic datasets demonstrate that the proposed approach achieves good performance even with very few constraints and reliable sources.