Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy
Crowdsourcing is a new problem-solving paradigm for tasks that are difficult for computers but easy for humans. Since the answers collected from the recruited participants (workers) may contain sensitive information, crowdsourcing raises serious privacy concerns. In this paper, we investigate the problem of protecting user privacy under local differential privacy (LDP), where individual workers randomize their answers independently and send the perturbed answers to the task requester. The utility goal is to ensure high accuracy of the true answers (i.e., the truth) inferred from the perturbed data. One of the challenges of LDP perturbation is the sparsity of worker answers (i.e., each worker answers only a small number of tasks). Simple extensions of existing approaches (e.g., Laplace perturbation and randomized response) may incur large errors in truth inference on sparse data. Thus, we design a new matrix factorization (MF) algorithm under LDP that addresses the trade-off between privacy and utility (i.e., accuracy of truth inference). We prove that our MF algorithm provides both an LDP guarantee and a small truth-inference error, regardless of the sparsity of worker answers. We perform extensive experiments on real-world and synthetic datasets and demonstrate that the MF algorithm performs better than existing LDP algorithms on sparse crowdsourcing data.
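As a rough illustration of the local perturbation step described above, the following Python sketch shows generalized (k-ary) randomized response, one of the existing approaches the abstract names as a baseline. It is not the paper's MF algorithm, and it does nothing about sparsity: it only perturbs a task the worker actually answered. The function names, the encoding of answers as integers 0..k-1, and the parameter names are assumptions made for this sketch.

```python
import math
import random


def keep_probability(epsilon, num_choices):
    """Probability of reporting the true answer under k-ary randomized response."""
    e = math.exp(epsilon)
    return e / (e + num_choices - 1)


def randomized_response(answer, num_choices, epsilon, rng=random):
    """Perturb one categorical answer locally, on the worker's side.

    Assumption for this sketch: answers are integers in [0, num_choices).
    The true answer is kept with probability e^eps / (e^eps + k - 1);
    otherwise one of the other k - 1 choices is reported uniformly at random.
    """
    if not 0 <= answer < num_choices:
        raise ValueError("answer must be in [0, num_choices)")
    if rng.random() < keep_probability(epsilon, num_choices):
        return answer
    # Draw uniformly from the k - 1 choices that differ from the true answer.
    other = rng.randrange(num_choices - 1)
    return other if other < answer else other + 1
```

Each worker would call `randomized_response` independently before sending anything to the task requester, so the requester only ever sees perturbed values. A smaller `epsilon` gives stronger privacy but makes the reported answers noisier, which is the privacy-utility trade-off the abstract refers to.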
MSU Digital Commons Citation
Sun, Haipei; Dong, Boxiang; Wang, Hui Wendy; Yu, Ting; and Qin, Zhan, "Truth Inference on Sparse Crowdsourcing Data with Local Differential Privacy" (2019). Department of Computer Science Faculty Scholarship and Creative Works. 601.