Human-Powered Data Cleaning for Probabilistic Reachability Queries on Uncertain Graphs

Cited by: 10
Authors
Lin, Xin [1, 2]
Peng, Yun [2]
Choi, Byron [2]
Xu, Jianliang [2]
Affiliations
[1] East China Normal Univ, Shanghai Key Lab Multidimens Informat Proc, Shanghai 200241, Peoples R China
[2] Hong Kong Baptist Univ, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China
Keywords
Crowdsourcing;
DOI
10.1109/TKDE.2017.2684166
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Uncertain graph models are widely used in real-world applications such as knowledge graphs and social networks. To capture the uncertainty, each edge in an uncertain graph is associated with an existential probability that signifies the likelihood that the edge exists. One notable issue in querying uncertain graphs is that the results are sometimes uninformative because of the edge uncertainty. In this paper, we consider probabilistic reachability queries, which are one of the fundamental classes of graph queries. To make the results more informative, we adopt a crowdsourcing-based approach to clean the uncertain edges. However, given the time and monetary cost of crowdsourcing, the challenge is to efficiently select a limited set of edges for cleaning that maximizes the quality improvement. We prove that the edge selection problem is #P-hard. In light of the hardness of the problem, we propose a series of edge selection algorithms, followed by a number of optimization techniques and pruning heuristics for reducing the computation time. Our experimental results demonstrate that our proposed techniques outperform random selection by up to 27 times in terms of the result quality improvement and the brute-force solution by up to 60 times in terms of the elapsed time.
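To make the notion of a probabilistic reachability query concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes that edges exist independently (an assumption not stated in the abstract) and computes the exact reachability probability by enumerating every possible world, which is only feasible for tiny graphs. The toy graph, the node names, and the functions `reachable` and `reach_probability` are hypothetical and introduced only for illustration.

```python
from itertools import product


def reachable(edges, source, target):
    """Depth-first search over a list of directed (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def reach_probability(prob_edges, source, target):
    """Sum the weights of all possible worlds in which target is reachable.

    prob_edges maps (u, v) to an existential probability. Edges are
    assumed independent (an assumption of this sketch). The loop visits
    2^|E| worlds, so this is for illustration only.
    """
    edges = list(prob_edges)
    total = 0.0
    for world in product([False, True], repeat=len(edges)):
        weight = 1.0
        present = []
        for edge, exists in zip(edges, world):
            p = prob_edges[edge]
            weight *= p if exists else 1.0 - p
            if exists:
                present.append(edge)
        if weight > 0.0 and reachable(present, source, target):
            total += weight
    return total


# Hypothetical toy graph: a two-hop path and a direct edge from "s" to "t".
toy = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
before = reach_probability(toy, "s", "t")

# Simulate cleaning the direct edge: the crowd either confirms or refutes it.
confirmed = reach_probability({**toy, ("s", "t"): 1.0}, "s", "t")
refuted = reach_probability({**toy, ("s", "t"): 0.0}, "s", "t")
```

On this toy graph the query answer is 0.625 before cleaning; once the direct edge is cleaned, it becomes either 1.0 (edge confirmed) or 0.25 (edge refuted). Which edges are worth cleaning under a budget, and how result quality is measured, are the questions the paper itself addresses; this sketch does not attempt to reproduce those definitions.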
Pages: 1452 - 1465
Page count: 14
Related papers
28 in total
  • [1] Human-Powered Data Cleaning for Probabilistic Reachability Queries on Uncertain Graphs
    Lin, Xin
    Peng, Yun
    Xu, Jianliang
    Choi, Byron
    [J]. 2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1755 - 1756
  • [2] Label and Distance-Constraint Reachability Queries in Uncertain Graphs
    Chen, Minghan
    Gu, Yu
    Bao, Yubin
    Yu, Ge
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, PT I, 2014, 8421 : 188 - 202
  • [3] Probabilistic MaxRS Queries on Uncertain Data
    Nakayama, Yuki
    Amagata, Daichi
    Hara, Takahiro
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2017, PT I, 2017, 10438 : 111 - 119
  • [4] Answering threshold-based reachability queries over probabilistic graphs
    Yuan Y.
    Wang G.-R.
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2010, 33 (12) : 2219 - 2228
  • [5] Uncertain probabilistic range queries on multidimensional data
    Bernad, Jorge
    Bobed, Carlos
    Mena, Eduardo
    [J]. INFORMATION SCIENCES, 2020, 537 : 334 - 367
  • [6] Uncertain Data Queries Processing in a Probabilistic Framework
    He, Ming
    Du, Yong-ping
    [J]. JOURNAL OF COMPUTERS, 2010, 5 (11) : 1663 - 1669
  • [7] Probabilistic spatial queries on existentially uncertain data
    Dai, XY
    Yiu, ML
    Mamoulis, N
    Tao, YF
    Vaitis, M
    [J]. ADVANCES IN SPATIAL AND TEMPORAL DATABASES, PROCEEDINGS, 2005, 3633 : 400 - 417
  • [8] Cleaning Uncertain Data for Top-k Queries
    Mo, Luyi
    Cheng, Reynold
    Li, Xiang
    Cheung, David W.
    Yang, Xuan S.
    [J]. 2013 IEEE 29TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2013, : 134 - 145
  • [9] Probabilistic Inverse Ranking Queries over Uncertain Data
    Lian, Xiang
    Chen, Lei
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2009, 5463 : 35 - 50
  • [10] Probabilistic Convex Hull Queries over Uncertain Data
    Yan, Da
    Zhao, Zhou
    Ng, Wilfred
    Liu, Steven
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (03) : 852 - 865