Modeling Topic-Based Human Expertise for Crowd Entity Resolution

Cited by: 4
Authors
Gong, Sai-Sai [1 ]
Hu, Wei [1 ]
Ge, Wei-Yi [2 ]
Qu, Yu-Zhong [1 ]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Sci & Technol Informat Syst Engn Lab, Nanjing 210007, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
entity resolution; crowdsourcing; human expertise; topic modeling; task similarity;
DOI
10.1007/s11390-018-1882-8
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Entity resolution (ER) aims to identify whether two entities in an ER task refer to the same real-world thing. Crowd ER uses humans, in addition to machine algorithms, to obtain the truths of ER tasks. However, inaccurate or erroneous results are likely to be generated when humans give unreliable judgments. Previous studies have found that correctly estimating human accuracy or expertise in crowd ER is crucial to truth inference. However, many of them assume that humans have consistent expertise over all the tasks, and ignore the fact that humans may have varied expertise on different topics (e.g., music versus sport). In this paper, we deal with crowd ER in the Semantic Web area. We identify multiple topics of ER tasks and model human expertise on different topics. Furthermore, we leverage similar task clustering to enhance the topic modeling and expertise estimation. We propose a probabilistic graphical model that computes ER task similarity, estimates human expertise, and infers the task truths in a unified framework. Our evaluation results on real-world and synthetic datasets show that, compared with several state-of-the-art approaches, our proposed model achieves higher accuracy on task truth inference and is more consistent with real human expertise.
Pages: 1204-1218
Page count: 15