Inconsistency-driven approach for human-in-the-loop entity matching

Times cited: 0
Authors
Ito, Hiroyoshi [1]
Koizumi, Takahiro [2]
Yoshimoto, Ryuji [3]
Fukushima, Yukihiro [4]
Harada, Takashi [5]
Morishima, Atsuyuki [1]
Affiliations
[1] Univ Tsukuba, Inst Lib Informat & Media Sci, Tsukuba, Japan
[2] Univ Tsukuba, Grad Sch Comprehens Human Sci, Tsukuba, Japan
[3] CARLIL Inc, Tokyo, Japan
[4] Keio Univ, Fac Pharm, Keio, Japan
[5] Doshisha Univ, Ctr License & Qualificat, Kyoto, Japan
DOI: 10.47989/ir30iConf47140
Chinese Library Classification (CLC): G25 [Library science, library work]; G35 [Information science, information work]
Subject classification codes: 1205; 120501
Abstract
Introduction. Entity matching is a fundamental operation in a wide range of information management applications, and a large number of methods have been proposed to address it. Human-in-the-loop entity matching is a human-AI collaborative approach that is effective when the data to be matched are incomplete or require domain knowledge. A typical human-in-the-loop approach lets a machine-learning-based matcher ask humans to match entities when it cannot match them with high confidence. However, ML-based matchers cannot avoid the unknown-unknown problem: they can resolve entities incorrectly with high confidence.

Method. This paper proposes an inconsistency-based method to address this problem. The method asks humans to resolve entities when it finds an inconsistency with respect to the transitivity property underlying entity matching. For example, if a matcher returns a positive result for only two of the three pairs formed by three entities, the result is inconsistent, because if A matches B and B matches C, then A must also match C.

Analysis. This paper presents an implementation of the idea based on a similarity-based blocking method and Bayesian inference, and reports the results of an extensive set of experiments that reveal how and when the method is effective.

Results. The results showed that inconsistency-based sampling selects very different entity pairs than other sampling strategies do, and that a simple hybrid strategy performs well in many practical situations.

Conclusion. The results indicate that our approach complements any existing matcher that is susceptible to the unknown-unknown problem in entity matching.
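The transitivity check described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (which, according to the abstract, builds on similarity-based blocking and Bayesian inference); it assumes a matcher has already produced a binary label and a confidence score for each candidate pair. The function names (`inconsistent_triples`, `inconsistency_queue`, `uncertainty_queue`, `hybrid_queue`), the ordering rules, the round-robin hybrid, and the toy data are all hypothetical illustrations; the abstract does not specify what the paper's hybrid strategy is.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# An unordered pair of record IDs, so (a, b) and (b, a) map to the same key.
Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    return frozenset((a, b))


def inconsistent_triples(
    entities: Iterable[str], matches: Dict[Pair, bool]
) -> List[Tuple[str, str, str]]:
    """Triples whose predicted labels violate transitivity.

    If a~b and b~c are matches, a~c must also be a match, so a triple with
    exactly two positive pairs out of three cannot be labelled consistently.
    """
    found = []
    for a, b, c in combinations(sorted(set(entities)), 3):
        labels = [matches.get(pair(a, b)), matches.get(pair(b, c)), matches.get(pair(a, c))]
        if any(label is None for label in labels):
            continue  # some pair was never scored (e.g. pruned by blocking)
        if sum(1 for label in labels if label) == 2:
            found.append((a, b, c))
    return found


def inconsistency_queue(triples: List[Tuple[str, str, str]]) -> List[Pair]:
    """Every pair in an inconsistent triple, pairs involved in the most
    inconsistent triples first (hypothetical ordering rule)."""
    counts: Dict[Pair, int] = {}
    for a, b, c in triples:
        for p in (pair(a, b), pair(b, c), pair(a, c)):
            counts[p] = counts.get(p, 0) + 1
    return sorted(counts, key=lambda p: (-counts[p], sorted(p)))


def uncertainty_queue(confidence: Dict[Pair, float]) -> List[Pair]:
    """Conventional baseline: least confident pairs first."""
    return sorted(confidence, key=lambda p: (confidence[p], sorted(p)))


def hybrid_queue(first: List[Pair], second: List[Pair], budget: int) -> List[Pair]:
    """Hypothetical hybrid: take pairs alternately from two queues,
    skipping duplicates, until the labelling budget is spent."""
    chosen: List[Pair] = []
    seen: Set[Pair] = set()
    for i in range(max(len(first), len(second))):
        for queue in (first, second):
            if len(chosen) < budget and i < len(queue) and queue[i] not in seen:
                seen.add(queue[i])
                chosen.append(queue[i])
    return chosen


def as_tuples(pairs: List[Pair]) -> List[Tuple[str, ...]]:
    """Sorted tuples for stable printing (frozenset order is not fixed)."""
    return [tuple(sorted(p)) for p in pairs]


if __name__ == "__main__":
    entities = ["r1", "r2", "r3", "r4"]
    # Hypothetical matcher output: a label and the confidence of that label per pair.
    matches = {
        pair("r1", "r2"): True, pair("r2", "r3"): True, pair("r1", "r3"): False,
        pair("r1", "r4"): False, pair("r2", "r4"): False, pair("r3", "r4"): False,
    }
    confidence = {
        pair("r1", "r2"): 0.97, pair("r2", "r3"): 0.95, pair("r1", "r3"): 0.96,
        pair("r1", "r4"): 0.90, pair("r2", "r4"): 0.55, pair("r3", "r4"): 0.92,
    }
    triples = inconsistent_triples(entities, matches)
    print(triples)  # [('r1', 'r2', 'r3')]
    print(as_tuples(uncertainty_queue(confidence)[:2]))  # [('r2', 'r4'), ('r1', 'r4')]
    print(as_tuples(hybrid_queue(inconsistency_queue(triples), uncertainty_queue(confidence), 2)))
    # [('r1', 'r2'), ('r2', 'r4')]
```

In the toy data, all three labels on r1, r2, r3 are confident (0.95 to 0.97), so the two least-confident pairs picked by uncertainty sampling never include them, yet together they contradict transitivity. The inconsistency check surfaces them anyway, which is the unknown-unknown case the abstract describes.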
Pages: 1024-1038 (15 pages)
Related papers (50 in total)
  • [1] BUBBLE: A Quality-Aware Human-in-the-loop Entity Matching Framework
    Osawa, Naofumi
    Ito, Hiroyoshi
    Fukushima, Yukihiro
    Harada, Takashi
    Morishima, Atsuyuki
    2021 IEEE International Conference on Big Data (Big Data), 2021, pp. 3557-3565
  • [2] Human-in-the-Loop Based Named Entity Recognition
    Zhao, Yunpeng
    Liu, Ji
    2021 International Conference on Big Data Engineering and Education (BDEE 2021), 2021, pp. 170-176
  • [3] SystemER: A Human-in-the-loop System for Explainable Entity Resolution
    Qian, Kun
    Popa, Lucian
    Sen, Prithviraj
    Proceedings of the VLDB Endowment, 2019, 12(12): 1794-1797
  • [4] PARTNER: Human-in-the-Loop Entity Name Understanding with Deep Learning
    Qian, Kun
    Raman, Poornima Chozhiyath
    Li, Yunyao
    Popa, Lucian
    Thirty-Fourth AAAI Conference on Artificial Intelligence, the Thirty-Second Innovative Applications of Artificial Intelligence Conference and the Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, 2020, 34: 13634-13635
  • [5] Learning-Based Methods with Human-in-the-Loop for Entity Resolution
    Gurajada, Sairam
    Popa, Lucian
    Qian, Kun
    Sen, Prithviraj
    Proceedings of the 28th ACM International Conference on Information & Knowledge Management (CIKM '19), 2019, pp. 2969-2970
  • [6] Sensor-driven, human-in-the-loop lighting control
    Tan, F.
    Caicedo, D.
    Pandharipande, A.
    Zuniga, M.
    Lighting Research & Technology, 2018, 50(5): 660-680
  • [7] A Human-in-the-Loop Approach to Malware Author Classification
    Kim, Eujeanne
    Park, Sung-Jun
    Chae, Dong-Kyu
    Choi, Seokwoo
    Kim, Sang-Wook
    CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 3289-3292
  • [8] Value Driven Representation for Human-in-the-Loop Reinforcement Learning
    Keramati, Ramtin
    Brunskill, Emma
    ACM UMAP '19: Proceedings of the 27th ACM Conference on User Modeling, Adaptation and Personalization, 2019, pp. 176-180
  • [9] Satisficing approach to human-in-the-loop safeguarded control
    Ren, W
    Beard, RW
    ACC: Proceedings of the 2005 American Control Conference, Vols 1-7, 2005, pp. 4985-4990
  • [10] Human-in-the-Loop Approach in Thermostatically Controlled Loads
    Firouznia, Mehdi
    Hui, Qing
    2019 American Control Conference (ACC), 2019, pp. 4727-4732