Inconsistency-driven approach for human-in-the-loop entity matching

Authors
Ito, Hiroyoshi [1]
Koizumi, Takahiro [2]
Yoshimoto, Ryuji [3]
Fukushima, Yukihiro [4]
Harada, Takashi [5]
Morishima, Atsuyuki [1]
Affiliations
[1] Univ Tsukuba, Inst Lib Informat & Media Sci, Tsukuba, Japan
[2] Univ Tsukuba, Grad Sch Comprehens Human Sci, Tsukuba, Japan
[3] CARLIL Inc, Tokyo, Japan
[4] Keio Univ, Fac Pharm, Keio, Japan
[5] Doshisha Univ, Ctr License & Qualificat, Kyoto, Japan
DOI
10.47989/ir30iConf47140
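Chinese Library Classification
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification codes
1205; 120501
Abstract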
Introduction. Entity matching is a fundamental operation in a wide range of information management applications, and a tremendous number of methods have been proposed to address it. Human-in-the-loop entity matching is a human-AI collaborative approach that is effective when the data for entity matching is incomplete or requires domain knowledge. A typical human-in-the-loop approach allows a machine-learning-based matcher to ask humans to match entities when it cannot match them with high confidence. However, ML-based matchers cannot avoid the unknown-unknown problem, i.e., they can resolve entities incorrectly with high confidence. Method. This paper proposes an inconsistency-based method to deal with this problem. The method asks humans to resolve entities when an inconsistency is found in the transitivity property underlying entity matching. For example, if a matcher returns a positive result for only two of the three pairs among three entities, the result is inconsistent. Analysis. This paper presents an implementation of the idea with a similarity-based blocking method and Bayesian inference, and reports the results of an extensive set of experiments that reveal how and when the method is effective. Results. The results show that inconsistency-based sampling selects very different entity pairs from those selected by other sampling strategies and that a simple hybrid strategy performs well in many practical situations. Conclusion. The results indicate that the approach complements any existing matcher that can cause the unknown-unknown problem in entity matching.
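The transitivity check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `make_matcher` and `inconsistent_triples` are hypothetical, the matcher is a toy lookup rather than an ML model, and the blocking and Bayesian inference steps mentioned in the abstract are not modelled. The sketch only flags triples in which exactly two of the three pairs are matched, which are the cases a human would be asked to resolve.

```python
from itertools import combinations


def make_matcher(positive_pairs):
    """Build a symmetric toy matcher from a set of matched pairs (hypothetical helper)."""
    matched = {frozenset(p) for p in positive_pairs}
    return lambda x, y: frozenset((x, y)) in matched


def inconsistent_triples(entities, is_match):
    """Return triples in which exactly two of the three pairs are matched."""
    found = []
    for a, b, c in combinations(entities, 3):
        # Booleans add as integers, so this counts the positive pairs.
        positives = is_match(a, b) + is_match(b, c) + is_match(a, c)
        if positives == 2:  # transitivity would require the third pair to match too
            found.append((a, b, c))
    return found


entities = ["e1", "e2", "e3", "e4"]
matcher = make_matcher([("e1", "e2"), ("e2", "e3")])
print(inconsistent_triples(entities, matcher))
```

Here `e1`-`e2` and `e2`-`e3` are matched but `e1`-`e3` is not, so the triple is flagged. Enumerating all triples is cubic in the number of entities, so a practical system would restrict the check to candidate pairs that survive blocking.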
Pages: 1024-1038
Page count: 15