A Confidence-based Entity Resolution Approach with Incomplete Information

Cited by: 0
Authors
Gu, Qi [1, 2]
Zhang, Yan [1 ]
Cao, Jian [1 ]
Xu, Guandong [3 ]
Cuzzocrea, Alfredo [4, 5]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[2] Nantong Univ, Sch Comp Sci & Technol, Nantong, Peoples R China
[3] Univ Technol Sydney, Sydney, NSW, Australia
[4] ICAR CNR, Cosenza, Italy
[5] Univ Calabria, Cosenza, Italy
Keywords
Entity Resolution; Cover Rate; Confidence; Accuracy;
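DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code
0808; 0809;
Abstract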
Entity resolution identifies records from different data sources that refer to the same real-world entity, and it is an important prerequisite for integrating data from multiple sources. Entity resolution relies mainly on similarity measures computed over data records. Unfortunately, the quality of data sources is often poor in practice. Web data sources in particular often provide only incomplete information, which makes it difficult to apply similarity measures directly to identify identical entities. To address this problem, the concept of confidence is introduced to measure the trustworthiness of the similarity calculation. An adaptive rule-based approach is used to calculate the similarity between records, and the confidence of that similarity is also derived. The similarity and confidence are then propagated on the entity relational graph until a fixed point is reached. Finally, any pair of records can be classified as matched or unmatched based on a threshold. We performed a series of experiments on real data sets, and the experimental results show that our approach outperforms the approaches it was compared with.
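The abstract does not give the actual rules or formulas, so the following is only a minimal sketch of the general idea of pairing a similarity score with a confidence value when records are incomplete. Everything in it is an assumption made for illustration: the attribute names and weights in `WEIGHTS`, the use of `difflib.SequenceMatcher` as the field-level measure, the definition of confidence as the weighted share of attributes present in both records, and the two thresholds. The graph propagation step is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights; the paper's adaptive rules are not
# described in the abstract, so these values are illustrative only.
WEIGHTS = {"name": 0.5, "email": 0.3, "city": 0.2}


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_with_confidence(r1: dict, r2: dict, weights: dict = WEIGHTS):
    """Return (similarity, confidence) for two possibly incomplete records.

    Similarity is the weighted mean over attributes present in BOTH records.
    Confidence is the share of total weight those attributes cover
    (an assumed stand-in for the paper's confidence measure).
    """
    covered = 0.0
    weighted_sum = 0.0
    for attr, w in weights.items():
        v1, v2 = r1.get(attr), r2.get(attr)
        if not v1 or not v2:
            continue  # missing on either side: no evidence from this attribute
        covered += w
        weighted_sum += w * field_similarity(v1, v2)
    confidence = covered / sum(weights.values())
    similarity = weighted_sum / covered if covered else 0.0
    return similarity, confidence


def is_match(r1: dict, r2: dict, sim_threshold: float = 0.8,
             conf_threshold: float = 0.5) -> bool:
    """Declare a match only when the score is high AND trustworthy enough."""
    similarity, confidence = similarity_with_confidence(r1, r2)
    return similarity >= sim_threshold and confidence >= conf_threshold


if __name__ == "__main__":
    a = {"name": "Alice Smith", "email": "alice@example.com", "city": None}
    b = {"name": "alice smith", "email": "alice@example.com", "city": "Sydney"}
    print(similarity_with_confidence(a, b))
    print(is_match(a, b))
```

Separating the two numbers means that a pair which agrees on only one sparse attribute can score a high similarity but a low confidence, so it is not accepted on thin evidence.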
Pages: 97 - 103
Number of pages: 7