Enhancing domain-aware multi-truth data fusion using copy-based source authority and value similarity

Authors
Azzalini, Fabio [1, 2]
Piantella, Davide [1 ]
Rabosio, Emanuele [2 ]
Tanca, Letizia [1 ]
Affiliations
[1] Politecn Milan, Dipartimento Elettron Informaz & Bioingn, Via G Ponzio 34-5, I-20133 Milan, Italy
[2] Human Technopole, Ctr Hlth Data Sci, Viale R Levi Montalcini 1, I-20157 Milan, Italy
Source
VLDB JOURNAL | 2023 / Vol. 32 / Issue 03
Keywords
Data integration; Multi-truth data fusion; Source authority; Copy detection; Value similarity; DISCOVERY; WEB;
DOI
10.1007/s00778-022-00757-x
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data fusion, within the data integration pipeline, addresses the problem of discovering the true values of a data item when multiple sources provide different values for it. Assessing the quality of the sources involved, and relying more on the values that come from trusted sources, contributes substantially to solving this problem. State-of-the-art data fusion systems define source trustworthiness on the basis of the accuracy of the provided values and of the dependence on other sources, and it has recently also been recognized that the trustworthiness of the same source may vary with the domain of interest. In this paper we propose STORM, a novel domain-aware algorithm for data fusion designed for the multi-truth case, that is, when a data item can have multiple true values. Like many other data-fusion techniques, STORM relies on Bayesian inference. However, unlike other Bayesian approaches to the problem, it determines the trustworthiness of sources by taking into account their authority: we define authoritative sources as those that have been copied by many other sources, on the assumption that, when source administrators decide to copy data from other sources, they choose the ones they perceive as the most reliable. STORM also provides a value-reconciliation step that groups together the values recognized as variants representing the same real-world entity, thus reducing the possibility of mistakes in the remaining part of the algorithm. Experimental results on multi-truth synthetic and real-world datasets show that STORM represents a solid step forward in data-fusion research.
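The three ideas named in the abstract (copy-based authority, value reconciliation, multi-truth selection) can be illustrated with a toy sketch. This is not the STORM algorithm: it replaces Bayesian inference with a plain weighted vote, ignores domains, and does not discount the votes of copiers. All function names, the `threshold`, `accept`, and `base` parameters, and the sample data are hypothetical and chosen only for illustration; the list of copy relationships is assumed to be given rather than detected.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def authority_scores(copy_edges, sources):
    """Score each source by the fraction of the other sources that copy from it."""
    copiers = defaultdict(set)
    for copier, original in copy_edges:
        if copier != original:
            copiers[original].add(copier)
    others = max(len(sources) - 1, 1)
    return {s: len(copiers[s]) / others for s in sources}


def group_variants(values, threshold=0.8):
    """Greedily cluster strings that are similar to a cluster's first member."""
    groups = []
    for v in values:
        for g in groups:
            if SequenceMatcher(None, v.lower(), g[0].lower()).ratio() >= threshold:
                g.append(v)
                break
        else:
            groups.append([v])
    return groups


def fuse(claims, authority, accept=0.5, base=1.0):
    """Return every value group backed by at least `accept` of the total source weight."""
    all_values = sorted({v for vals in claims.values() for v in vals})
    # Map each value to the representative (first member) of its variant group.
    rep = {v: g[0] for g in group_variants(all_values) for v in g}
    # Assumption: a source's vote weight is a base weight plus its authority score.
    weight = {s: base + authority.get(s, 0.0) for s in claims}
    total = sum(weight.values())
    support = defaultdict(float)
    for s, vals in claims.items():
        for r in {rep[v] for v in vals}:
            support[r] += weight[s]
    return sorted(r for r, w in support.items() if w / total >= accept)


# Hypothetical data: B and C copy from A; the data item is a book's author list.
sources = ["A", "B", "C", "D"]
authority = authority_scores([("B", "A"), ("C", "A")], sources)
claims = {
    "A": {"J. Ullman", "J. Widom"},
    "B": {"J Ullman", "J. Widom"},
    "C": {"J. Ullman"},
    "D": {"A. Nobody"},
}
truths = fuse(claims, authority)
```

In this sample, "J. Ullman" and "J Ullman" fall into one variant group, so their support is pooled before the vote, and more than one group can be accepted for the same data item, which is what distinguishes the multi-truth setting from single-truth voting.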
Pages: 475 - 500
Page count: 26