Multi-Truth Discovery While Being Aware of Unbalanced Data Distribution

Cited by: 0
Authors
Fang, Xiu Susie [1]
Sheng, Quan Z. [2]
Sun, Guohao [1]
Chang, Shan [1]
Wang, Hongya [1]
Yang, Jian [2]
Affiliations
[1] Donghua Univ, Shanghai, Peoples R China
[2] Macquarie Univ, Sydney, NSW, Australia
Funding
Natural Science Foundation of Shanghai
Keywords
truth discovery; multi-value objects; unbalanced data; confidence interval; object uncertainty; MODEL;
DOI
10.1109/IJCNN54540.2023.10191906
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Due to information explosion, conflicting data on the same object from multiple sources is ubiquitous on the Web. To resolve these conflicts while estimating source reliability, truth discovery has become a hot topic. However, existing approaches overlook the inevitable unbalanced data distribution that arises with multi-value objects. In particular, a few sources make many claims while most sources provide only a few, which renders the reliability estimated for "small" sources essentially random; likewise, some objects are covered by many sources while others are claimed by only a few, which makes the value correctness calculated for "cold" objects unreliable. To tackle unbalanced data where multi-value objects exist, we propose a confidence-interval-based approach (CIMTD). We estimate source reliability from two aspects: the ability to claim the correct number of values on an object, and the ability to claim the correct specific values. To reflect the real reliability of both "big" and "small" sources, confidence intervals of enriched estimation are considered. While estimating source reliability, uncertainty degrees are introduced to model object differences. Confidence intervals are also considered to reflect the real uncertainty of both "hot" and "cold" objects. Experimental results on two real-world datasets demonstrate the effectiveness of our approach.
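The abstract does not give the formulas used by CIMTD, so the following is only a minimal sketch of the general idea that a confidence interval can temper reliability estimates for "small" sources. It swaps in the standard Wilson score interval for a proportion, which is an assumption and not the authors' method; the function name `wilson_interval` and the example counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the proportion correct/total.

    Assumption: source reliability is treated as a simple proportion of
    correct claims. z = 1.96 corresponds to roughly 95% confidence.
    """
    if total == 0:
        # No claims at all: the interval is uninformative.
        return (0.0, 1.0)
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


# Hypothetical counts: a "small" source with 2 of 2 claims correct,
# and a "big" source with 180 of 200 claims correct.
small = wilson_interval(2, 2)
big = wilson_interval(180, 200)

# Ranking by the lower bound rather than the raw accuracy keeps the small
# source's perfect but thinly supported score from dominating.
print("small source:", small)
print("big source:", big)
```

Under these assumed counts, the small source has the higher raw accuracy but a much wider interval, so its lower bound falls below that of the big source.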
Pages: 10