A probabilistic ranking framework for web-based relational data imputation

Cited by: 2
Authors
Chen, Zhaoqiang [1]
Chen, Qun [1]
Li, Jiajun [1]
Li, Zhanhuai [1]
Chen, Lei [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian 710072, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Web-based relational data imputation; Missing attribute values; Probabilistic ranking;
DOI
10.1016/j.ins.2016.03.036
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Owing to the richness of information on the web, there is increasing interest in searching the web for missing attribute values in relational data. Web-based relational data imputation must first extract multiple candidate values from the web and then rank them by their matching probabilities. However, effective candidate ranking remains challenging because web documents are unstructured and popular search engines return information that is relevant but not necessarily a semantic match. In this paper, we propose a novel probabilistic approach for ranking web-retrieved candidate values. Through semantic reasoning, it integrates various influence factors, e.g., snippet rank order, occurrence frequency, occurrence pattern, and keyword proximity, in a single framework. The proposed framework consists of a snippet influence model and a semantic matching model. The snippet influence model measures the influence of a snippet, and the semantic matching model measures the semantic similarity between a candidate value in a snippet and a missing relational value in a tuple. We also present effective probabilistic estimation solutions for both models. Finally, we empirically evaluate the performance of the proposed framework on real datasets. Our extensive experiments demonstrate that it outperforms state-of-the-art techniques by considerable margins in imputation accuracy. (C) 2016 Elsevier Inc. All rights reserved.
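The two-model structure described in the abstract can be illustrated with a small sketch. This is not the paper's estimator: the abstract does not give the formulas, so the rank-based decay in `snippet_influence`, the noisy-OR combination, and every function and parameter name below are assumptions chosen only to show how a per-snippet influence and a per-occurrence matching probability might be combined into one candidate ranking.

```python
from collections import defaultdict
from math import log2


def snippet_influence(rank: int) -> float:
    """Assumed influence of the snippet at 1-based search rank `rank`.

    Hypothetical choice: logarithmic decay, so rank 1 has influence 1.0.
    """
    if rank < 1:
        raise ValueError("rank must be >= 1")
    return 1.0 / log2(rank + 1)


def rank_candidates(observations):
    """Rank candidate values extracted from search snippets.

    `observations` is an iterable of (candidate, snippet_rank, match_prob),
    where match_prob in [0, 1] stands in for the output of a semantic
    matching model. Evidence from several snippets is merged with a
    noisy-OR (an assumption, not the paper's method).
    """
    miss = defaultdict(lambda: 1.0)  # probability that no snippet supports the candidate
    for candidate, rank, match_prob in observations:
        if not 0.0 <= match_prob <= 1.0:
            raise ValueError("match_prob must lie in [0, 1]")
        miss[candidate] *= 1.0 - snippet_influence(rank) * match_prob
    scores = {c: 1.0 - m for c, m in miss.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up observations for one missing attribute value.
observations = [("Paris", 1, 0.9), ("Lyon", 2, 0.5), ("Paris", 3, 0.4)]
ranked = rank_candidates(observations)
```

In this toy input, "Paris" appears in two snippets, so its occurrence frequency and the high rank of its first snippet both raise its score above that of "Lyon".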
Pages: 152-168
Number of pages: 17