A probabilistic ranking framework for web-based relational data imputation

Cited by: 2
Authors
Chen, Zhaoqiang [1]
Chen, Qun [1]
Li, Jiajun [1]
Li, Zhanhuai [1]
Chen, Lei [2]
Affiliations
[1] Northwestern Polytechnical University, School of Computer Science and Engineering, Xi'an 710072, People's Republic of China
[2] Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Hong Kong, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Web-based relational data imputation; Missing attribute values; Probabilistic ranking;
DOI
10.1016/j.ins.2016.03.036
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Due to the richness of information on the web, there is increasing interest in searching the web for missing attribute values in relational data. Web-based relational data imputation must first extract multiple candidate values from the web and then rank them by their matching probabilities. However, effective candidate ranking remains challenging because web documents are unstructured and popular search engines can provide only relevant, but not necessarily semantically matching, information. In this paper, we propose a novel probabilistic approach for ranking the web-retrieved candidate values. It integrates various influence factors, e.g., snippet rank order, occurrence frequency, occurrence pattern, and keyword proximity, in a single framework by semantic reasoning. The proposed framework consists of a snippet influence model and a semantic matching model. The snippet influence model measures the influence of a snippet, and the semantic matching model measures the semantic similarity between a candidate value in a snippet and a missing relational value in a tuple. We also present effective probabilistic estimation solutions for both models. Finally, we empirically evaluate the performance of the proposed framework on real datasets. Our extensive experiments demonstrate that it outperforms state-of-the-art techniques by considerable margins in imputation accuracy. (C) 2016 Elsevier Inc. All rights reserved.
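The two-model structure described in the abstract can be illustrated with a small sketch. This is not the paper's estimation method, which the abstract does not specify. It assumes, purely for illustration, that snippet influence is a normalised reciprocal of the search rank and that a per-occurrence semantic matching probability is already available. The names snippet_influence, rank_candidates, and the example values are hypothetical.

```python
from collections import defaultdict


def snippet_influence(rank, total):
    """Hypothetical influence model: reciprocal-rank weight, normalised so the
    weights of all `total` snippets sum to 1. The paper's actual estimator is
    not given in the abstract."""
    norm = sum(1.0 / r for r in range(1, total + 1))
    return (1.0 / rank) / norm


def rank_candidates(occurrences, total_snippets):
    """Score each candidate value by summing, over the snippets it occurs in,
    snippet influence times the (assumed given) semantic matching probability.

    occurrences: iterable of (candidate_value, snippet_rank, match_prob).
    Returns a list of (candidate_value, score), highest score first.
    """
    scores = defaultdict(float)
    for value, rank, match_prob in occurrences:
        scores[value] += snippet_influence(rank, total_snippets) * match_prob
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up example: candidate values for one missing attribute,
# extracted from the top three search snippets.
occurrences = [
    ("Seattle", 1, 0.9),
    ("Redmond", 2, 0.6),
    ("Seattle", 3, 0.5),
]
ranking = rank_candidates(occurrences, total_snippets=3)
for value, score in ranking:
    print(f"{value}: {score:.4f}")
```

In this toy scoring, a candidate that appears in several snippets accumulates weight, so occurrence frequency and snippet rank order both affect the final ranking. Factors such as occurrence pattern and keyword proximity would enter through the matching probability, which is taken as an input here.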
Pages: 152-168
Page count: 17