The Ranking of Deep Web Sources Based on Data Quality

Authors
Yin, Hu [1]
Lv, Yunfei [2]
Wang, Weiwei [2]
Affiliations
[1] 719 Inst Technol Wuhan, Wuhan, Peoples R China
[2] Wuhan Second Ship Design Inst, Wuhan, Peoples R China
Keywords
Sampling estimates; Data quality; Quality Vector; Deep Web ranking
DOI
10.4028/www.scientific.net/AMM.303-306.2437
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Deep Web technology makes large amounts of useful information hidden behind query interfaces easier for users to find. However, as the number of data sources grows, quickly finding a suitable result among many sources becomes increasingly important. In this paper, we start from data quality: we define six quality standards for a data source and give a method for calculating each. We derive the corresponding weight vector for the quality standards from users' perceptions, and, based on these standards, we score an arbitrary data source according to the weight vector to obtain an overall score. We then discuss sampling theory and propose a reasonable sampling method for the experiment. The experimental results show that evaluating and scoring the data quality of a data source through sampling analysis is accurate and practical.
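The scoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the six quality standards, their formulas, or the sampling method, so everything below is an assumption made for illustration: each standard is treated as a per-record function returning a value in [0, 1], the quality vector is estimated by simple random sampling, and the overall score is a normalized weighted sum. The function names (`estimate_quality_vector`, `source_score`, `rank_sources`) are hypothetical and do not come from the paper.

```python
import random
from statistics import mean


def estimate_quality_vector(records, metrics, sample_size, seed=0):
    """Estimate one score per quality standard from a random sample.

    `metrics` is a list of hypothetical per-record functions, each
    returning a value in [0, 1]. Simple random sampling is an assumed
    stand-in for the sampling method proposed in the paper.
    """
    if not records:
        raise ValueError("records must not be empty")
    rng = random.Random(seed)
    sample = rng.sample(records, min(sample_size, len(records)))
    return [mean(metric(r) for r in sample) for metric in metrics]


def source_score(quality_vector, weights):
    """Overall score: weighted sum of quality scores, weights normalized."""
    if len(quality_vector) != len(weights):
        raise ValueError("quality vector and weights differ in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(q * w for q, w in zip(quality_vector, weights)) / total


def rank_sources(quality_vectors, weights):
    """Return (source name, score) pairs sorted from best to worst."""
    scored = {name: source_score(qv, weights)
              for name, qv in quality_vectors.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```

With six standards, each quality vector and the weight vector would have six entries; calling `rank_sources` with a mapping from source names to quality vectors orders the sources by overall score under the chosen weights.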
Pages: 2437 / +
Page count: 2