Leveraging COUNT Information in Sampling Hidden Databases

Cited by: 20
Authors
Dasgupta, Arjun [1]
Zhang, Nan [2]
Das, Gautam [1]
Affiliations
[1] Univ Texas Arlington, Arlington, TX 76019 USA
[2] George Washington Univ, Washington, DC 20052 USA
Funding
U.S. National Science Foundation
DOI
10.1109/ICDE.2009.112
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
A large number of online databases are hidden behind form-like interfaces which allow users to execute search queries by specifying selection conditions in the interface. Most of these interfaces return restricted answers (e.g., only top-k of the selected tuples), while many of them also accompany each answer with the COUNT of the selected tuples. In this paper, we propose techniques which leverage the COUNT information to efficiently acquire unbiased samples of the hidden database. We also discuss variants for interfaces which do not provide COUNT information. We conduct extensive experiments to illustrate the efficiency and accuracy of our techniques.
Pages: 329 / +
Page count: 2