Extracting Output Metadata from Scientific Deep Web Data Sources

Cited by: 0
Authors
Wang, Fan [1]
Agrawal, Gagan [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
deep web; schema extraction
DOI
10.1109/ICDM.2009.41
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Increasingly, data sources appear as online databases hidden behind query forms, together forming the deep web. The popularity of this new medium for data dissemination is creating new problems in data integration. In particular, to integrate data from multiple deep web data sources, one needs to obtain the metadata for each source. Obtaining the metadata, particularly the output schema, can be very challenging, because for a given input query many deep web data sources return only a subset of the output schema attributes, i.e., the ones that have a non-NULL value for the corresponding input. In this paper, we propose two approaches, the sampling model approach and the mixture model approach, to efficiently obtain an approximately complete set of output schema attributes from a deep web data source. Our experiments show that while each of the two approaches has limitations, a hybrid strategy that combines them achieves high recall with good precision for most data sources.
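The difficulty described in the abstract can be illustrated with a small sketch. It is not the paper's sampling model or mixture model, whose details the abstract does not give; it only shows, under stated assumptions, why a single query under-reports the output schema and why taking the union over several sampled queries raises recall. The toy source, its attribute names, and the helpers `query_source`, `observed_schema`, and `recall` are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy source: each input term maps to one record.
# Attributes whose value is None (NULL) are omitted from the result page.
TOY_SOURCE: Dict[str, Dict[str, Optional[str]]] = {
    "geneA": {"id": "1", "name": "geneA", "function": "kinase", "disease": None},
    "geneB": {"id": "2", "name": "geneB", "function": None, "disease": "asthma"},
    "geneC": {"id": "3", "name": "geneC", "function": None, "disease": None},
}

# Assumed ground-truth output schema, used here only to measure recall.
FULL_SCHEMA: Set[str] = {"id", "name", "function", "disease"}


def query_source(term: str) -> Dict[str, str]:
    """Return only the attributes that have a non-NULL value for this input."""
    record = TOY_SOURCE.get(term, {})
    return {attr: value for attr, value in record.items() if value is not None}


def observed_schema(terms: Iterable[str]) -> Set[str]:
    """Union of the attribute names seen across all sampled queries."""
    seen: Set[str] = set()
    for term in terms:
        seen.update(query_source(term).keys())
    return seen


def recall(observed: Set[str], full_schema: Set[str]) -> float:
    """Fraction of the true schema attributes that were observed."""
    return len(observed & full_schema) / len(full_schema)


one_query = observed_schema(["geneC"])
three_queries = observed_schema(["geneA", "geneB", "geneC"])
print(sorted(one_query), recall(one_query, FULL_SCHEMA))
print(sorted(three_queries), recall(three_queries, FULL_SCHEMA))
```

In this toy setting a single query for `geneC` reveals only half of the schema, while the union over three queries recovers all four attributes. A real source offers no ground-truth schema to compare against, which is why an approach for estimating completeness is needed.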
Pages: 552-561
Page count: 10