Extracting Output Metadata from Scientific Deep Web Data Sources

Authors
Wang, Fan [1]
Agrawal, Gagan [1]
Affiliation
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
deep web; schema extraction
DOI
10.1109/ICDM.2009.41
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Increasingly, data sources appear as online databases hidden behind query forms, together forming the deep web. The popularity of this medium for data dissemination raises new problems in data integration. In particular, integrating data from multiple deep web data sources requires the metadata of each source. Obtaining this metadata, and the output schema in particular, can be very challenging. For a given input query, many deep web data sources return only a subset of the output schema attributes, i.e., the ones that have a non-NULL value for that input. In this paper, we propose two approaches, a sampling model approach and a mixture model approach, to efficiently obtain an approximately complete set of output schema attributes from a deep web data source. Our experiments show that while each of the two approaches has limitations, a hybrid strategy that combines them achieves high recall with good precision for most data sources.
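The abstract does not describe how the sampling model or mixture model works. The Python sketch below is therefore not the authors' method. It is only a minimal illustration of the underlying problem, under these assumptions:

- A toy source displays only the non-NULL attributes of each record.
- A naive baseline takes the union of the attributes seen across sampled query results.
- The baseline is scored with recall and precision against the true output schema, the two measures the abstract reports.

The record layout, the attribute names (`gene_id`, `ortholog`, and so on) and the helper functions are all hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical toy model: a record maps attribute names to values; None stands for NULL.
Record = Dict[str, Optional[str]]


def visible_attributes(record: Record) -> Set[str]:
    """Attributes the source would display for this record: only the non-NULL ones."""
    return {name for name, value in record.items() if value is not None}


def discover_schema(sampled: Iterable[Record]) -> Set[str]:
    """Naive baseline (not the paper's method): union the attributes seen across samples."""
    found: Set[str] = set()
    for record in sampled:
        found |= visible_attributes(record)
    return found


def recall_precision(found: Set[str], true_schema: Set[str]) -> Tuple[float, float]:
    """Recall = share of true attributes found; precision = share of found attributes that are true."""
    hits = len(found & true_schema)
    recall = hits / len(true_schema) if true_schema else 1.0
    precision = hits / len(found) if found else 1.0
    return recall, precision


# Hypothetical sample: two query results from a gene database; "ortholog" is NULL in both.
TRUE_SCHEMA = {"gene_id", "symbol", "chromosome", "pathway", "disease", "ortholog"}
SAMPLES = [
    {"gene_id": "G1", "symbol": "BRCA1", "chromosome": "17",
     "pathway": None, "disease": "breast cancer", "ortholog": None},
    {"gene_id": "G2", "symbol": "TP53", "chromosome": "17",
     "pathway": "p53 signaling", "disease": None, "ortholog": None},
]

found = discover_schema(SAMPLES)
recall, precision = recall_precision(found, TRUE_SCHEMA)
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.83 precision=1.00
```

In this toy run the baseline never sees `ortholog`, because that attribute is NULL in every sampled record. This is the kind of incompleteness the abstract describes. Precision stays at 1.0 here only because the toy source reports attributes exactly. A real extraction from result pages could also pick up spurious attributes.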
Pages: 552-561
Page count: 10