Extracting Output Metadata from Scientific Deep Web Data Sources

Times cited: 0

Authors:

Wang, Fan [1]

Agrawal, Gagan [1]

Affiliation:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

Source:

2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING | 2009

Keywords:

deep web; schema extraction;

DOI:

10.1109/ICDM.2009.41

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Increasingly, many data sources appear as online databases hidden behind query forms, thus forming the deep web. The popularity of this new medium for data dissemination is leading to new problems in data integration. In particular, to enable data integration across multiple deep web data sources, one needs to obtain the metadata for each data source. Obtaining the metadata, particularly the output schema, can be very challenging. This is because, given an input query, many deep web data sources return only a subset of the output schema attributes, i.e., the ones that have a non-NULL value for the corresponding input. In this paper, we propose two approaches, the sampling model approach and the mixture model approach, to efficiently obtain an approximately complete set of output schema attributes from a deep web data source. Our experiments show that while each of these two approaches has limitations, a hybrid strategy that combines them achieves high recall with good precision for most data sources.
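The abstract does not spell out the sampling or mixture models, so the sketch below is not the authors' method. It only simulates the problem the abstract describes: a hypothetical source that displays just the non-NULL attributes of each answer, a naive baseline that takes the union of attributes observed across sampled queries, and the recall and precision measures the abstract reports. All names (`observed_attributes`, `discover_schema`, `miss_probability`, `recall_precision`) are hypothetical. Under an added independence assumption, an attribute that is non-NULL with probability p is missed by n sampled queries with probability (1 - p)^n, which suggests why attributes that are rarely non-NULL make naive sampling costly.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# One answer record from a hypothetical source: attribute name -> value (None = NULL).
Record = Dict[str, Optional[str]]


def observed_attributes(record: Record) -> Set[str]:
    """Attributes the source displays for one query: only those with a non-NULL value."""
    return {name for name, value in record.items() if value is not None}


def discover_schema(sampled: Iterable[Record]) -> Set[str]:
    """Naive baseline (not the paper's models): union of attributes observed over all sampled queries."""
    found: Set[str] = set()
    for record in sampled:
        found |= observed_attributes(record)
    return found


def miss_probability(p_non_null: float, n_queries: int) -> float:
    """Probability that an attribute never shows up in n queries, ASSUMING it is
    non-NULL independently with probability p_non_null for each query."""
    return (1.0 - p_non_null) ** n_queries


def recall_precision(found: Set[str], true_schema: Set[str]) -> Tuple[float, float]:
    """Recall and precision of a discovered attribute set against the true output schema."""
    hits = len(found & true_schema)
    recall = hits / len(true_schema) if true_schema else 1.0
    precision = hits / len(found) if found else 1.0
    return recall, precision
```

For instance, with a made-up true schema {gene, organism, sequence, pathway} and two sampled answers in which pathway is always NULL, `discover_schema` recovers three of the four attributes, giving recall 0.75 and precision 1.0. In this idealized simulation the union baseline never reports a spurious attribute, so only its recall suffers; an attribute that is non-NULL for 5% of inputs still goes unseen after 20 queries with probability 0.95^20 ≈ 0.36.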


Pages: 552-561

Page count: 10
