Extracting Output Metadata from Scientific Deep Web Data Sources

Cited by: 0

Authors:

Wang, Fan [1]

Agrawal, Gagan [1]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

Source:

2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING | 2009

Keywords:

deep web; schema extraction;

DOI:

10.1109/ICDM.2009.41

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Increasingly, many data sources appear as online databases, hidden behind query forms, thus forming the deep web. The popularity of this new medium for data dissemination is leading to new problems in data integration. Particularly, to enable data integration from multiple deep web data sources, one needs to obtain the metadata for each of the data sources. Obtaining the metadata, particularly, the output schema, can be very challenging. This is because, given an input query, many deep web data sources only return a subset of the output schema attributes, i.e., the ones that have a non-NULL value for the corresponding input. In this paper, we propose two approaches, which are the sampling model approach and the mixture model approach, respectively, to efficiently obtain an approximately complete set of output schema attributes from a deep web data source. Our experiments show that while each of the above two approaches has limitations, a hybrid strategy, where we combine the two approaches, achieves high recall with good precision for most data sources.


Pages: 552-561

Page count: 10

