Building web information extraction tasks

Cited by: 1

Authors:

Habegger, B [1]

Quafafou, M [1]

Affiliation:

[1] Lab Informat Nantes Atlantique, F-44322 Nantes 3, France

Source:

IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS | 2004

Keywords:

DOI:

10.1109/WI.2004.10116

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Most recent research in the field of information extraction from the Web has concentrated on the task of extracting the underlying content of a set of similarly structured web pages. However, this is not sufficient for building real-world web information extraction applications. Indeed, building such applications requires fully automating the access to web sources. This does not just involve the extraction of the data from web pages. There is a need to set up the necessary infrastructure for querying a source, retrieving the result pages, extracting the results from these pages, and filtering out the unwanted results. In this paper we show how such an infrastructure can be set up. We propose to build a web information extraction application by decomposing it into sub-tasks and describing it in an XML-based language named WetDL. Each of the sub-tasks consists in applying a web information extraction specific operation to its input, one of these operators being the application of an extractor. By connecting such operations together, it is possible to define complex applications simply. This is shown in the paper by applying this approach to real-world information extraction tasks such as extracting DVD listings from Amazon.com, extracting addresses from online telephone directories such as superpages.com, etc.
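
The decomposition described in the abstract (query a source, retrieve the result pages, extract the results, filter out unwanted ones) can be illustrated with a minimal Python sketch. This is not WetDL and does not reproduce the authors' operators: every name here (query_op, fetch_op, extract_op, filter_op, connect), the example.test URL, and the in-memory page are hypothetical, and page retrieval is replaced by a dictionary lookup so the example runs without network access.

```python
import re
from typing import Callable, Dict, List

# Hypothetical operator type: each operator maps a list of items to a list of items.
Operator = Callable[[List], List]


def query_op(template: str) -> Operator:
    """Build one request URL per input keyword."""
    return lambda keywords: [template.format(q=k) for k in keywords]


def fetch_op(pages: Dict[str, str]) -> Operator:
    """Stand-in for page retrieval: look each URL up in an in-memory dict."""
    return lambda urls: [pages[u] for u in urls if u in pages]


def extract_op(pattern: str) -> Operator:
    """Apply an extractor (here a regex with named groups) to every page."""
    rx = re.compile(pattern)
    return lambda docs: [m.groupdict() for d in docs for m in rx.finditer(d)]


def filter_op(keep: Callable[[dict], bool]) -> Operator:
    """Keep only the records for which the predicate holds."""
    return lambda records: [r for r in records if keep(r)]


def connect(*ops: Operator) -> Operator:
    """Chain operators so that each one consumes the previous one's output."""
    def run(items: List) -> List:
        for op in ops:
            items = op(items)
        return items
    return run


# Assumed sample data: one fake result page keyed by its URL.
PAGES = {
    "http://example.test/search?q=dvd":
        "<li><b>Alpha</b> $12</li><li><b>Beta</b> $30</li>",
}

task = connect(
    query_op("http://example.test/search?q={q}"),
    fetch_op(PAGES),
    extract_op(r"<li><b>(?P<title>[^<]+)</b> \$(?P<price>\d+)</li>"),
    filter_op(lambda r: int(r["price"]) < 20),
)

results = task(["dvd"])
print(results)  # [{'title': 'Alpha', 'price': '12'}]
```

Because every operator shares the same list-in, list-out shape, a task is defined only by which operators are connected and in what order, which is the property the abstract attributes to the sub-task decomposition.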


Pages: 349-355

Page count: 7
