Building web information extraction tasks

Times cited: 1

Authors:

Habegger, B [1]

Quafafou, M [1]

Affiliation:

[1] Lab Informat Nantes Atlantique, F-44322 Nantes 3, France

Source:

IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS | 2004

DOI:

10.1109/WI.2004.10116

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Most recent research in the field of information extraction from the Web has concentrated on the task of extracting the underlying content of a set of similarly structured web pages. However, this is not sufficient for building real-world web information extraction applications. Indeed, building such applications requires fully automating access to web sources, which involves more than extracting data from web pages: the necessary infrastructure must be set up to query a source, retrieve the result pages, extract the results from these pages and filter out unwanted results. In this paper we show how such an infrastructure can be set up. We propose to build a web information extraction application by decomposing it into sub-tasks and describing it in an XML-based language named WetDL. Each sub-task consists of applying an operation specific to web information extraction to its input; one of these operators is the application of an extractor. By connecting such operations together, complex applications can be defined simply. We demonstrate this by applying the approach to real-world information extraction tasks, such as extracting DVD listings from Amazon.com and extracting addresses from the online telephone directory superpages.com.
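The abstract describes an application as a chain of operators (query a source, retrieve result pages, extract results, filter unwanted results). The Python sketch below illustrates only that composition idea. It is not WetDL, which is an XML-based language whose syntax the abstract does not give. All names (`query`, `retrieve`, `extract`, `filter_by`, `pipeline`), the `example.org` URL and the in-memory page store are hypothetical, and the regex "extractor" is a toy stand-in for a real wrapper.

```python
import re
from typing import Callable, Dict, List
from urllib.parse import urlencode

# Hypothetical operator type: every sub-task maps a list of inputs to a list of outputs,
# so operators can be chained freely.
Operator = Callable[[List[str]], List[str]]

# Hypothetical in-memory stand-in for a remote source (URL -> result page HTML),
# used so the sketch runs without network access.
FAKE_SOURCE: Dict[str, str] = {
    "http://example.org/search?q=dvd": (
        "<ul><li class='item'>Film A - $9.99</li>"
        "<li class='item'>Film B - $24.50</li></ul>"
    ),
}


def query(base_url: str) -> Operator:
    """Turn each query term into a result-page URL for the source."""
    def op(terms: List[str]) -> List[str]:
        return [f"{base_url}?{urlencode({'q': t})}" for t in terms]
    return op


def retrieve(source: Dict[str, str]) -> Operator:
    """Fetch the page behind each URL; unknown URLs are skipped."""
    def op(urls: List[str]) -> List[str]:
        return [source[u] for u in urls if u in source]
    return op


def extract(pattern: str) -> Operator:
    """Toy extractor: return capture group 1 of every regex match on every page."""
    rx = re.compile(pattern)
    def op(pages: List[str]) -> List[str]:
        return [m.group(1) for page in pages for m in rx.finditer(page)]
    return op


def filter_by(keep: Callable[[str], bool]) -> Operator:
    """Drop the extracted results for which `keep` returns False."""
    def op(items: List[str]) -> List[str]:
        return [x for x in items if keep(x)]
    return op


def pipeline(*ops: Operator) -> Operator:
    """Connect operators so that each one's output feeds the next one's input."""
    def run(inputs: List[str]) -> List[str]:
        for op in ops:
            inputs = op(inputs)
        return inputs
    return run


# Example task (hypothetical): query, retrieve, extract listings, keep those under $20.
dvd_task = pipeline(
    query("http://example.org/search"),
    retrieve(FAKE_SOURCE),
    extract(r"<li class='item'>(.*?)</li>"),
    filter_by(lambda item: float(item.rsplit("$", 1)[1]) < 20),
)
print(dvd_task(["dvd"]))  # ['Film A - $9.99']
```

Every operator shares the same list-in, list-out shape, so a new task is assembled by reordering or swapping operators rather than writing new control flow. This mirrors, in miniature, the decomposition into connected sub-tasks that the abstract attributes to the authors' approach.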

Pages: 349-355

Number of pages: 7

50 records in total

[1] Mining information from sentences through Semantic Web data and Information Extraction tasks
Martinez-Rodriguez, Jose L.
Lopez-Arevalo, Ivan
Rios-Alvarado, Ana B.
[J]. JOURNAL OF INFORMATION SCIENCE, 2022, 48 (01) : 3 - 20
[2] Building Robust Geospatial Web Services for Agricultural Information Extraction and Sharing
Sun, Ziheng
Di, Liping
Zhang, Chen
Fang, Hui
Yu, Eugene
Lin, Li
Tang, Junmei
Tan, Xicheng
Liu, Ziao
Jiang, Lili
Guo, Liying
Chen, Zhongxin
Yue, Peng
[J]. 2017 6TH INTERNATIONAL CONFERENCE ON AGRO-GEOINFORMATICS, 2017: 280 - 283
[3] A Method of Web Information Extraction Based on Building Different Sub Trees
Wang, Yuanlong
Jiang, Hong
Bing, Zhaohong
Zhang, Li
[J]. MANUFACTURING PROCESS AND EQUIPMENT, PTS 1-4, 2013, 694-697 : 2513 - +
[4] Building Web information systems
White, C
[J]. BYTE, 1998, 23 (07): A1 - +
[5] A method for web information extraction
Lam, Man I.
Gong, Zhiguo
Muyeba, Maybin
[J]. PROGRESS IN WWW RESEARCH AND DEVELOPMENT, PROCEEDINGS, 2008, 4976 : 383 - +
[6] Web Services for information extraction from the Web
Habegger, B
Quafafou, M
[J]. IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, PROCEEDINGS, 2004: 279 - 286
[7] Information extraction for the semantic web
Baumgartner, R
Eiter, T
Gottlob, G
Herzog, M
Koch, C
[J]. REASONING WEB, 2005, 3564 : 275 - 289
[8] Building web information systems using Web services
Frasincar, Flavius
Houben, Geert-Jan
Barna, Peter
[J]. 2006 SEVENTH INTERNATIONAL BALTIC CONFERENCE ON DATABASES AND INFORMATION SYSTEMS - PROCEEDINGS, 2006: 187 - +
[9] Web text corpus extraction system for linguistic tasks
Cadavid Rengifo, Hector Fabio
Gomez Perdomo, Jonatan
[J]. INGENIERIA E INVESTIGACION, 2009, 29 (03): 54 - 60
[10] An architecture for building user-driven web tasks via web services
Lu, J
Chen, LH
[J]. E-COMMERCE AND WEB TECHNOLOGIES, PROCEEDINGS, 2002, 2455 : 77 - 86