Building web information extraction tasks

Cited by: 1
Authors
Habegger, B [1]
Quafafou, M [1]
Affiliations
[1] Lab Informat Nantes Atlantique, F-44322 Nantes 3, France
DOI
10.1109/WI.2004.10116
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Most recent research on information extraction from the Web has concentrated on extracting the underlying content of a set of similarly structured web pages. However, this is not sufficient for building real-world web information extraction applications. Building such applications requires fully automating access to web sources, which involves more than extracting data from web pages: the necessary infrastructure must be set up to query a source, retrieve the result pages, extract the results from these pages, and filter out the unwanted results. In this paper we show how such an infrastructure can be set up. We propose building a web information extraction application by decomposing it into sub-tasks and describing it in an XML-based language named WetDL. Each sub-task applies a web information extraction specific operation to its input, one of these operators being the application of an extractor. By connecting such operations together, complex applications can be defined simply. The paper demonstrates this by applying the approach to real-world information extraction tasks such as extracting DVD listings from Amazon.com and extracting addresses from online telephone directories such as superpages.com.
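The decomposition described in the abstract (query a source, retrieve result pages, extract results, filter them) can be illustrated with a minimal Python sketch. This is not WetDL and does not reproduce the authors' operators; the function names (`build_query`, `fetch`, `extract`, `keep_if`, `connect`), the `example.test` URL, and the in-memory `PAGES` dictionary standing in for real HTTP retrieval are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

# Hypothetical operator type: each operator maps a list of items to a list of items.
Operator = Callable[[List[str]], List[str]]

def build_query(base_url: str) -> Operator:
    """Turn each keyword into a query URL (illustrative URL scheme)."""
    return lambda keywords: [f"{base_url}?q={kw}" for kw in keywords]

def fetch(pages: Dict[str, str]) -> Operator:
    """Stand-in for page retrieval: looks URLs up in an in-memory dict."""
    return lambda urls: [pages[u] for u in urls if u in pages]

def extract(pattern: str) -> Operator:
    """Apply a regex-based 'extractor' to every page and collect the matches."""
    rx = re.compile(pattern)
    return lambda docs: [m for d in docs for m in rx.findall(d)]

def keep_if(pred: Callable[[str], bool]) -> Operator:
    """Filter out unwanted results."""
    return lambda items: [x for x in items if pred(x)]

def connect(*ops: Operator) -> Operator:
    """Chain operators so that each one consumes the previous one's output."""
    def run(items: List[str]) -> List[str]:
        for op in ops:
            items = op(items)
        return items
    return run

# Fake result page used instead of a real web source.
PAGES = {
    "http://example.test/search?q=dvd":
        "<li>Alien</li><li>Brazil</li><li>Ad: buy now</li>",
}

task = connect(
    build_query("http://example.test/search"),
    fetch(PAGES),
    extract(r"<li>(.*?)</li>"),
    keep_if(lambda title: not title.startswith("Ad:")),
)

print(task(["dvd"]))
```

Each step is an independent operator, so swapping the regex extractor for another one, or adding a second filter, only changes the arguments passed to `connect`.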
Pages: 349-355
Page count: 7
    Chen, LH
    [J]. E-COMMERCE AND WEB TECHNOLOGIES, PROCEEDINGS, 2002, 2455 : 77 - 86