Information extraction from the Web: System and techniques

Cited by: 14
Authors
Xiao, L [1]
Wissmann, D
Brown, M
Jablonski, S
Affiliations
[1] Siemens AG, CT SE 5, D-8520 Erlangen, Germany
[2] Global Transact Ltd, Berlin, Germany
[3] Univ Erlangen Nurnberg, Dept Comp Sci 6, D-8520 Erlangen, Germany
Keywords
information extraction; machine learning; knowledge acquisition; internet applications; methodology and design;
DOI
10.1023/B:APIN.0000033637.51909.04
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Information Extraction (IE) systems that can exploit the vast source of textual information that is the Internet would be a revolutionary step forward in delivering large volumes of content cheaply and precisely, enabling a wide range of new knowledge-driven applications and services. However, despite this enormous potential, few IE systems have successfully made the transition from laboratory to commercial application. The reason may be a purely practical one: building usable, scalable IE systems requires bringing together a range of different technologies, as well as providing clear and reproducible guidelines on how to collectively configure and deploy those technologies. This paper is an attempt to address these issues. The paper focuses on two primary goals. Firstly, we show that an information extraction system which is used for real-world applications and different domains can be built using some autonomous, corporate components (agents). Such a system has some advanced properties: clear separation of different extraction tasks and steps, portability to multiple application domains, trainability, extensibility, etc. Secondly, we show that machine learning and, in particular, learning in different ways and at different levels, can be used to build practical IE systems. We show that carefully selecting the right machine learning technique for the right task and selective sampling can be used to reduce the human effort required to annotate examples for building such systems.
Pages: 195-224
Page count: 30
Related papers
50 in total
  • [1] Information Extraction from the Web: System and Techniques
    Luo Xiao
    Dieter Wissmann
    Michael Brown
    Stephan Jablonski
    [J]. Applied Intelligence, 2004, 21 : 195 - 224
  • [2] Profile generation from web sources: an information extraction system
    Ranjan, Rishabh
    Vathsala, H.
    Koolagudi, Shashidhar G.
    [J]. SOCIAL NETWORK ANALYSIS AND MINING, 2022, 12 (01)
  • [3] Profile generation from web sources: an information extraction system
    Rishabh Ranjan
    H. Vathsala
    Shashidhar G. Koolagudi
    [J]. Social Network Analysis and Mining, 2022, 12
  • [4] Web Services for information extraction from the Web
    Habegger, B
    Quafafou, M
    [J]. IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, PROCEEDINGS, 2004, : 279 - 286
  • [5] STAVIES: A system for information extraction from unknown Web data sources through automatic Web wrapper generation using clustering techniques
    Papadakis, NK
    Skoutas, D
    Raftopoulos, K
    Varvarigou, TA
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005, 17 (12) : 1638 - 1652
  • [6] Spoken Dialogue System Based on Information Extraction from Web Text
    Yoshino, Koichiro
    Kawahara, Tatsuya
    [J]. SPOKEN DIALOGUE SYSTEMS FOR AMBIENT ENVIRONMENTS, 2010, 6392 : 196 - 197
  • [7] Information Extraction from Web pages
    Novotny, Robert
    Vojtas, Peter
    Maruscak, Dusan
    [J]. 2009 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCES ON WEB INTELLIGENCE (WI) AND INTELLIGENT AGENT TECHNOLOGIES (IAT), VOL 3, 2009, : 121 - +
  • [8] Open Information Extraction from the Web
    Banko, Michele
    Cafarella, Michael J.
    Soderland, Stephen
    Broadhead, Matt
    Etzioni, Oren
    [J]. 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 2670 - 2676
  • [9] Open Information Extraction from the Web
    Etzioni, Oren
    Banko, Michele
    Soderland, Stephen
    Weld, Daniel S.
    [J]. COMMUNICATIONS OF THE ACM, 2008, 51 (12) : 68 - 74
  • [10] Extraction of structural information from the web
    Murata, T
    [J]. FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, PT 2, PROCEEDINGS, 2005, 3614 : 1204 - 1207