Learning knowledge bases for information extraction from multiple text based web sites

Cited by: 0
Authors
Gao, XY [1]
Zhang, MJ [1]
Affiliations
[1] Victoria Univ Wellington, Sch Math & Comp Sci, Wellington, New Zealand
Keywords
information extraction; learning; knowledge unit frame; text-based web sites; semi-structured data
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe a learning approach to automatically building knowledge bases for information extraction from multiple text-based web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames. A frame learning algorithm is developed to automatically learn knowledge unit frames from training examples. Some training examples can be obtained by automatically parsing a number of tabular web pages in the same domain, which greatly reduces the time-consuming manual work. The approach was evaluated on ten web sites of real estate advertisements and car advertisements, and nearly all the information was successfully extracted with very few false alarms. These results suggest that the knowledge unit frame representation and the frame learning algorithm both work well, that a domain-specific knowledge base can be learned from training examples, and that such a knowledge base can be used for information extraction from flexible, text-based, semi-structured web pages on multiple web sites.
Pages: 119-125
Page count: 7