Learning knowledge bases for information extraction from multiple text based web sites

Cited by: 0

Authors:

Gao, XY [1]

Zhang, MJ [1]

Affiliations:

[1] Victoria Univ Wellington, Sch Math & Comp Sci, Wellington, New Zealand

Source:

IEEE/WIC INTERNATIONAL CONFERENCE ON INTELLIGENT AGENT TECHNOLOGY, PROCEEDINGS | 2003

Keywords:

information extraction; learning; knowledge unit frame; text-based web sites; semi-structured data

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We describe a learning approach for automatically building knowledge bases for information extraction from multiple text-based web pages. A frame-based representation is introduced to represent domain knowledge as knowledge unit frames. A frame learning algorithm is developed to learn knowledge unit frames automatically from training examples. Some training examples can be obtained by automatically parsing a number of tabular web pages in the same domain, which greatly reduces time-consuming manual work. The approach was investigated on ten web sites of real estate and car advertisements, and nearly all the information was successfully extracted with very few false alarms. These results suggest that both the knowledge unit frame representation and the frame learning algorithm work well, that a domain-specific knowledge base can be learned from training examples, and that such a knowledge base can be used for information extraction from flexible, text-based, semi-structured web pages on multiple web sites.


Pages: 119-125

Page count: 7

