Web Information Extraction based on similar patterns

Cited by: 0
Authors
Ye, N [1 ]
Wu, XJ [1 ]
Zhu, JB [1 ]
Chen, WL [1 ]
Yao, TS [1 ]
Affiliations
[1] Northeastern Univ, Natl Language Proc Lab, Shenyang, Peoples R China
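Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract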
Information extraction is an important research topic in data mining. In this paper we introduce a web information extraction approach based on similar patterns, in which the construction of the pattern library is a knowledge acquisition bottleneck. We use a method based on similarity computation to automatically acquire patterns from a large-scale corpus. Given a set of seed patterns, relevant patterns can be learned from unlabeled training web pages. The generated patterns can be put to use after minimal manual correction. Compared with other algorithms, our approach requires much less human intervention and avoids the need to hand-tag a training corpus. Experimental results show that the acquired patterns achieve an IE precision of 79.45% and a recall of 66.51% in an open test.
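The abstract does not say which similarity measure or pattern representation the authors use. As an illustrative sketch only, the seed-driven acquisition step might look like the following, assuming patterns are whitespace-separated token strings and similarity is Jaccard overlap of token sets. The names `jaccard` and `acquire_patterns`, the threshold value, and the sample patterns are all hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sequences, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def acquire_patterns(seeds, candidates, threshold=0.5):
    """Return (candidate, score) pairs whose best similarity to any seed
    meets the threshold. The measure and threshold are assumptions."""
    accepted = []
    for cand in candidates:
        best = max(
            (jaccard(cand.split(), seed.split()) for seed in seeds),
            default=0.0,
        )
        if best >= threshold:
            accepted.append((cand, best))
    # Most similar first, so manual correction can start with the likeliest patterns.
    accepted.sort(key=lambda pair: pair[1], reverse=True)
    return accepted


# Hypothetical seed and candidate patterns, for illustration only.
seeds = ["<PERSON> was born in <PLACE>"]
candidates = [
    "<PERSON> was born at <PLACE>",
    "<PERSON> works for <ORG>",
    "contact us for more details",
]
learned = acquire_patterns(seeds, candidates, threshold=0.5)
```

In this toy run only the first candidate shares enough tokens with the seed to be kept. A realistic system would extract candidates from unlabeled web pages and would likely use a richer similarity measure than token overlap.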
Pages: 646 - 651
Page count: 6
Related papers
50 in total
  • [1] Web Information Extraction Based on IEBIDTech
    Ren, Xiaoyan
    Fu, Yunxia
    [J]. 2012 WORLD AUTOMATION CONGRESS (WAC), 2012,
  • [2] Information Extraction from the Web by Matching Visual Presentation Patterns
    Burget, Radek
    [J]. KNOWLEDGE GRAPHS AND LANGUAGE TECHNOLOGY, 2017, 10579 : 10 - 26
  • [3] Web-Based Information Extraction Technology
    孙铁利
    教巍巍
    刘淑华
    [J]. Journal of Donghua University(English Edition), 2007, (02) : 288 - 292
  • [4] Ontology-Based Web Information Extraction
    Mo, Qian
    Chen, Yi-hong
    [J]. COMMUNICATIONS AND INFORMATION PROCESSING, PT 1, 2012, 288 : 118 - 126
  • [5] Web information extraction based on genetic algorithm
    Guo, Yin-Rui
    Chen, Rong
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2011, 24 (03) : 385 - 390
  • [6] Web-based Geospatial Information Extraction
    Kahler, Bart
    Jones, K. C.
    Bacher, Brian
    [J]. PROCEEDINGS OF THE 2012 IEEE NATIONAL AEROSPACE AND ELECTRONICS CONFERENCE (NAECON), 2012, : 46 - 50
  • [7] Web information extraction based on probabilistic model
    Wang, Jing
    Liu, Zhi-Jing
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2010, 23 (06) : 847 - 855
  • [8] Logic-based web information extraction
    Gottlob, G
    Koch, C
    [J]. SIGMOD RECORD, 2004, 33 (02) : 87 - 94
  • [9] Research on Web Information Extraction Based on XML
    Hu, Yan
    Xuan, Yanyan
    [J]. SECOND INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING: WGEC 2008, PROCEEDINGS, 2008, : 201 - 204
  • [10] Automatic Web Information Extraction Based on Rules
    Hu, Fanghuai
    Ruan, Tong
    Shao, Zhiqing
    Ding, Jun
    [J]. WEB INFORMATION SYSTEMS ENGINEERING - WISE 2011, 2011, 6997 : 265 - 272