The Research of automatic extraction dynamic web data

Cited by: 0
Authors
Qu Jubao [1]
Affiliations
[1] Wuyi Univ, Dept Comp Sci & Engn, Nanping 354300, Fujian, Peoples R China
Keywords
dynamic; automatic; extraction; template; web
DOI
10.1109/IFITA.2009.211
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The rapid development of the World Wide Web has made it an increasingly important source of useful data. A substantial fraction of the Web consists of pages that are dynamically generated from a common template populated with data from databases. This paper proposes a novel approach to automatically detecting templates from a set of example pages and extracting data at the field level. The objective of the research is to automatically detect the template behind these pages and extract the embedded data. The template detection problem is formalized, and the underlying structure of template-generated pages is analyzed. A template detection approach is presented, and the detected templates are used to extract data from instance pages. Experimental results on two large third-party test beds show that the approach achieves high extraction accuracy.
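As a rough illustration of the general idea in the abstract (not the paper's own algorithm, which this record does not describe), the sketch below treats tokens shared by every example page as the template and whatever varies between pages as data. It assumes well-behaved HTML and uses Python's standard `difflib`; the function names `tokenize`, `detect_template`, and `extract_data` and the two sample pages are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(html):
    """Split a page into tag tokens and text tokens, dropping blank runs."""
    return [t.strip() for t in re.findall(r"<[^>]+>|[^<]+", html) if t.strip()]


def detect_template(pages):
    """Keep only the tokens that appear, in order, in every example page."""
    template = tokenize(pages[0])
    for page in pages[1:]:
        tokens = tokenize(page)
        matcher = SequenceMatcher(None, template, tokens, autojunk=False)
        # Matching blocks are the runs shared by the template so far and this page.
        template = [
            tok
            for block in matcher.get_matching_blocks()
            for tok in template[block.a:block.a + block.size]
        ]
    return template


def extract_data(template, page):
    """Return the tokens of a page that are not accounted for by the template."""
    tokens = tokenize(page)
    matcher = SequenceMatcher(None, template, tokens, autojunk=False)
    data = []
    for tag, _, _, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            data.extend(tokens[j1:j2])
    return data


# Hypothetical example pages generated from one template.
page_a = "<html><h1>Book A</h1><p>Price: </p><b>10</b></html>"
page_b = "<html><h1>Book B</h1><p>Price: </p><b>25</b></html>"

template = detect_template([page_a, page_b])
print(extract_data(template, page_a))
print(extract_data(template, page_b))
```

A known limitation of this simplification is that a data value that happens to be identical on every example page is mistaken for template text, so more example pages generally give a cleaner template.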
Pages: 143-146
Page count: 4