Automatic Data Records Extraction from List Page in Deep Web Sources

Citations: 0
Authors
Chen Hong-ping [1]
Fang Wei [1]
Yang Zhou [1]
Zhuo Lin [1]
Cui Zhi-Ming [1]
Affiliation
[1] Soochow Univ, Inst Intelligent Informat Proc & Applicat, Suzhou, Jiangsu, Peoples R China
Keywords
Data record; Data extraction; Deep Web
DOI
10.1109/APCIP.2009.100
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the explosive growth and popularity of the World Wide Web, a wealth of online e-commerce information resources has become available. List pages on these web sites are usually generated automatically from a back-end DBMS using scripts. To provide value-added services and convenience for users, it is necessary to integrate web sources from the same domain. Given the huge number of such pages, extracting data records from them manually is difficult, and at large scale effectively impossible. Based on the characteristics of template-based list pages, this paper proposes the LBDRF algorithm for automatically extracting data records from Deep Web pages. Experimental results show that the proposed method performs well.
Pages: 370-373
Number of pages: 4
Related papers
50 in total
  • [21] A Method of Automatic Web Information Extraction Based on Page Clustering
    Yang, Tianqi
    Qiu, Taofen
    [J]. 2011 9TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA 2011), 2011, : 390 - 393
  • [22] Extraction of flat and nested data records from web pages
    Algur, Siddu P.
    Hiremath, P.S.
    [J]. Conferences in Research and Practice in Information Technology Series, 2006, 61 : 163 - 168
  • [23] OntoBuilder: Fully automatic extraction and consolidation of ontologies from Web sources
    Gal, A
    Modica, G
    Jamil, H
    [J]. 20TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2004, : 853 - 853
  • [24] Deep Web Data Extraction
    Hong, Jer Lang
    [J]. IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010), 2010, : 3420 - 3427
  • [25] Automatic data extraction from data-rich web pages
    Hu, DD
    Meng, XF
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2005, 3453 : 828 - 839
  • [26] A Web data extraction approach to harvesting data from online sources
    Nayak, Richi
    Haugaasen, Magnus
    [J]. Advances in Intelligent IT: Active Media Technology 2006, 2006, 138 : 323 - 326
  • [27] Automatic Extraction of Complex Web Data
    Zhang, Ming
    Zhou, Ying
    Patrick, Jon
    [J]. PACIFIC ASIA CONFERENCE ON INFORMATION SYSTEMS 2006, SECTIONS 1-8, 2006, : 1436 - 1449
  • [28] Research on the Automatic Extraction Method of Web Data Objects Based on Deep Learning
    Peng, Hao
    Li, Qiao
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2020, 26 (03): : 609 - 616
  • [29] Unstructured data extraction of Chinese expert web page
    Hong, Xudong
    Shen, Tao
    Shen, Longhua
    Yu, Zhengtao
    Guo, Jianyi
    [J]. International Journal of Wireless and Mobile Computing, 2014, 7 (02) : 132 - 136
  • [30] Deep Web navigation in Web data extraction
    Baumgartner, Robert
    Ceresna, Michal
    Ledermueller, Gerald
    [J]. INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 2, PROCEEDINGS, 2006, : 698 - +