Automatic Data Records Extraction from List Page in Deep Web Sources

Cited by: 0
Authors
Chen Hong-ping [1]
Fang Wei [1]
Yang Zhou [1]
Zhuo Lin [1]
Cui Zhi-Ming [1]
Affiliations
[1] Soochow Univ, Inst Intelligent Informat Proc & Applicat, Suzhou, Jiangsu, Peoples R China
Keywords
Data record; Data extraction; Deep Web
DOI
10.1109/APCIP.2009.100
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the explosive growth and popularity of the World Wide Web, a wealth of online e-commerce information resources has become available. List pages on these web sites are usually generated automatically from a back-end DBMS using scripts. To provide value-added services and convenience for users, it is necessary to integrate web sources of the same domain. Given the huge number of these web pages, it is difficult or even impossible to extract data records from these list pages manually on a large scale. Based on the characteristics of template-based list pages, in this paper we propose the LBDRF algorithm to solve the problem of automatic data record extraction from Deep Web pages. Our experimental results show that the proposed method performs well.
Pages: 370-373
Page count: 4