Deep web data extraction based on visual information processing

Cited by: 3
Authors
Liu J. [1]
Lin L. [1]
Cai Z. [1]
Wang J. [2, 3]
Kim H.-J. [4]
Affiliations
[1] College of Information Engineering, Shanghai Maritime University, Shanghai
[2] Key Laboratory of Broadband Wireless Communication and Sensor Network Technology (Nanjing University of Posts and Telecommunications), Ministry of Education, Nanjing
[3] College of Information Engineering, Yangzhou University, Yangzhou
[4] Business Administration Research Institute, Sungshin W. University, Seoul
Keywords
CNN; Data extraction; Deep web; Visual information
DOI
10.1007/s12652-017-0587-0
Abstract
With the rapid development of technology, the Web has become the largest encyclopedic database. Although users can conveniently obtain information from the surface web using applications such as browsers, retrieving information from the deep web is hard. The deep web requires a user to submit a query to a server, which retrieves information from its database to generate the result webpage. Methods different from traditional Web surfing are therefore needed to extract data from the deep web. Most existing deep web data extraction methods are based on DOM tree analysis. In this paper, to fully utilize the visual information contained in a webpage, we propose a data region locating method based on a convolutional neural network and a segmentation algorithm based on visual information. To verify the efficiency of the proposed method, we apply it to perform data extraction on real-world commercial websites. Experiments on the data region location model, data extraction, and data item alignment verify that the proposed method effectively improves the accuracy of data region location and the efficiency of data extraction. © Springer-Verlag GmbH Germany 2017.
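The abstract does not describe how the visual segmentation or data item alignment steps work. As a rough illustration only, the Python sketch below shows one generic way visual cues can drive both steps: the bounding boxes of rendered elements are split into records at large vertical gaps, and items are then aligned across records by their offset from each record's top edge. This is not the authors' algorithm. The `Box` input format, the `gap` and `tolerance` pixel thresholds, and the function names are hypothetical, and the CNN-based data region locating step is not shown (the input is assumed to be already restricted to the data region).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box and text of one rendered element (hypothetical input format)."""
    x: int
    y: int
    width: int
    height: int
    text: str

    @property
    def bottom(self) -> int:
        return self.y + self.height


def segment_records(boxes: list[Box], gap: int) -> list[list[Box]]:
    """Hypothetical visual segmentation: sort boxes top to bottom and start a
    new record whenever a box begins more than `gap` pixels below the lowest
    bottom edge seen so far in the current record."""
    records: list[list[Box]] = []
    current: list[Box] = []
    lowest = 0  # bottom edge of the current record so far
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        if current and box.y - lowest > gap:
            records.append(current)
            current = []
        lowest = box.bottom if not current else max(lowest, box.bottom)
        current.append(box)
    if current:
        records.append(current)
    return records


def align_items(records: list[list[Box]], tolerance: int = 5) -> list[list[str]]:
    """Hypothetical alignment: items in different records that sit at roughly
    the same offset (dy, x) from their record's top edge share a column."""

    def offsets(record: list[Box]) -> list[tuple[tuple[int, int], Box]]:
        top = min(b.y for b in record)
        return [((b.y - top, b.x), b) for b in record]

    slots: list[tuple[int, int]] = []  # representative offset of each column

    def find_slot(pos: tuple[int, int]) -> int:
        for i, (dy, x) in enumerate(slots):
            if abs(pos[0] - dy) <= tolerance and abs(pos[1] - x) <= tolerance:
                return i
        return -1

    # First pass: create one slot per cluster of similar offsets.
    for record in records:
        for pos, _ in offsets(record):
            if find_slot(pos) < 0:
                slots.append(pos)

    # Order columns top to bottom, then left to right within a record.
    order = sorted(range(len(slots)), key=lambda i: slots[i])
    table: list[list[str]] = []
    for record in records:
        row = [""] * len(slots)
        for pos, box in offsets(record):
            i = find_slot(pos)
            # Boxes landing in the same slot are concatenated, not overwritten.
            row[i] = f"{row[i]} {box.text}".strip()
        table.append([row[i] for i in order])
    return table


boxes = [
    Box(10, 0, 200, 20, "Book A"), Box(250, 0, 60, 20, "$12"),
    Box(10, 24, 200, 20, "Author A"),
    Box(10, 80, 200, 20, "Book B"), Box(252, 80, 60, 20, "$9"),
    Box(10, 104, 200, 20, "Author B"),
]
records = segment_records(boxes, gap=15)
table = align_items(records)
print(table)  # [['Book A', '$12', 'Author A'], ['Book B', '$9', 'Author B']]
```

The pixel thresholds above are arbitrary choices for the toy input; a practical system would need to tune or learn them, and the `$9` box shows how a small horizontal misalignment (x = 252 vs. 250) is absorbed by `tolerance`.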
Pages: 1481-1491
Page count: 10