Wrapper Approaches For Web Data Extraction: A Review

Cited by: 0
Authors
Azir, Mohd Amir Bin Mohd [1]
Ahmad, Kamsuriah Binti [1]
Affiliation
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi 43600, Selangor, Malaysia
Keywords
Web Data Extraction; Extracted Data; Unstructured Data; Semantic Data; Wrapper Algorithm; Information Extraction
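DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract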
Relational databases are collections of structured data, normally arranged in rows and columns. However, most business data are unstructured. Data extraction is the process of extracting unstructured, semi-structured, and structured data from web pages on the internet according to user requirements, at any level of automation. Web pages contain data regions that follow a structured format. Manipulating and analyzing data with tools typically requires massive server computing resources. This paper reviews existing data extraction techniques for heterogeneous data in the Big Data environment. The review aims to discuss different data extraction approaches together with the basic algorithms of the tools used to extract the desired data from various web sources. The approaches examined are Information Extraction Approaches, Automatic Wrapper Generation, Semi-Automatic Wrapper Generation, Wrapper Induction, and Wrapper Maintenance. Although many techniques for extracting data from web sources have been developed and tested, reviews of these techniques are still lacking. This paper reviews data extraction using wrapper approaches and compares them to identify the best approach for extracting data from online sites.
Pages: 6