Automatically Extracting Web Data Records

Cited by: 0
Authors
Mundluru, Dheerendranath [1]
Raghavan, Vijay V. [1]
Wu, Zonghuan [1]
Affiliation
[1] IMshopping Inc, Santa Clara, CA USA
Source
ACTIVE MEDIA TECHNOLOGY | 2010 / Vol. 6335
Keywords
Structured data extraction; Web content mining
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
It is essential for Web applications such as e-commerce portals to enrich their existing content offerings by aggregating relevant structured data (e.g., product reviews) from external Web resources. To meet this goal, we present an algorithm for automatically extracting data records from Web pages. The algorithm uses a robust string matching technique to accurately identify the records in a Web page. Our experiments on diverse datasets (including datasets from third-party research projects) show that the proposed algorithm is highly effective and performs considerably better than two other state-of-the-art automatic data extraction systems. We have made the proposed system publicly accessible so that readers can evaluate it.
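The abstract does not say which string matching technique the algorithm uses, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that candidate page regions are already available as HTML fragments, reduces each fragment to its sequence of opening tag names, and treats a run of consecutive fragments with similar tag sequences as data records. The names `tag_sequence`, `similarity`, and `find_records`, the 0.8 threshold, and the use of Python's `difflib` ratio as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects opening tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(fragment):
    """Reduce an HTML fragment to its list of opening tag names."""
    collector = TagCollector()
    collector.feed(fragment)
    collector.close()
    return collector.tags


def similarity(a, b):
    """Similarity in [0, 1] between two tag sequences (stand-in measure)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_records(fragments, threshold=0.8):
    """Return the longest run of consecutive fragments whose neighbouring
    tag sequences are at least `threshold` similar (hypothetical heuristic).
    A run of length one is not treated as a record list."""
    seqs = [tag_sequence(f) for f in fragments]
    best, start = [], 0
    for i in range(1, len(seqs) + 1):
        # Close the current run at the end of input or at a dissimilar pair.
        if i == len(seqs) or similarity(seqs[i - 1], seqs[i]) < threshold:
            if i - start > len(best):
                best = fragments[start:i]
            start = i
    return best if len(best) > 1 else []
```

Comparing tag sequences rather than raw text lets records with different content but the same markup template match each other, and the threshold tolerates small structural differences such as an optional field in one record.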
Pages: 510 / +
Number of pages: 2