Automatically Extracting Web Data Records

Cited by: 0
Authors
Mundluru, Dheerendranath [1]
Raghavan, Vijay V. [1]
Wu, Zonghuan [1]
Affiliation
[1] IMshopping Inc, Santa Clara, CA USA
Source
ACTIVE MEDIA TECHNOLOGY | 2010 / Vol. 6335
Keywords
Structured data extraction; Web content mining;
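DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract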
It is essential for Web applications such as e-commerce portals to enrich their existing content offerings by aggregating relevant structured data (e.g., product reviews) from external Web resources. To meet this goal, we present an algorithm for automatically extracting data records from Web pages. The algorithm uses a robust string matching technique to accurately identify the records in a Web page. Our experiments on diverse datasets (including datasets from third-party research projects) show that the proposed algorithm is highly effective and performs considerably better than two other state-of-the-art automatic data extraction systems. We have made the proposed system publicly accessible so that readers can evaluate it.
Pages: 510 / +
Page count: 2
Related papers
50 in total
  • [21] Extracting Data Records Based on Global Schema
    Chen, Kerui
    Zuo, Wanli
    Zhang, Fan
    He, Fenglin
    INFORMATION TECHNOLOGY FOR MANUFACTURING SYSTEMS, PTS 1 AND 2, 2010, : 553 - +
  • [22] Development and validation of a classification approach for extracting severity automatically from electronic health records
    Mary Regina Boland
    Nicholas P Tatonetti
    George Hripcsak
    Journal of Biomedical Semantics, 6
  • [23] Development and validation of a classification approach for extracting severity automatically from electronic health records
    Boland, Mary Regina
    Tatonetti, Nicholas P.
    Hripcsak, George
    JOURNAL OF BIOMEDICAL SEMANTICS, 2015, 6
  • [24] Extracting Records from the Web Using a Signal Processing Approach
    Velloso, Roberto Panerai
    Dorneles, Carina F.
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 197 - 206
  • [25] Algorithms of mining data records from website automatically
    Qiu, Yong
    Lan, Yongjie
    Journal of Southeast University (English Edition), 2006, 22 (03) : 423 - 425
  • [26] A Method Of Collecting and Analyzing Web Data Automatically
    Tran Trong Hoa
    Vu Quang Dung
    Nakajima, Nobuyasu
    2014 INTERNATIONAL CONFERENCE ON COMPUTING, MANAGEMENT AND TELECOMMUNICATIONS (COMMANTEL), 2014, : 269 - 274
  • [27] PROBLEMS EXTRACTING DATA FROM HOSPITAL MATERNITY RECORDS
    CARTWRIGHT, A
    JACOBY, A
    MARTIN, C
    COMMUNITY MEDICINE, 1987, 9 (03): : 286 - 293
  • [28] Automatically extracting form labels
    Nguyen, Hoa
    Kang, Eun Yong
    Freire, Juliana
    2008 IEEE 24TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3, 2008, : 1498 - +
  • [29] Crawling and Extracting Process Data from the Web
    Liu, Yaling
    Agah, Arvin
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2009, 5678 : 545 - 552
  • [30] Turning the Web into a Database: Extracting Data and Structure
    Hovy, Eduard H.
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2010, 5723 : 1 - 7