The WDC Gold Standards for Product Feature Extraction and Product Matching

Cited by: 9
Authors
Petrovski, Petar [1]
Primpeli, Anna [1]
Meusel, Robert [1]
Bizer, Christian [1]
Affiliations
[1] Univ Mannheim, Data & Web Sci Grp, Mannheim, Germany
Keywords
e-commerce; Product feature extraction; Product matching;
DOI
10.1007/978-3-319-53676-7_6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Finding out which e-shops offer a specific product is a central challenge for building integrated product catalogs and comparison shopping portals. Determining whether two offers refer to the same product involves extracting a set of features (product attributes) from the web pages containing the offers and comparing these features using a matching function. The existing gold standards for product matching have two shortcomings: (i) they only contain offers from a small number of e-shops and thus do not properly cover the heterogeneity found on the Web; (ii) they only provide a small number of generic product attributes and therefore cannot be used to evaluate whether detailed product attributes have been correctly extracted from textual product descriptions. To overcome these shortcomings, we have created two public gold standards. The WDC Product Feature Extraction Gold Standard consists of over 500 product web pages originating from 32 different websites, on which we have annotated all product attributes (338 distinct attributes) that appear in product titles, product descriptions, as well as tables and lists. The WDC Product Matching Gold Standard consists of over 75,000 correspondences between 150 products (mobile phones, TVs, and headphones) in a central catalog and offers for these products on the 32 websites. To verify that the gold standards are challenging enough, we ran several baseline feature extraction and matching methods, resulting in F-score values in the range 0.39 to 0.67. In addition to the gold standards, we also provide a corpus consisting of 13 million product pages from the same websites, which might be useful as background knowledge for training feature extraction and matching methods.
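To make the terms "matching function" and "F-score" concrete, the following is a minimal sketch and not the authors' baseline. It assumes a Jaccard similarity over (attribute, value) pairs as the matching function and a threshold of 0.5; the function names, the product and offer identifiers, and all attribute values are hypothetical and are not taken from the gold standards.

```python
def jaccard(a: dict, b: dict) -> float:
    """Jaccard similarity over the (attribute, value) pairs of two records."""
    pairs_a = {(k.lower(), str(v).lower()) for k, v in a.items()}
    pairs_b = {(k.lower(), str(v).lower()) for k, v in b.items()}
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return len(pairs_a & pairs_b) / len(union)


def is_match(a: dict, b: dict, threshold: float = 0.5) -> bool:
    """Assumed matching function: similarity at or above a fixed threshold."""
    return jaccard(a, b) >= threshold


def f_score(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over correspondence pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical catalog product and two hypothetical offers.
catalog_product = {"brand": "ExampleBrand", "model": "X100", "storage": "16GB"}
offer_1 = {"brand": "examplebrand", "model": "x100", "color": "black"}
offer_2 = {"brand": "OtherBrand", "model": "Z9"}

# Hypothetical predicted and gold correspondences (product id, offer id).
predicted = {("p1", "o1"), ("p1", "o3")}
gold = {("p1", "o1"), ("p1", "o2")}
score = f_score(predicted, gold)
```

In this sketch, `offer_1` shares two of four distinct attribute-value pairs with the catalog product and is accepted at the assumed threshold, while `offer_2` shares none. A real evaluation would compare the predicted correspondences against the gold standard's annotated ones in the same way `f_score` does here.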
Pages: 73-86
Number of pages: 14