Multi-component Similarity Method for Web Product Duplicate Detection

Cited by: 12
Authors
van Bezu, Ronald [1]
Borst, Sjoerd [1]
Rijkse, Rick [1]
Verhagen, Jim [1]
Vandic, Damir [1]
Frasincar, Flavius [1]
Affiliation
[1] Erasmus Univ, POB 1738, NL-3000 DR Rotterdam, Netherlands
DOI
10.1145/2695664.2695818
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Due to the growing number of Web shops, aggregating product data from the Web is becoming increasingly important. One of the problems encountered in product aggregation is duplicate detection. In this paper, we extend and significantly improve an existing state-of-the-art product duplicate detection method. Our approach employs a novel method for combining the titles' and the attributes' similarities into a final product similarity. We use q-grams to handle partial matching of words, such as abbreviations. Whereas existing methods cluster products from only two Web shops, we propose a hierarchical clustering method to handle multiple Web shops. Applying our new method to a dataset of TVs from four Web shops reveals that it significantly outperforms the Hybrid Similarity Method, the Title Model Words Method, and the well-known TF-IDF method, with an F1-score of 0.475 compared to 0.287, 0.298, and 0.335, respectively.
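The abstract mentions q-grams for partial word matching but does not give the exact formula. The following is a minimal sketch, not the authors' implementation. It assumes the common Ukkonen-style q-gram distance with q = 3, lower-casing, and '#' padding, all of which are assumptions. The helper names `qgrams`, `qgram_distance`, and `qgram_similarity` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def qgrams(text: str, q: int = 3) -> Counter:
    """Multiset of overlapping character q-grams of `text`.

    The string is lower-cased and padded with q-1 '#' characters on each
    side (an assumed convention) so that word boundaries also yield q-grams.
    """
    pad = "#" * (q - 1)
    padded = pad + text.lower() + pad
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def qgram_distance(a: str, b: str, q: int = 3) -> int:
    """Ukkonen-style q-gram distance: sum of |count_a(g) - count_b(g)| over all q-grams g."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return sum(abs(ga[g] - gb[g]) for g in ga.keys() | gb.keys())


def qgram_similarity(a: str, b: str, q: int = 3) -> float:
    """Normalize the distance to [0, 1]: 1.0 for identical strings, 0.0 for no shared q-grams."""
    n = sum(qgrams(a, q).values()) + sum(qgrams(b, q).values())
    if n == 0:  # for q >= 1, only possible when q == 1 and both strings are empty
        return 1.0
    return (n - qgram_distance(a, b, q)) / n


if __name__ == "__main__":
    print(qgram_similarity("philips", "phillips"))
```

Under these assumptions, "philips" and "phillips" produce 9 and 10 trigrams, of which 8 are shared. The distance is therefore 3 and the similarity is 16/19 (about 0.842). An exact-token comparison would instead score these strings as a plain mismatch.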
Pages: 761 - 768
Page count: 8
Related papers
50 in total
  • [1] Metal Detection by Multi-Component TEM Method
    Chen, Chow-Son
    Chiu, Wei-Hsuan
    Lin, Ching-Ren
    TERRESTRIAL ATMOSPHERIC AND OCEANIC SCIENCES, 2009, 20 (03) : 445 - 454
  • [2] The model of reusability of multi-component product
    Jodejko-Pietruczuk, A.
    Plewa, M.
    ADVANCES IN SAFETY, RELIABILITY AND RISK MANAGEMENT, 2012 : 2096 - 2102
  • [3] Novel detection method for multi-component radar emitter signals
    Rong, Hai-Na
    Zhang, Ge-Xiang
    Jin, Wei-Dong
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2009, 31 (09) : 2096 - 2100
  • [4] Remanufacturing of multi-component systems with product substitution
    Liu, Baolong
    Papier, Felix
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2022, 301 (03) : 896 - 911
  • [5] Multi-component Models for Object Detection
    Gu, Chunhui
    Arbelaez, Pablo
    Lin, Yuanqing
    Yu, Kai
    Malik, Jitendra
    COMPUTER VISION - ECCV 2012, PT IV, 2012, 7575 : 445 - 458
  • [6] Multi-component Gas Photoacoustic Detection
    Yun, Yuxin
    Jiang, Qiang
    2020 6TH INTERNATIONAL CONFERENCE ON ENERGY MATERIALS AND ENVIRONMENT ENGINEERING, 2020, 508
  • [7] Structural similarity in chiral-achiral multi-component crystals
    Scowen, Ian J.
    Alomar, Taghrid S.
    Munshi, Tasnim
    Seaton, Colin C.
    CRYSTENGCOMM, 2020, 22 (43) : 7334 - 7340
  • [8] Multi-component network product coding for cooperative relaying
    Gan, Ming
    Li, Hui
    Dai, Xu-Chu
    Journal on Communications, 34 : 108 - 113
  • [9] Component level signal segmentation method for multi-component fault detection in a wind turbine gearbox
    Praveen, Hemanth Mithun
    Sabareesh, G. R.
    Inturi, Vamsi
    Jaikanth, Akshay
    MEASUREMENT, 2022, 195
  • [10] Discriminative detection of neurotoxins in multi-component samples
    Simonian, AL
    Efremenko, EN
    Wild, JR
    ANALYTICA CHIMICA ACTA, 2001, 444 (02) : 179 - 186