Evolution of a Web-Scale Near Duplicate Image Detection System

Authors
Gusev, Andrey [1]
Xu, Jiajing [1]
Affiliations
[1] Pinterest, San Francisco, CA 94107 USA
Keywords
near-duplicate detection; recommendation systems; locality sensitive hashing; transfer learning; clustering;
DOI
10.1145/3366423.3380031
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Detecting near duplicate images is fundamental to the content ecosystem of photo sharing web applications. However, such a task is challenging when involving a web-scale image corpus containing billions of images. In this paper, we present an efficient system for detecting near duplicate images across 8 billion images. Our system consists of three stages: candidate generation, candidate selection, and clustering. We also demonstrate that this system can be used to greatly improve the quality of recommendations and search results across a number of real-world applications. In addition, we include the evolution of the system over the course of six years, bringing out experiences and lessons on how new systems are designed to accommodate organic content growth as well as the latest technology. Finally, we are releasing a human-labeled dataset of approximately 53,000 pairs of images introduced in this paper.
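
The three stages named in the abstract (candidate generation, candidate selection, clustering) can be illustrated with a toy sketch. This is not the authors' implementation: the record gives no algorithmic detail beyond the stage names and the "locality sensitive hashing" keyword. Everything below is an assumption made for illustration: the function names, the random-hyperplane LSH with `num_bits` sign bits, the cosine-similarity `threshold` standing in for whatever selection model the paper uses, and the union-find grouping standing in for its clustering step.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np


def generate_candidates(embeddings, num_bits=8, seed=0):
    """Stage 1 (assumed): random-hyperplane LSH; items sharing a bucket become candidate pairs."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((embeddings.shape[1], num_bits))
    bits = (embeddings @ planes) > 0  # one sign bit per hyperplane
    buckets = defaultdict(list)
    for idx, row in enumerate(bits):
        buckets[row.tobytes()].append(idx)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(members, 2))
    return pairs


def select_candidates(embeddings, pairs, threshold=0.95):
    """Stage 2 (assumed): keep pairs whose cosine similarity reaches a hypothetical threshold."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return [(i, j) for i, j in sorted(pairs)
            if float(normed[i] @ normed[j]) >= threshold]


def cluster(num_items, accepted_pairs):
    """Stage 3 (assumed): union-find over accepted pairs; returns one cluster id per item."""
    parent = list(range(num_items))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in accepted_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)  # smallest index becomes the cluster id
    return [find(i) for i in range(num_items)]


if __name__ == "__main__":
    # Toy embeddings: items 0 and 1 point in the same direction, item 2 is opposite.
    emb = np.array([[1.0, 2.0, 3.0, 4.0],
                    [2.0, 4.0, 6.0, 8.0],
                    [-1.0, -2.0, -3.0, -4.0]])
    accepted = select_candidates(emb, generate_candidates(emb))
    print(cluster(len(emb), accepted))
```

In this sketch the hashing stage only narrows the set of pairs to compare; the pairwise check and the grouping run on that reduced set, which is the general reason a multi-stage design scales better than all-pairs comparison.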
Pages: 2733-2739
Number of pages: 7