Evolution of a Web-Scale Near Duplicate Image Detection System

Cited by: 1
Authors
Gusev, Andrey [1]
Xu, Jiajing [1]
Affiliations
[1] Pinterest, San Francisco, CA 94107 USA
Keywords
near-duplicate detection; recommendation systems; locality sensitive hashing; transfer learning; clustering;
DOI
10.1145/3366423.3380031
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Detecting near duplicate images is fundamental to the content ecosystem of photo sharing web applications. However, such a task is challenging when involving a web-scale image corpus containing billions of images. In this paper, we present an efficient system for detecting near duplicate images across 8 billion images. Our system consists of three stages: candidate generation, candidate selection, and clustering. We also demonstrate that this system can be used to greatly improve the quality of recommendations and search results across a number of real-world applications. In addition, we include the evolution of the system over the course of six years, bringing out experiences and lessons on how new systems are designed to accommodate organic content growth as well as the latest technology. Finally, we are releasing a human-labeled dataset of approximately 53,000 pairs of images introduced in this paper.
Pages: 2733 - 2739
Page count: 7
Related papers
50 in total
  • [41] Web-Scale Image Retrieval Using Compact Tensor Aggregation of Visual Descriptors
    Negrel, Romain
    Picard, David
    Gosselin, Philippe-Henri
    IEEE MULTIMEDIA, 2013, 20 (03) : 24 - 33
  • [42] Web-Scale Human Task Management
    Schulte, Daniel
    SOFTWARE ARCHITECTURE, 2011, 6903 : 190 - 193
  • [43] Web-Scale Training for Face Identification
    Taigman, Yaniv
    Yang, Ming
    Ranzato, Marc'Aurelio
    Wolf, Lior
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015 : 2746 - 2754
  • [44] Duplicate and near-duplicate documents in the web: detection by means of fuzzy-hash techniques
    Figuerola, Carlos G.
    Gomez Diaz, Raquel
    Alonso Berrocal, Jose L.
    Zazo Rodriguez, Angel F.
    SCIRE-REPRESENTACION Y ORGANIZACION DEL CONOCIMIENTO, 2011, 17 (01) : 49 - 54
  • [45] Social Web-Scale Provenance in the Cloud
    Simmhan, Yogesh
    Gomadam, Karthik
    PROVENANCE AND ANNOTATION OF DATA AND PROCESSES, 2010, 6378 : 298 - 300
  • [46] Maze: A Cost-Efficient Video Deduplication System at Web-scale
    Qin, An
    Xiao, Mengbai
    Huang, Ben
    Zhang, Xiaodong
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022 : 3163 - 3172
  • [47] Web-Scale Multimedia Processing and Applications
    Chang, Edward
    Chang, Shih-Fu
    Hauptmann, Alexander G.
    Huang, Thomas S.
    Slaney, Malcolm
    PROCEEDINGS OF THE IEEE, 2012, 100 (09) : 2580 - 2583
  • [48] Analysis of Web-Scale Cloud Services
    Noor, Talal H.
    Sheng, Quan Z.
    Ngu, Anne H. H.
    Dustdar, Schahram
    IEEE INTERNET COMPUTING, 2014, 18 (04) : 55 - 61
  • [49] Face recognition for web-scale datasets
    Ortiz, Enrique G.
    Becker, Brian C.
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2014, 118 : 153 - 170
  • [50] Web-scale semantic information processing
    Heflin, Jeff
    Stuckenschmidt, Heiner
    JOURNAL OF WEB SEMANTICS, 2012, 10 : 1 - 2