Evolution of a Web-Scale Near Duplicate Image Detection System

Times cited: 1
Authors
Gusev, Andrey [1]
Xu, Jiajing [1]
Affiliation
[1] Pinterest, San Francisco, CA 94107 USA
Keywords
near-duplicate detection; recommendation systems; locality sensitive hashing; transfer learning; clustering;
DOI
10.1145/3366423.3380031
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Detecting near-duplicate images is fundamental to the content ecosystem of photo-sharing web applications. However, the task is challenging for a web-scale corpus containing billions of images. In this paper, we present an efficient system for detecting near-duplicate images across 8 billion images. The system consists of three stages: candidate generation, candidate selection, and clustering. We also demonstrate that it can greatly improve the quality of recommendations and search results across a number of real-world applications. In addition, we describe how the system evolved over six years, sharing experiences and lessons on designing new systems that accommodate organic content growth as well as the latest technology. Finally, we are releasing a human-labeled dataset of approximately 53,000 image pairs introduced in this paper.
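The abstract names the three stages but not their internals. Purely as an illustration, the sketch below assumes every image is already represented by an embedding vector. It uses random-hyperplane locality sensitive hashing (LSH) for candidate generation, an exact cosine-similarity threshold for candidate selection, and union-find connected components for clustering. The function names, the 0.95 threshold, and the table and bit counts are hypothetical choices for this example, not the authors' implementation.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def candidate_generation(vectors, num_tables=4, bits_per_table=8, seed=0):
    """Stage 1 (hypothetical): random-hyperplane LSH.

    Each table hashes a vector to a tuple of sign bits, one per random hyperplane;
    any two keys that share a bucket in any table become a candidate pair.
    """
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    pairs = set()
    for _ in range(num_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits_per_table)]
        buckets = defaultdict(list)
        for key, v in vectors.items():
            signature = tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0) for plane in planes)
            buckets[signature].append(key)
        for members in buckets.values():
            pairs.update(combinations(sorted(members), 2))
    return pairs


def candidate_selection(pairs, vectors, threshold=0.95):
    """Stage 2 (hypothetical): keep pairs whose exact cosine similarity passes the threshold."""
    return [(a, b) for a, b in pairs if cosine(vectors[a], vectors[b]) >= threshold]


def cluster(keys, edges):
    """Stage 3 (hypothetical): connected components of the near-duplicate graph via union-find."""
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving keeps trees shallow
            k = parent[k]
        return k

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
    groups = defaultdict(list)
    for k in keys:
        groups[find(k)].append(k)
    return sorted(sorted(g) for g in groups.values())


def near_duplicate_clusters(vectors, threshold=0.95):
    """Run the three illustrative stages end to end."""
    pairs = candidate_generation(vectors)
    edges = candidate_selection(pairs, vectors, threshold)
    return cluster(list(vectors), edges)


if __name__ == "__main__":
    toy = {
        "a": [1.0, 0.0, 0.0],
        "b": [2.0, 0.0, 0.0],  # same direction as "a"
        "c": [0.0, 1.0, 0.0],
        "d": [0.0, 3.0, 0.0],  # same direction as "c"
        "e": [0.0, 0.0, 1.0],
    }
    print(near_duplicate_clusters(toy))  # [['a', 'b'], ['c', 'd'], ['e']]
```

The cheap hashing stage only proposes pairs, and the exact similarity check filters out accidental bucket collisions. Adding more hash tables raises the chance that true near-duplicates share at least one bucket, at the cost of more candidate pairs to verify.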
Pages: 2733 - 2739
Number of pages: 7
Related papers (50 in total)
  • [1] Web-Scale Near-Duplicate Search: Techniques and Applications
    Ngo, Chong-Wah
    Xu, Changsheng
    Kraaij, Wessel
    El Saddik, Abdulmotaleb
    IEEE MULTIMEDIA, 2013, 20 (03) : 10 - 12
  • [2] Duplicate-Search-Based Image Annotation Using Web-Scale Data
    Wang, Xin-Jing
    Zhang, Lei
    Ma, Wei-Ying
    PROCEEDINGS OF THE IEEE, 2012, 100 (09) : 2705 - 2721
  • [3] Web-Scale Image Annotation
    Liu, Jiakai
    Hu, Rong
    Wang, Meihong
    Wang, Yi
    Chang, Edward Y.
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2008, 9TH PACIFIC RIM CONFERENCE ON MULTIMEDIA, 2008, 5353 : 663 - 674
  • [5] Building a web-scale image similarity search system
    Batko, Michal
    Falchi, Fabrizio
    Lucchese, Claudio
    Novak, David
    Perego, Raffaele
    Rabitti, Fausto
    Sedmidubsky, Jan
    Zezula, Pavel
    MULTIMEDIA TOOLS AND APPLICATIONS, 2010, 47 (03) : 599 - 629
  • [6] Pattern-Based Near-Duplicate Video Retrieval and Localization on Web-Scale Videos
    Chou, Chien-Li
    Chen, Hua-Tsung
    Lee, Suh-Yin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (03) : 382 - 395
  • [7] An Efficient Approach to Web Near-Duplicate Image Detection
    Li, Jun
    Thou, Shan
    Xing, Liang
    Sun, Changyin
    Hu, Weiming
    2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013), 2013, : 186 - 190
  • [8] Web-scale image clustering revisited
    Avrithis, Yannis
    Kalantidis, Yannis
    Anagnostopoulos, Evangelos
    Emiris, Ioannis Z.
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 1502 - 1510
  • [9] Large-scale duplicate detection for web image search
    Wang, Bin
    Li, Zhiwei
    Li, Mingjing
    Ma, Wei-Ying
    2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 353 - +
  • [10] SVD-SIFT FOR WEB NEAR-DUPLICATE IMAGE DETECTION
    Liu, Hong
    Lu, Hong
    Xue, Xiangyang
    2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 1445 - 1448