Building a web-scale image similarity search system

Cited by: 10
Authors
Batko, Michal [1]
Falchi, Fabrizio [2]
Lucchese, Claudio [2]
Novak, David [1]
Perego, Raffaele [2]
Rabitti, Fausto [2]
Sedmidubsky, Jan [1]
Zezula, Pavel [1]
Affiliations
[1] Masaryk Univ, Fac Informat, Brno, Czech Republic
[2] CNR, ISTI, I-56100 Pisa, Italy
Keywords
Similarity search; Content-based image retrieval; Metric space; MPEG-7 descriptors; Peer-to-peer search network; Implementation
DOI
10.1007/s11042-009-0339-z
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
As the number of digital images is growing fast and Content-based Image Retrieval (CBIR) is gaining in popularity, CBIR systems should leap towards Web-scale datasets. In this paper, we report on our experience in building an experimental similarity search system on a test collection of more than 50 million images. The first big challenge we have been facing was obtaining a collection of images of this scale with the corresponding descriptive features. We have tackled the non-trivial process of image crawling and extraction of several MPEG-7 descriptors. The result of this effort is a test collection, the first of such scale, opened to the research community for experiments and comparisons. The second challenge was to develop indexing and searching mechanisms able to scale to the target size and to answer similarity queries in real-time. We have achieved this goal by creating sophisticated centralized and distributed structures based purely on the metric space model of data. We have joined them together which has resulted in an extremely flexible and scalable solution. In this paper, we study in detail the performance of this technology and its evolvement as the data volume grows by three orders of magnitude. The results of the experiments are very encouraging and promising for future applications.
Pages: 599-629
Page count: 31