Building a web-scale image similarity search system

Cited by: 10
Authors
Batko, Michal [1]
Falchi, Fabrizio [2]
Lucchese, Claudio [2]
Novak, David [1]
Perego, Raffaele [2]
Rabitti, Fausto [2]
Sedmidubsky, Jan [1]
Zezula, Pavel [1]
Affiliations
[1] Masaryk Univ, Fac Informat, Brno, Czech Republic
[2] CNR, ISTI, I-56100 Pisa, Italy
Keywords
Similarity search; Content-based image retrieval; Metric space; MPEG-7 descriptors; Peer-to-peer search network; Implementation
DOI
10.1007/s11042-009-0339-z
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As the number of digital images is growing fast and Content-based Image Retrieval (CBIR) is gaining in popularity, CBIR systems should leap towards Web-scale datasets. In this paper, we report on our experience in building an experimental similarity search system on a test collection of more than 50 million images. The first big challenge we faced was obtaining a collection of images of this scale with the corresponding descriptive features. We have tackled the non-trivial process of image crawling and extraction of several MPEG-7 descriptors. The result of this effort is a test collection, the first of such scale, opened to the research community for experiments and comparisons. The second challenge was to develop indexing and searching mechanisms able to scale to the target size and to answer similarity queries in real time. We have achieved this goal by creating sophisticated centralized and distributed structures based purely on the metric space model of data. We have joined them together, which has resulted in an extremely flexible and scalable solution. In this paper, we study in detail the performance of this technology and its evolution as the data volume grows by three orders of magnitude. The results of the experiments are very encouraging and promising for future applications.
Pages: 599-629
Number of pages: 31