Building a web-scale image similarity search system

Cited by: 10
Authors
Batko, Michal [1]
Falchi, Fabrizio [2]
Lucchese, Claudio [2]
Novak, David [1]
Perego, Raffaele [2]
Rabitti, Fausto [2]
Sedmidubsky, Jan [1]
Zezula, Pavel [1]
Affiliations
[1] Masaryk Univ, Fac Informat, Brno, Czech Republic
[2] CNR, ISTI, I-56100 Pisa, Italy
Keywords
Similarity search; Content-based image retrieval; Metric space; MPEG-7 descriptors; Peer-to-peer search network; IMPLEMENTATION
DOI
10.1007/s11042-009-0339-z
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
As the number of digital images grows rapidly and Content-based Image Retrieval (CBIR) gains in popularity, CBIR systems should move towards Web-scale datasets. In this paper, we report on our experience in building an experimental similarity search system on a test collection of more than 50 million images. The first major challenge was obtaining a collection of images of this scale with the corresponding descriptive features. We tackled the non-trivial process of image crawling and extraction of several MPEG-7 descriptors. The result of this effort is a test collection, the first of such scale, open to the research community for experiments and comparisons. The second challenge was to develop indexing and searching mechanisms able to scale to the target size and to answer similarity queries in real time. We achieved this goal by creating sophisticated centralized and distributed structures based purely on the metric space model of data. Combining them has resulted in an extremely flexible and scalable solution. In this paper, we study in detail the performance of this technology and its evolution as the data volume grows by three orders of magnitude. The results of the experiments are very encouraging and promising for future applications.
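As an illustration of what "based purely on the metric space model" can mean in practice, the following is a minimal sketch of a pivot-based range query. It is not the indexing structure described in the paper: the class name `PivotIndex`, the Euclidean distance `l2` (standing in for a distance over MPEG-7 descriptors), the random 4-dimensional data, and the choice of the first five objects as pivots are all assumptions made for the example. The only property it relies on is the triangle inequality, which lets the index discard objects without computing their distance to the query.

```python
import math
import random


def l2(a, b):
    """Euclidean distance; a stand-in for a descriptor distance (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Hypothetical pivot table that uses only a metric distance function."""

    def __init__(self, objects, pivots, dist):
        self.objects = list(objects)
        self.pivots = list(pivots)
        self.dist = dist
        # Precompute the distance from every object to every pivot.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.objects]
        self.computed = 0  # distance computations spent on the last query

    def range_query(self, q, radius):
        """Return sorted (distance, object) pairs with distance <= radius."""
        q_to_p = [self.dist(q, p) for p in self.pivots]
        self.computed = len(self.pivots)
        hits = []
        for obj, row in zip(self.objects, self.table):
            # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o).
            # If the lower bound already exceeds the radius, skip the object.
            if any(abs(qp - op) > radius for qp, op in zip(q_to_p, row)):
                continue
            self.computed += 1
            d = self.dist(q, obj)
            if d <= radius:
                hits.append((d, obj))
        return sorted(hits)


def brute_force(objects, q, radius, dist):
    """Reference answer: compare the query against every object."""
    return sorted((dist(q, o), o) for o in objects if dist(q, o) <= radius)


random.seed(0)
data = [tuple(random.random() for _ in range(4)) for _ in range(500)]
index = PivotIndex(data, pivots=data[:5], dist=l2)
query = (0.5, 0.5, 0.5, 0.5)
result = index.range_query(query, 0.2)
```

The filter never changes the answer, only the number of distance computations, so the result can be checked against `brute_force`. Because nothing here depends on coordinates, the same code works for any distance function that satisfies the metric axioms.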
Pages: 599-629
Page count: 31