Building a web-scale image similarity search system

Cited by: 10

Authors:

Batko, Michal [1]

Falchi, Fabrizio [2]

Lucchese, Claudio [2]

Novak, David [1]

Perego, Raffaele [2]

Rabitti, Fausto [2]

Sedmidubsky, Jan [1]

Zezula, Pavel [1]

Affiliations:

[1] Masaryk Univ, Fac Informat, Brno, Czech Republic

[2] CNR, ISTI, I-56100 Pisa, Italy

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2010 / Vol. 47 / No. 03

Keywords:

Similarity search; Content-based image retrieval; Metric space; MPEG-7 descriptors; Peer-to-peer search network; IMPLEMENTATION

DOI:

10.1007/s11042-009-0339-z

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

As the number of digital images is growing fast and Content-based Image Retrieval (CBIR) is gaining in popularity, CBIR systems should leap towards Web-scale datasets. In this paper, we report on our experience in building an experimental similarity search system on a test collection of more than 50 million images. The first big challenge we faced was obtaining a collection of images of this scale with the corresponding descriptive features. We have tackled the non-trivial process of image crawling and extraction of several MPEG-7 descriptors. The result of this effort is a test collection, the first of such scale, opened to the research community for experiments and comparisons. The second challenge was to develop indexing and searching mechanisms able to scale to the target size and to answer similarity queries in real time. We have achieved this goal by creating sophisticated centralized and distributed structures based purely on the metric space model of data. We have joined them together, which has resulted in an extremely flexible and scalable solution. In this paper, we study in detail the performance of this technology and its evolution as the data volume grows by three orders of magnitude. The results of the experiments are very encouraging and promising for future applications.


Pages: 599-629

Page count: 31
