Harvesting Image Databases from the Web

Cited by: 98
Authors
Schroff, Florian [1]
Criminisi, Antonio [2]
Zisserman, Andrew [3]
Affiliations
[1] Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92093 USA
[2] Microsoft Res Cambridge, Cambridge CB3 0FB, England
[3] Univ Oxford, Dept Engn Sci, Robot Res Grp, Oxford OX1 3PJ, England
Keywords
Weakly supervised; computer vision; object recognition; image retrieval
DOI
10.1109/TPAMI.2010.133
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The objective of this work is to automatically generate a large number of images for a specified object class. A multimodal approach employing text, metadata, and visual features is used to gather many high-quality images from the Web. Candidate images are obtained by a text-based Web search querying on the object identifier (e.g., the word penguin). The Web pages and the images they contain are downloaded. The task is then to remove irrelevant images and rerank the remainder. First, the images are reranked based on the text surrounding the image and metadata features. A number of methods are compared for this reranking. Second, the top-ranked images are used as (noisy) training data and an SVM visual classifier is learned to improve the ranking further. We investigate the sensitivity of the cross-validation procedure to this noisy training data. The principal novelty of the overall method is in combining text/metadata and visual features in order to achieve a completely automatic ranking of the images. Examples are given for a selection of animals, vehicles, and other classes, totaling 18 classes. The results are assessed by precision/recall curves on ground-truth annotated data and by comparison to previous approaches, including those of Berg and Forsyth [5] and Fergus et al. [12].
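The two-stage reranking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`train_linear_svm`, `harvest_rerank`), the 2-D toy features, the text scores, and the use of a separate `background` set as negatives are all assumptions made for illustration, and the classifier is a plain linear SVM trained by batch subgradient descent rather than whatever kernel and features the paper uses.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, iters=200):
    """Batch subgradient descent on the L2-regularised hinge loss.

    X is a list of feature vectors, y a list of labels in {+1, -1}.
    Returns the weight vector w and the bias b.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(iters):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # point violates the margin
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def harvest_rerank(text_scores, features, background, top_k):
    """Stage 1: rank by text score. Stage 2: rerank by SVM score.

    The top_k images by text score serve as noisy positives;
    `background` (an assumption of this sketch) supplies negatives.
    """
    by_text = sorted(range(len(features)), key=lambda i: -text_scores[i])
    noisy_pos = [features[i] for i in by_text[:top_k]]
    X = noisy_pos + background
    y = [1] * len(noisy_pos) + [-1] * len(background)
    w, b = train_linear_svm(X, y)
    visual = [dot(w, f) + b for f in features]
    return sorted(range(len(features)), key=lambda i: -visual[i])


# Toy data (hypothetical): images 0-4 are in-class, images 5-7 are not.
features = [(1.0, 1.2), (0.9, 0.8), (1.1, 1.0), (1.2, 0.9), (0.8, 1.1),
            (-1.0, -0.9), (-1.1, -1.2), (-0.8, -1.0)]
# Text scores are noisy: image 5 is irrelevant but scores highly.
text_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.85, 0.1, 0.4]
background = [(-1.0, -1.0), (-0.9, -1.1), (-1.2, -0.8), (-1.1, -0.9)]

ranking = harvest_rerank(text_scores, features, background, top_k=4)
print(ranking)
```

In this toy setup one of the four text-selected "positives" is actually irrelevant, which mimics the noisy training data the abstract mentions; the visual classifier is then used to score every candidate image, including those the text ranking placed low.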
Pages: 754-766
Number of pages: 13