A probabilistic semantic model for image annotation and multi-modal image retrieval

Cited by: 18
Authors
Zhang, Ruofei [1 ]
Zhang, Zhongfei
Li, Mingjing
Ma, Wei-Ying
Zhang, Hong-Jiang
Affiliations
[1] SUNY Binghamton, Dept Comp Sci, Binghamton, NY 13902 USA
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
Keywords
image annotation; multi-modal image retrieval; probabilistic semantic model; evaluation
DOI
10.1007/s00530-006-0025-1
Chinese Library Classification
TP (Automation and Computer Technology)
Discipline Classification
0812
Abstract
This paper addresses the automatic image annotation problem and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which visual features and textual words are connected via a hidden layer of semantic concepts to be discovered, explicitly exploiting the synergy between the modalities. (2) The association between visual features and textual words is determined in a Bayesian framework, so that a confidence for each association can be provided. (3) An extensive evaluation on a large-scale, visually and semantically diverse image collection crawled from the Web is reported for the prototype system built on the model. In the proposed probabilistic model, a hidden concept layer connecting the visual-feature layer and the word layer is discovered by fitting a generative model to the training images and their annotation words through an Expectation-Maximization (EM) based iterative learning procedure. The evaluation of the prototype system on 17,000 images and 7,736 annotation words automatically extracted from crawled Web pages for multi-modal image retrieval indicates that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
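The abstract describes a latent-aspect model in which a hidden concept layer links visual features to annotation words and is fitted by EM. A minimal toy sketch of that idea follows, assuming a symmetric parameterisation P(v, w) = Σ_z P(z) P(v|z) P(w|z) over discretized visual tokens v, words w, and hidden concepts z; the variable names, synthetic counts, and discretization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy co-occurrence counts: counts[i, j] = how often visual token i
# appears with textual word j (synthetic data for illustration only).
n_visual, n_words, n_concepts = 6, 8, 3
counts = rng.integers(0, 5, size=(n_visual, n_words)).astype(float)

# Randomly initialised model parameters: P(z), P(v|z), P(w|z),
# each normalised over its proper axis.
p_z = np.full(n_concepts, 1.0 / n_concepts)
p_v_z = rng.random((n_concepts, n_visual))
p_v_z /= p_v_z.sum(axis=1, keepdims=True)
p_w_z = rng.random((n_concepts, n_words))
p_w_z /= p_w_z.sum(axis=1, keepdims=True)

for _ in range(100):
    # E-step: posterior P(z | v, w) for every (v, w) pair.
    joint = p_z[:, None, None] * p_v_z[:, :, None] * p_w_z[:, None, :]
    post = joint / joint.sum(axis=0, keepdims=True)        # shape (z, v, w)

    # M-step: re-estimate parameters from expected counts.
    ez = post * counts[None, :, :]
    p_v_z = ez.sum(axis=2)
    p_v_z /= p_v_z.sum(axis=1, keepdims=True)
    p_w_z = ez.sum(axis=1)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z = ez.sum(axis=(1, 2))
    p_z /= p_z.sum()

# Annotation: for a visual token v, rank candidate words by
# P(w | v) proportional to sum_z P(z) P(v|z) P(w|z).
p_w_given_v = (p_z[:, None, None] * p_v_z[:, :, None] * p_w_z[:, None, :]).sum(axis=0)
p_w_given_v /= p_w_given_v.sum(axis=1, keepdims=True)
best_word = int(p_w_given_v[0].argmax())
```

In the paper's retrieval setting, the same posterior over hidden concepts can serve as a low-dimensional semantic representation shared by both modalities, which is what enables querying by image, by text, or by both.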
Pages
27-33 (7 pages)