A probabilistic semantic model for image annotation and multi-modal image retrieval

Cited by: 18
Authors
Zhang, Ruofei [1 ]
Zhang, Zhongfei
Li, Mingjing
Ma, Wei-Ying
Zhang, Hong-Jiang
Affiliations
[1] SUNY Binghamton, Dept Comp Sci, Binghamton, NY 13902 USA
[2] Microsoft Res Asia, Beijing 100080, Peoples R China
Keywords
image annotation; multi-modal image retrieval; probabilistic semantic model; evaluation;
DOI
10.1007/s00530-006-0025-1
Chinese Library Classification (CLC)
TP [Automation and computer technology];
Discipline classification code
0812 ;
Abstract
This paper addresses the automatic image annotation problem and its application to multi-modal image retrieval. The contribution of our work is three-fold. (1) We propose a probabilistic semantic model in which visual features and textual words are connected via a hidden layer of semantic concepts to be discovered, explicitly exploiting the synergy between the modalities. (2) The association of visual features and textual words is determined in a Bayesian framework, so that the confidence of each association can be provided. (3) An extensive evaluation of the prototype system based on the model is reported on a large-scale, visually and semantically diverse image collection crawled from the Web. In the proposed probabilistic model, a hidden concept layer connecting the visual-feature layer and the word layer is discovered by fitting a generative model to the training images and annotation words through an Expectation-Maximization (EM) based iterative learning procedure. Evaluation of the prototype system on 17,000 images and 7736 annotation words automatically extracted from crawled Web pages for multi-modal image retrieval indicates that the proposed semantic model and the developed Bayesian framework are superior to a state-of-the-art peer system in the literature.
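The hidden-concept idea in the abstract can be illustrated with a minimal sketch. The paper models continuous visual features; as a simplifying assumption, this toy version quantizes images into discrete visual tokens and fits a PLSA-style mixture P(v, w) = Σ_z P(z) P(v|z) P(w|z) over a visual-token/word co-occurrence matrix by EM, then scores annotation words with a Bayesian posterior P(w|v). All function names and data here are hypothetical, not from the paper's system.

```python
import numpy as np

def plsa_em(counts, n_concepts, n_iters=100, seed=0):
    """Fit P(z), P(v|z), P(w|z) to a visual-token x word co-occurrence
    matrix `counts` by EM (discrete simplification of a hidden-concept
    model; the paper's actual model uses continuous visual features)."""
    rng = np.random.default_rng(seed)
    n_v, n_w = counts.shape
    pz = np.full(n_concepts, 1.0 / n_concepts)          # P(z)
    pv_z = rng.random((n_concepts, n_v))                # P(v|z)
    pv_z /= pv_z.sum(1, keepdims=True)
    pw_z = rng.random((n_concepts, n_w))                # P(w|z)
    pw_z /= pw_z.sum(1, keepdims=True)
    for _ in range(n_iters):
        # E-step: posterior P(z | v, w) for every (v, w) cell
        joint = pz[:, None, None] * pv_z[:, :, None] * pw_z[:, None, :]
        post = joint / joint.sum(0, keepdims=True).clip(1e-12)
        # M-step: re-estimate parameters from expected counts
        exp_c = post * counts[None, :, :]
        pz = exp_c.sum((1, 2))
        pz /= pz.sum()
        pv_z = exp_c.sum(2)
        pv_z /= pv_z.sum(1, keepdims=True).clip(1e-12)
        pw_z = exp_c.sum(1)
        pw_z /= pw_z.sum(1, keepdims=True).clip(1e-12)
    return pz, pv_z, pw_z

def annotate(v_token, pz, pv_z, pw_z):
    """Bayesian annotation scores for one visual token:
    P(w | v) proportional to sum_z P(z) P(v|z) P(w|z)."""
    scores = (pz[:, None] * pv_z[:, [v_token]] * pw_z).sum(0)
    return scores / scores.sum()

# Toy co-occurrence data: visual token 0 mostly co-occurs with word 0,
# token 1 with word 1; EM should recover two separating concepts.
counts = np.array([[20.0, 1.0], [1.0, 20.0]])
pz, pv_z, pw_z = plsa_em(counts, n_concepts=2)
word_probs = annotate(0, pz, pv_z, pw_z)
```

The posterior in `annotate` is what gives a confidence value for each visual-feature/word association, mirroring the Bayesian framework described in contribution (2).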
Pages: 27-33
Number of pages: 7
Related papers
50 records in total
  • [41] Multi-modal web image retrieval using manifold-ranking
    He, Ruhan
    Jin, Hai
    Tao, Wenbing
    DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2007, 14 : 228 - 232
  • [42] Potential Semantics in Multi-Modal Relevance Feedback Information for Image Retrieval
    Li, Jiyi
    Ma, Qiang
    Asano, Yasuhito
    Yoshikawa, Masatoshi
    2013 IEEE 37TH ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), 2013, : 830 - 831
  • [43] Optimized transfer learning based multi-modal medical image retrieval
    Abid, Muhammad Haris
    Ashraf, Rehan
    Mahmood, Toqeer
    Faisal, C. M. Nadeem
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 44069 - 44100
  • [44] Online Multi-Modal Distance Metric Learning with Application to Image Retrieval
    Wu, Pengcheng
    Hoi, Steven C. H.
    Zhao, Peilin
    Miao, Chunyan
    Liu, Zhi-Yong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (02) : 454 - 467
  • [45] Optimized transfer learning based multi-modal medical image retrieval
    Abid, Muhammad Haris
    Ashraf, Rehan
    Mahmood, Toqeer
    Faisal, C. M. Nadeem
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 : 44069 - 44100
  • [46] Automatic image annotation and semantic based image retrieval for medical domain
    Burdescu, Dumitru Dan
    Mihai, Cristian Gabriel
    Stanescu, Liana
    Brezovan, Marius
    NEUROCOMPUTING, 2013, 109 : 33 - 48
  • [47] Efficient text-image semantic search: A multi-modal vision-language approach for fashion retrieval
    Moro, Gianluca
    Salvatori, Stefano
    Frisoni, Giacomo
    NEUROCOMPUTING, 2023, 538
  • [48] Using Multi-Modal Semantic Association Rules to fuse keywords and visual features automatically for Web image retrieval
    He, Ruhan
    Xiong, Naixue
    Yang, Laurence T.
    Park, Jong Hyuk
    INFORMATION FUSION, 2011, 12 (03) : 223 - 230
  • [49] MMDF-LDA: An improved Multi-Modal Latent Dirichlet Allocation model for social image annotation
    Liu Zheng
    Zhang Caiming
    Chen Caixian
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 104 : 168 - 184
  • [50] Multi-modal information retrieval with a semantic view mechanism
    Li, Q
    Yang, J
    Zhuang, YT
    19TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS, VOL 1, PROCEEDINGS: AINA 2005, 2005, : 133 - 138