Sparse Multi-Modal Topical Coding for Image Annotation

Cited by: 11
Authors
Song, Lingyun [1 ]
Luo, Minnan [1 ]
Liu, Jun [1 ]
Zhang, Lingling [1 ]
Qian, Buyue [1 ]
Li, Max Haifei [2 ]
Zheng, Qinghua [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Dept Comp Sci & Technol, SPKLSTN Lab, Xian 710049, Peoples R China
[2] Union Univ, Dept Comp Sci, Jackson, TN 38305 USA
Funding
National Science Foundation (USA);
Keywords
Topic models; Sparse latent representation; Image annotation; Image retrieval; REGULARIZATION; REPRESENTATION; COMPLETION;
DOI
10.1016/j.neucom.2016.06.005
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image annotation plays a significant role in large-scale image understanding, indexing, and retrieval. Probabilistic Topic Models (PTMs) attempt to address this issue by learning latent representations of input samples, and existing studies have shown them to be effective. Though useful, PTMs have some limitations in interpreting the latent representations of images and texts, which, if addressed, would broaden their applicability. In this paper, we introduce sparsity to PTMs to improve the interpretability of the inferred latent representations. Extending Sparse Topical Coding, which was originally designed for learning from unimodal documents, we propose a non-probabilistic formulation of PTM for automatic image annotation, namely Sparse Multi-Modal Topical Coding. Beyond controlling the sparsity, our model can capture more compact correlations between words and image regions. Empirical results on several benchmark datasets show that our model outperforms the baseline models on automatic image annotation and text-based image retrieval. (C) 2016 Elsevier B.V. All rights reserved.
Pages: 162 - 174
Number of pages: 13
Related papers
50 in total
  • [31] Multi-Modal Sparse Tracking by Jointing Timing and Modal Consistency
    Li, Jiajun
    Fang, Bin
    Zhou, Mingliang
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (06)
  • [32] Multi-Modal Image Retrieval by Integrating Web Image Annotation, Concept Matching and Fuzzy Ranking Techniques
    Su, Ja-Hwung
    Wang, Bo-Wen
    Hsu, Tien-Yu
    Chou, Chien-Li
    Tseng, Vincent S.
    INTERNATIONAL JOURNAL OF FUZZY SYSTEMS, 2010, 12 (02) : 136 - 149
  • [33] Designing a symmetric classifier for image annotation using multi-layer sparse coding
    Tariq, Amara
    Foroosh, Hassan
    IMAGE AND VISION COMPUTING, 2018, 69 : 33 - 43
  • [34] AUTOMATIC IMAGE ANNOTATION VIA LOCAL SPARSE CODING
    Zhang, Wenbo
    Tian, Dongping
    Hu, Hong
    Zhao, Xiaofei
    Shi, Zhongzhi
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 1661 - 1665
  • [35] Multiview Hessian discriminative sparse coding for image annotation
    Liu, Weifeng
    Tao, Dacheng
    Cheng, Jun
    Tang, Yuanyan
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2014, 118 : 50 - 60
  • [36] Semantically Multi-modal Image Synthesis
    Zhu, Zhen
    Xu, Zhiliang
    You, Ansheng
    Bai, Xiang
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 5466 - 5475
  • [37] Multi-modal semantic image segmentation
    Pemasiri, Akila
    Kien Nguyen
    Sridharan, Sridha
    Fookes, Clinton
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2021, 202
  • [38] Erratum to: Jointly Image Annotation and Classification Based on Supervised Multi-Modal Hierarchical Semantic Model
    Chun-yan Yin
    Yong-Heng Chen
    Wan-li Zuo
    Pattern Recognition and Image Analysis, 2020, 30 : 566 - 566
  • [39] Multi-modal Medical Image Fusion based on Two-scale Image Decomposition and Sparse Representation
    Maqsood, Sarmad
    Javed, Umer
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2020, 57
  • [40] Multi-Modal Image Fusion via Sparse Representation and Multi-Scale Anisotropic Guided Measure
    Zhang, Shuai
    Huang, Fuyu
    Zhong, Hui
    Liu, Bingqi
    Chen, Yichao
    Wang, Ziang
    IEEE ACCESS, 2020, 8 : 35638 - 35649