Sparse Multi-Modal Topical Coding for Image Annotation

Cited by: 11
Authors
Song, Lingyun [1]
Luo, Minnan [1]
Liu, Jun [1]
Zhang, Lingling [1]
Qian, Buyue [1]
Li, Max Haifei [2]
Zheng, Qinghua [1]
Affiliations
[1] Xi An Jiao Tong Univ, Dept Comp Sci & Technol, SPKLSTN Lab, Xian 710049, Peoples R China
[2] Union Univ, Dept Comp Sci, Jackson, TN 38305 USA
Funding
US National Science Foundation;
Keywords
Topic models; Sparse latent representation; Image annotation; Image retrieval; Regularization; Representation; Completion;
DOI
10.1016/j.neucom.2016.06.005
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image annotation plays a significant role in large-scale image understanding, indexing, and retrieval. Probabilistic topic models (PTMs) address this task by learning latent representations of input samples, and existing studies have shown them to be effective. Despite their usefulness, PTMs have limitations in interpreting the latent representations of images and texts; addressing these limitations would broaden their applicability. In this paper, we introduce sparsity into PTMs to improve the interpretability of the inferred latent representations. Extending Sparse Topical Coding, which was originally designed for learning from unimodal documents, we propose a non-probabilistic formulation of PTMs for automatic image annotation, namely Sparse Multi-Modal Topical Coding. Beyond controlling sparsity, our model can capture more compact correlations between words and image regions. Empirical results on benchmark datasets show that our model outperforms the baseline models on automatic image annotation and text-based image retrieval. (C) 2016 Elsevier B.V. All rights reserved.
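To illustrate what introducing sparsity into a latent topic representation means in practice, here is a minimal sketch. It is not the authors' Sparse Multi-Modal Topical Coding algorithm: it assumes a single modality, a fixed topic dictionary, and, for simplicity, a squared reconstruction loss, whereas topical coding models are usually built on a count-based (e.g. Poisson) loss. Under those assumptions, a document's nonnegative code can be computed by projected proximal gradient descent, and the L1 penalty drives the weights of irrelevant topics to exactly zero. The function `sparse_code` and its parameters (`lam`, `step`, `iters`) are hypothetical names chosen for illustration.

```python
from typing import List


def sparse_code(counts: List[float], topics: List[List[float]],
                lam: float = 0.5, step: float = 0.1,
                iters: int = 500) -> List[float]:
    """Hypothetical sketch: sparse nonnegative topic code for one document.

    Minimizes 0.5 * ||counts - topics^T theta||^2 + lam * ||theta||_1
    subject to theta >= 0, using projected proximal gradient descent
    (ISTA). The squared loss is a simplifying assumption, not taken
    from the paper. Convergence is guaranteed when step <= 1/L, where
    L is the largest eigenvalue of topics @ topics^T.
    """
    k, v = len(topics), len(counts)
    theta = [0.0] * k
    for _ in range(iters):
        # Reconstruct the word counts from the current topic weights.
        recon = [sum(theta[j] * topics[j][i] for j in range(k)) for i in range(v)]
        residual = [recon[i] - counts[i] for i in range(v)]
        # Gradient of the squared loss with respect to each topic weight.
        grad = [sum(topics[j][i] * residual[i] for i in range(v)) for j in range(k)]
        # Gradient step, then the prox of lam*||.||_1 restricted to theta >= 0:
        # nonnegative soft-thresholding, which sets small weights exactly to 0.
        theta = [max(0.0, theta[j] - step * (grad[j] + lam)) for j in range(k)]
    return theta


if __name__ == "__main__":
    # Two topics over a 4-word vocabulary; the document uses only topic 0's words.
    topics = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
    print(sparse_code([4, 4, 0, 0], topics))
```

In this example the result is approximately `[7.0, 0.0]`: the second topic's weight is exactly zero rather than merely small, which is the property that makes sparse codes easier to interpret than dense ones.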
Pages: 162-174
Number of pages: 13
Related papers
50 in total
  • [41] Fast Multi-Modal Unified Sparse Representation Learning
    Verma, Mridula
    Shukla, Kaushal Kumar
    PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR'17), 2017, pp. 448-452
  • [42] LEARNING UNIFIED SPARSE REPRESENTATIONS FOR MULTI-MODAL DATA
    Wang, Kaiye
    Wang, Wei
    Wang, Liang
    2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, pp. 3545-3549
  • [43] On the Effectiveness of Images in Multi-modal Text Classification: An Annotation Study
    Ma, Chunpeng
    Shen, Aili
    Yoshikawa, Hiyori
    Iwakura, Tomoya
    Beck, Daniel
    Baldwin, Timothy
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (03)
  • [44] An approach to multi-modal multi-view video coding
    Zhang, Yun
    Jiang, Gangyi
    Yi, Wenjuan
    Yu, Mei
    Jiang, Zhidi
    Kim, Yong Deak
    2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4, 2006, pp. 1405+
  • [45] Cross-modal attention for multi-modal image registration
    Song, Xinrui
    Chao, Hanqing
    Xu, Xuanang
    Guo, Hengtao
    Xu, Sheng
    Turkbey, Baris
    Wood, Bradford J.
    Sanford, Thomas
    Wang, Ge
    Yan, Pingkun
    MEDICAL IMAGE ANALYSIS, 2022, 82
  • [46] MMDF-LDA: An improved Multi-Modal Latent Dirichlet Allocation model for social image annotation
    Liu Zheng
    Zhang Caiming
    Chen Caixian
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 104 : 168 - 184
  • [47] Mode reconstruction for source coding and multi-modal control
    Austin, A
    Egerstedt, M
    HYBRID SYSTEMS: COMPUTATION AND CONTROL, PROCEEDINGS, 2003, 2623 : 36 - 49
  • [48] Unified losses for multi-modal pose coding and regression
    Johnson, Leif
    Cooper, Joseph
    Ballard, Dana
    2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013
  • [49] MULTI-MODAL IMAGE STITCHING WITH NONLINEAR OPTIMIZATION
    Saha, Arindam
    Maity, Soumyadip
    Bhowmick, Brojeshwar
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, pp. 1987-1991
  • [50] Multi-Modal Deformable Medical Image Registration
    Fookes, Clinton
    Sridharan, Sridha
    ICSPCS: 2ND INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, PROCEEDINGS, 2008, pp. 661-669