Deep Top-k Ranking for Image-Sentence Matching

Cited by: 30
Authors
Zhang, Lingling [1]
Luo, Minnan [2]
Liu, Jun [2]
Chang, Xiaojun [3]
Yang, Yi [4]
Hauptmann, Alexander G. [5]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Key Lab Intelligent Networks & Network Secur, Minist Educ, Xian 710049, Peoples R China
[2] Xi An Jiao Tong Univ, Natl Engn Lab Big Data Analyt, Xian 710049, Peoples R China
[3] Monash Univ, Fac Informat Technol, Clayton Vic 3800, Australia
[4] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst, Ultimo, NSW 2007, Australia
[5] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
Funding
Australian Research Council; National Natural Science Foundation of China
Keywords
Image-sentence matching; cross-modal retrieval; deep learning; top-k ranking; FUSION;
DOI
10.1109/TMM.2019.2931352
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Image-sentence matching is a challenging task because of the heterogeneity gap between different modalities. Ranking-based methods have achieved excellent performance on this task over the past decades. Given an image query, these methods typically assume that the correctly matched image-sentence pair must rank before all mismatched ones. However, this assumption may be too strict and prone to overfitting, especially when some sentences in a massive database are similar and easily confused with one another. In this paper, we relax the traditional ranking loss and propose a novel deep multi-modal network with a top-k ranking loss to mitigate the data ambiguity problem. With this strategy, query results are not penalized unless the ground truth falls outside the top-k query results. Because the original top-k ranking loss is non-smooth and non-convex, we approximate it with a tight convex upper bound and then use the standard back-propagation algorithm to optimize the deep multi-modal network. Finally, we evaluate the method on three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO. Empirical results on the R@K metrics (K = 1, 5, 10) show that our method achieves performance comparable to that of state-of-the-art methods.
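The relaxation described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact loss or its convex upper bound, so the code below is an assumption-laden illustration rather than the authors' formulation: `topk_zero_one_loss` penalizes a query only when the ground truth falls outside the top k, and `topk_hinge_loss` is a convex surrogate in the style of the top-k hinge loss from the top-k SVM literature. The function names, the `margin` parameter, and the example scores are all hypothetical.

```python
def topk_zero_one_loss(scores, gt, k):
    """Return 0 if the ground-truth item is within the top k, else 1.

    scores: similarity of the query to each candidate (higher is better).
    gt:     index of the ground-truth candidate (hypothetical convention).
    """
    # Count candidates that score strictly higher than the ground truth.
    higher = sum(1 for j, s in enumerate(scores) if j != gt and s > scores[gt])
    return 0 if higher < k else 1


def topk_hinge_loss(scores, gt, k, margin=1.0):
    """Convex surrogate: mean of the k largest margin violations, clipped at 0.

    This is a stand-in for the paper's convex upper bound, which the
    abstract does not specify.
    """
    violations = [margin + s - scores[gt]
                  for j, s in enumerate(scores) if j != gt]
    violations.append(0.0)  # the ground-truth entry itself contributes zero
    largest = sorted(violations, reverse=True)[:k]
    return max(0.0, sum(largest) / k)


# Hypothetical similarity scores for one image query over four sentences;
# the ground-truth sentence (index 1) is ranked second.
scores = [0.9, 0.8, 0.3, 0.1]
strict = topk_zero_one_loss(scores, gt=1, k=1)   # penalized under top-1
relaxed = topk_zero_one_loss(scores, gt=1, k=2)  # not penalized under top-2
surrogate = topk_hinge_loss(scores, gt=1, k=2)
```

With k = 1 the query is penalized because one sentence outranks the ground truth, while with k = 2 it is not. The surrogate stays at or above the zero-one value, and because it is piecewise linear in the scores, subgradients can be back-propagated through a scoring network.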
Pages: 775-785
Number of pages: 11