Deep Top-k Ranking for Image-Sentence Matching

Cited by: 30

Authors:

Zhang, Lingling [1]

Luo, Minnan [2]

Liu, Jun [2]

Chang, Xiaojun [3]

Yang, Yi [4]

Hauptmann, Alexander G. [5]

Affiliations:

[1] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Key Lab Intelligent Networks & Network Secur, Minist Educ, Xian 710049, Peoples R China

[2] Xi An Jiao Tong Univ, Natl Engn Lab Big Data Analyt, Xian 710049, Peoples R China

[3] Monash Univ, Fac Informat Technol, Clayton Vic 3800, Australia

[4] Univ Technol Sydney, Ctr Quantum Computat & Intelligent Syst, Ultimo, NSW 2007, Australia

[5] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2020 / Vol. 22 / No. 03

Funding:

Australian Research Council; National Natural Science Foundation of China;

Keywords:

Image-sentence matching; cross-modal retrieval; deep learning; top-k ranking; FUSION;

DOI:

10.1109/TMM.2019.2931352

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Image-sentence matching is a challenging task because of the heterogeneity gap between different modalities. Ranking-based methods have achieved excellent performance on this task in past decades. Given an image query, these methods typically assume that the correctly matched image-sentence pair must rank before all mismatched ones. However, this assumption may be too strict and prone to overfitting, especially when some sentences in a massive database are similar to and easily confused with one another. In this paper, we relax the traditional ranking loss and propose a novel deep multi-modal network with a top-k ranking loss to mitigate the data ambiguity problem. With this strategy, query results are not penalized unless the ground truth falls outside the top-k query results. Because the initial top-k ranking loss is non-smooth and non-convex, we approximate it with a tight convex upper bound and then use the traditional back-propagation algorithm to optimize the deep multi-modal network. Finally, we apply the method to three benchmark datasets, namely Flickr8k, Flickr30k, and MSCOCO. Empirical results on the R@K metrics (K = 1, 5, 10) show that our method achieves performance comparable to state-of-the-art methods.
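The abstract does not give the loss formula, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It contrasts a top-k zero-one error (no penalty while the ground truth stays within the top k) with a convex hinge-style upper bound of the kind used in top-k classification (the mean of the k largest margin violations, clipped at zero). The function names, the `margin` parameter, and the example scores are all assumptions made for illustration.

```python
def topk_zero_one_loss(scores, target, k):
    """Return 0 if the ground-truth item is within the top k, else 1.

    scores: similarity of the query to every candidate (hypothetical values)
    target: index of the ground-truth candidate
    """
    # Count the candidates that score strictly higher than the ground truth.
    higher = sum(
        1 for j, s in enumerate(scores) if j != target and s > scores[target]
    )
    return 0 if higher < k else 1


def topk_hinge_surrogate(scores, target, k, margin=1.0):
    """Convex upper bound on the top-k zero-one loss (illustrative stand-in).

    Each candidate contributes a margin violation; the ground truth itself
    contributes 0. The loss is the mean of the k largest violations,
    clipped at zero.
    """
    violations = [
        (margin if j != target else 0.0) + s - scores[target]
        for j, s in enumerate(scores)
    ]
    top = sorted(violations, reverse=True)[:k]
    return max(0.0, sum(top) / k)


if __name__ == "__main__":
    # Hypothetical similarities between one image query and four sentences;
    # the ground-truth sentence (index 1) is ranked second.
    scores = [0.9, 0.8, 0.3, 0.1]
    for k in (1, 2):
        print(k, topk_zero_one_loss(scores, 1, k), topk_hinge_surrogate(scores, 1, k))
```

In this sketch the zero-one loss is 1 for k = 1 and 0 for k = 2, since the ground truth is ranked second. The surrogate never falls below the zero-one loss, and because it is a maximum of affine functions of the scores it is convex and can be minimized with gradient-based training.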


Pages: 775-785

Page count: 11
