Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation

Cited by: 44

Authors:

Hua, Yan [1,2,3]

Wang, Shuhui [1,3]

Liu, Siyuan [4,5]

Cai, Anni [2]

Huang, Qingming [1,6]

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intellectual Informat Proc, Beijing 100190, Peoples R China

[2] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China

[3] Commun Univ China, Sch Informat Engn, Beijing 100024, Peoples R China

[4] Penn State Univ, Smeal Coll Business, University Pk, PA 16801 USA

[5] Shenzhen Inst Adv Technol, Inst Adv Comp & Digital Engn, Ctr Cloud Comp, Shenzhen 518055, Peoples R China

[6] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2016 / Vol. 18 / No. 06

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC); U.S. National Science Foundation (NSF)

Keywords:

Cross-modal retrieval; localized correlation learning; semantic hierarchy; SIMILARITY

DOI:

10.1109/TMM.2016.2535864

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

With the explosive growth of web data, effective and efficient technologies are urgently needed for retrieving semantically relevant content across heterogeneous modalities. Previous studies focus on modeling simple cross-modal statistical dependencies and globally projecting the heterogeneous modalities into a measurable subspace. However, global projections cannot adapt appropriately to diverse content, and they ignore the multilevel semantic relations that naturally exist in web data. We study the problem of semantically coherent retrieval, where documents from different modalities should be ranked by their semantic relevance to the query. Accordingly, we propose TINA, a correlation learning method based on adaptive hierarchical semantic aggregation. First, by jointly modeling content and ontology similarities, we build a semantic hierarchy to measure multilevel semantic relevance. Second, with a set of local linear projections and probabilistic membership functions, we propose two paradigms for local expert aggregation: local projection aggregation and local distance aggregation. To learn the cross-modal projections, we optimize a structural risk objective function that involves semantic coherence measurement, local projection consistency, and a complexity penalty on the local projections. Compared with existing approaches, TINA achieves a better bias-variance tradeoff in real-world cross-modal correlation learning tasks. Extensive experiments on the widely used NUS-WIDE and ICML-Challenge datasets for image-text retrieval demonstrate that TINA better adapts to the multilevel semantic relations and content divergence and thus outperforms the state of the art with better semantic coherence.
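The two aggregation paradigms named in the abstract can be illustrated with a rough sketch. This is not the authors' formulation: the abstract gives no equations, so everything below is an assumption made for illustration, including the softmax-style membership function, the hypothetical local `centers`, the squared Euclidean distance, and all function names. The sketch reads "local projection aggregation" as mixing the locally projected points before measuring distance, and "local distance aggregation" as mixing the per-expert distances.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def memberships(x, centers):
    """Assumed membership function: softmax over negative squared
    distances from x to hypothetical local centers (sums to 1)."""
    scores = [-sq_dist(x, c) for c in centers]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_projection(x, projections, weights):
    """Membership-weighted sum of the local linear projections of x."""
    out = [0.0] * len(projections[0])
    for W, p in zip(projections, weights):
        for i, v in enumerate(matvec(W, x)):
            out[i] += p * v
    return out


def projection_aggregation_distance(img, txt, P_img, P_txt, w_img, w_txt):
    """Paradigm 1 (as assumed here): aggregate projections, then compare."""
    return sq_dist(aggregate_projection(img, P_img, w_img),
                   aggregate_projection(txt, P_txt, w_txt))


def distance_aggregation_distance(img, txt, P_img, P_txt, weights):
    """Paradigm 2 (as assumed here): compare under each local expert,
    then take the membership-weighted sum of those distances."""
    return sum(p * sq_dist(matvec(Wi, img), matvec(Wt, txt))
               for Wi, Wt, p in zip(P_img, P_txt, weights))
```

In this reading, the first paradigm produces a single embedding per document that can be indexed once, while the second keeps each local expert's judgment separate until the final weighted sum. Which memberships weight the second paradigm (the query's, the candidate's, or a combination) is left as a parameter because the abstract does not say.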

Pages: 1201-1216

Page count: 16

