Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation

Cited by: 44
Authors
Hua, Yan [1 ,2 ,3 ]
Wang, Shuhui [1 ,3 ]
Liu, Siyuan [4 ,5 ]
Cai, Anni [2 ]
Huang, Qingming [1 ,6 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intellectual Informat Proc, Beijing 100190, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China
[3] Commun Univ China, Sch Informat Engn, Beijing 100024, Peoples R China
[4] Penn State Univ, Smeal Coll Business, University Pk, PA 16801 USA
[5] Shenzhen Inst Adv Technol, Inst Adv Comp & Digital Engn, Ctr Cloud Comp, Shenzhen 518055, Peoples R China
[6] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada; U.S. National Science Foundation
Keywords
Cross-modal retrieval; localized correlation learning; semantic hierarchy; similarity
DOI
10.1109/TMM.2016.2535864
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the explosive growth of web data, effective and efficient technologies are urgently needed for retrieving semantically relevant content across heterogeneous modalities. Previous studies have focused on modeling simple cross-modal statistical dependencies and globally projecting the heterogeneous modalities into a measurable subspace. However, global projections cannot adapt appropriately to diverse content, and they ignore the multilevel semantic relations that naturally exist in web data. We study the problem of semantically coherent retrieval, where documents from different modalities should be ranked by their semantic relevance to the query. Accordingly, we propose TINA, a correlation learning method based on adaptive hierarchical semantic aggregation. First, by jointly modeling content and ontology similarities, we build a semantic hierarchy to measure multilevel semantic relevance. Second, with a set of local linear projections and probabilistic membership functions, we propose two paradigms for local expert aggregation: local projection aggregation and local distance aggregation. To learn the cross-modal projections, we optimize a structural risk objective function that involves semantic coherence measurement, local projection consistency, and a complexity penalty on the local projections. Compared with existing approaches, TINA achieves a better bias-variance tradeoff in real-world cross-modal correlation learning tasks. Extensive experiments on the widely used NUS-WIDE and ICML-Challenge datasets for image-text retrieval demonstrate that TINA adapts better to multilevel semantic relations and content divergence and thus outperforms the state of the art with better semantic coherence.
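The abstract names two ways of combining local experts but gives no formulas, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes each modality has a few local linear projections into a shared space, and that the probabilistic membership function is a softmax over negative squared distances to hypothetical per-expert centers. All function names, the gating rule, and the toy numbers are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax; returns weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def matvec(matrix, vector):
    """Apply a linear projection given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def membership(x, centers):
    """Assumed gating: softmax over negative squared distance to each center."""
    return softmax([-sq_dist(x, c) for c in centers])

def aggregate_projection(x, projections, centers):
    """Membership-weighted mixture of the local experts' projected points."""
    weights = membership(x, centers)
    projected = [matvec(p, x) for p in projections]
    dim = len(projected[0])
    return [sum(w * p[d] for w, p in zip(weights, projected)) for d in range(dim)]

def local_projection_aggregation(img, txt, img_experts, txt_experts):
    """Aggregate the projections first, then measure one distance."""
    u = aggregate_projection(img, *img_experts)
    v = aggregate_projection(txt, *txt_experts)
    return sq_dist(u, v)

def local_distance_aggregation(img, txt, img_experts, txt_experts):
    """Measure a distance per expert pair, then aggregate the distances."""
    img_proj, img_centers = img_experts
    txt_proj, txt_centers = txt_experts
    w_img = membership(img, img_centers)
    w_txt = membership(txt, txt_centers)
    total = 0.0
    for a, pa in zip(w_img, img_proj):
        ua = matvec(pa, img)
        for b, pb in zip(w_txt, txt_proj):
            total += a * b * sq_dist(ua, matvec(pb, txt))
    return total

# Toy data (made up): 3-d image features, 2-d text features, 2-d shared space.
img_experts = (
    [[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]],
    [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
)
txt_experts = (
    [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, -0.5]]],
    [[1.0, 1.0], [-1.0, 0.0]],
)
image, text = [0.9, 0.2, 0.1], [0.8, 0.3]
print(local_projection_aggregation(image, text, img_experts, txt_experts))
print(local_distance_aggregation(image, text, img_experts, txt_experts))
```

Under these assumptions the two paradigms coincide when each modality has a single expert. With several experts, the squared distance is convex, so the projection-aggregated distance never exceeds the distance-aggregated one.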
Pages: 1201-1216
Number of pages: 16
Related papers
50 in total
  • [21] Adaptive Marginalized Semantic Hashing for Unpaired Cross-Modal Retrieval
    Luo, Kaiyi
    Zhang, Chao
    Li, Huaxiong
    Jia, Xiuyi
    Chen, Chunlin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9082 - 9095
  • [22] Cross-Modal Retrieval Based on Semantic Filtering and Adaptive Pooling
    Qiao, Nan
    Mao, Junyi
    Xie, Hao
    Wang, Zhiguo
    Yin, Guangqiang
    PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 296 - 310
  • [23] Cross-modal semantic priming
    Tabossi, P
    LANGUAGE AND COGNITIVE PROCESSES, 1996, 11 (06): : 569 - 576
  • [24] Cross-Modal Semantic Communications
    Li, Ang
    Wei, Xin
    Wu, Dan
    Zhou, Liang
    IEEE WIRELESS COMMUNICATIONS, 2022, 29 (06) : 144 - 151
  • [25] Deep Multigraph Hierarchical Enhanced Semantic Representation for Cross-Modal Retrieval
    Zhu, Lei
    Zhang, Chengyuan
    Song, Jiayu
    Zhang, Shichao
    Tian, Chunwei
    Zhu, Xinghui
    IEEE MULTIMEDIA, 2022, 29 (03) : 17 - 26
  • [26] Cross-Modal Correlation Learning with Deep Convolutional Architecture
    Hua, Yan
    Tian, Hu
    Cai, Anni
    Shi, Ping
    2015 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2015,
  • [27] CROSS-MODAL LEARNING TO RANK WITH ADAPTIVE LISTWISE CONSTRAINT
    Qu, Guangzhuo
    Xiao, Jing
    Zhu, Jia
    Cao, Yang
    Huang, Changqin
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 1658 - 1662
  • [28] Adaptive Adversarial Learning based cross-modal retrieval
    Li, Zhuoyi
    Lu, Huibin
    Fu, Hao
    Wang, Zhongrui
    Gu, Guanghun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123
  • [29] Adaptive Cross-Modal Few-shot Learning
    Xing, Chen
    Rostamzadeh, Negar
    Oreshkin, Boris N.
    Pinheiro, Pedro O.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [30] Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval
    Xu, Xing
    Song, Jingkuan
    Lu, Huimin
    Yang, Yang
    Shen, Fumin
    Huang, Zi
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 46 - 54