Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation

Cited by: 44
Authors
Hua, Yan [1 ,2 ,3 ]
Wang, Shuhui [1 ,3 ]
Liu, Siyuan [4 ,5 ]
Cai, Anni [2 ]
Huang, Qingming [1 ,6 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intellectual Informat Proc, Beijing 100190, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China
[3] Commun Univ China, Sch Informat Engn, Beijing 100024, Peoples R China
[4] Penn State Univ, Smeal Coll Business, University Pk, PA 16801 USA
[5] Shenzhen Inst Adv Technol, Inst Adv Comp & Digital Engn, Ctr Cloud Comp, Shenzhen 518055, Peoples R China
[6] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada; U.S. National Science Foundation
Keywords
Cross-modal retrieval; localized correlation learning; semantic hierarchy; similarity
DOI
10.1109/TMM.2016.2535864
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the explosive growth of web data, effective and efficient technologies are urgently needed for retrieving semantically relevant content across heterogeneous modalities. Previous studies focus on modeling simple cross-modal statistical dependencies and globally projecting the heterogeneous modalities into a measurable subspace. However, global projections cannot appropriately adapt to diverse content, and the multilevel semantic relations that naturally exist in web data are ignored. We study the problem of semantically coherent retrieval, where documents from different modalities should be ranked by their semantic relevance to the query. Accordingly, we propose TINA, a correlation learning method based on adaptive hierarchical semantic aggregation. First, by jointly modeling content and ontology similarities, we build a semantic hierarchy to measure multilevel semantic relevance. Second, with a set of local linear projections and probabilistic membership functions, we propose two paradigms for local expert aggregation: local projection aggregation and local distance aggregation. To learn the cross-modal projections, we optimize a structural risk objective function that involves semantic coherence measurement, local projection consistency, and the complexity penalty of local projections. Compared to existing approaches, TINA achieves a better bias-variance tradeoff in real-world cross-modal correlation learning tasks. Extensive experiments on the widely used NUS-WIDE and ICML-Challenge datasets for image-text retrieval demonstrate that TINA better adapts to multilevel semantic relations and content divergence and thus outperforms the state of the art with better semantic coherence.
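The two aggregation paradigms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the form of the membership functions or the exact aggregation formulas, so the softmax gating over distances to expert centers, the product-of-memberships weighting, and all function and parameter names below (`softmax_membership`, `img_projs`, `txt_centers`, and so on) are assumptions made purely for illustration.

```python
import math


def softmax_membership(x, centers, temperature=1.0):
    """Soft assignment of x to each local expert.

    Assumption: membership is a softmax over negative squared distances
    to hypothetical expert centers; the paper may define it differently.
    """
    scores = [-sum((a - b) ** 2 for a, b in zip(x, c)) / temperature
              for c in centers]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(matrix, x):
    """Matrix-vector product for a row-major list-of-lists matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def aggregate_projection(x, projections, weights):
    """Membership-weighted sum of the local linear projections of x."""
    z = [0.0] * len(projections[0])
    for w, P in zip(weights, projections):
        for i, val in enumerate(project(P, x)):
            z[i] += w * val
    return z


def projection_aggregation_distance(img, txt, img_projs, txt_projs,
                                    img_centers, txt_centers):
    """Aggregate the local projections first, then measure one distance."""
    wi = softmax_membership(img, img_centers)
    wt = softmax_membership(txt, txt_centers)
    return sq_dist(aggregate_projection(img, img_projs, wi),
                   aggregate_projection(txt, txt_projs, wt))


def distance_aggregation_distance(img, txt, img_projs, txt_projs,
                                  img_centers, txt_centers):
    """Measure one distance per local expert, then aggregate the distances.

    Assumption: expert k is weighted by the normalized product of the
    image and text memberships for expert k.
    """
    wi = softmax_membership(img, img_centers)
    wt = softmax_membership(txt, txt_centers)
    joint = [a * b for a, b in zip(wi, wt)]
    total = sum(joint)
    if total == 0.0:  # guard against underflow: fall back to uniform weights
        joint, total = [1.0] * len(joint), float(len(joint))
    return sum((j / total) * sq_dist(project(P, img), project(Q, txt))
               for j, P, Q in zip(joint, img_projs, txt_projs))
```

In this sketch, ranking text documents for an image query would amount to sorting them by either distance. The two functions differ only in where the membership weights enter: before the distance is computed, or after.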
Pages: 1201 - 1216
Page count: 16
Related papers
50 entries in total
  • [31] Collaboratively Semantic Alignment and Metric Learning for Cross-Modal Hashing
    Li, Jiaxing
    Wong, Wai Keung
    Jiang, Lin
    Jiang, Kaihang
    Fang, Xiaozhao
    Xie, Shengli
    Wen, Jie
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2025, 37 (05) : 2311 - 2328
  • [32] Semantic consistency cross-modal dictionary learning with rank constraint
    Shang, Fei
    Zhang, Huaxiang
    Sun, Jiande
    Liu, Li
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2019, 62 : 259 - 266
  • [33] Hierarchical discriminant feature learning for cross-modal face recognition
    Xu, Xiaolin
    Li, Yidong
    Jin, Yi
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (45-46) : 33483 - 33502
  • [34] Hierarchical discriminant feature learning for cross-modal face recognition
    Xiaolin Xu
    Yidong Li
    Yi Jin
    Multimedia Tools and Applications, 2020, 79 : 33483 - 33502
  • [35] Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval
    Wu, Xiaoyu
    Wang, Tiantian
    Wang, Shengjin
    ELECTRONICS, 2020, 9 (12) : 1 - 17
  • [36] Semantic deep cross-modal hashing
    Lin, Qiubin
    Cao, Wenming
    He, Zhihai
    He, Zhiquan
    NEUROCOMPUTING, 2020, 396 (396) : 113 - 122
  • [37] Cross-modal semantic priming in schizophrenia
    Surguladze, S
    Rossell, S
    Rabe-Hesketh, S
    David, AS
    JOURNAL OF THE INTERNATIONAL NEUROPSYCHOLOGICAL SOCIETY, 2002, 8 (07) : 884 - 892
  • [38] Cross-modal & Cross-domain Learning for Unsupervised LiDAR Semantic Segmentation
    Chen, Yiyang
    Zhao, Shanshan
    Ding, Changxing
    Tang, Liyao
    Wang, Chaoyue
    Tao, Dacheng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 3866 - 3875
  • [39] THE DEVELOPMENT OF CROSS-MODAL SEMANTIC INTEGRATION
    MURRAY, S
    BULLETIN OF THE BRITISH PSYCHOLOGICAL SOCIETY, 1982, 35 (MAY) : 214 - 214
  • [40] Cross-Modal Remote Sensing Image-Audio Retrieval With Adaptive Learning for Aligning Correlation
    Huang, Jinghao
    Chen, Yaxiong
    Xiong, Shengwu
    Lu, Xiaoqiang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62