Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation

Cited by: 44

Authors:

Hua, Yan [1, 2, 3]

Wang, Shuhui [1, 3]

Liu, Siyuan [4, 5]

Cai, Anni [2]

Huang, Qingming [1, 6]

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intellectual Informat Proc, Beijing 100190, Peoples R China

[2] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China

[3] Commun Univ China, Sch Informat Engn, Beijing 100024, Peoples R China

[4] Penn State Univ, Smeal Coll Business, University Pk, PA 16801 USA

[5] Shenzhen Inst Adv Technol, Inst Adv Comp & Digital Engn, Ctr Cloud Comp, Shenzhen 518055, Peoples R China

[6] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2016 / Vol. 18 / No. 06

Funding:

Natural Sciences and Engineering Research Council of Canada; US National Science Foundation;

Keywords:

Cross-modal retrieval; localized correlation learning; semantic hierarchy; SIMILARITY;

DOI:

10.1109/TMM.2016.2535864

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

With the explosive growth of web data, effective and efficient technologies are urgently needed for retrieving semantically relevant content across heterogeneous modalities. Previous studies focus on modeling simple cross-modal statistical dependencies and on globally projecting the heterogeneous modalities into a measurable subspace. However, global projections cannot appropriately adapt to diverse content, and they ignore the multilevel semantic relations that naturally exist in web data. We study the problem of semantically coherent retrieval, where documents from different modalities should be ranked by their semantic relevance to the query. Accordingly, we propose TINA, a correlation learning method based on adaptive hierarchical semantic aggregation. First, by jointly modeling content and ontology similarities, we build a semantic hierarchy to measure multilevel semantic relevance. Second, with a set of local linear projections and probabilistic membership functions, we propose two paradigms for local expert aggregation, i.e., local projection aggregation and local distance aggregation. To learn the cross-modal projections, we optimize a structural risk objective function that involves semantic coherence measurement, local projection consistency, and a complexity penalty on the local projections. Compared to existing approaches, TINA achieves a better bias-variance tradeoff in real-world cross-modal correlation learning tasks. Extensive experiments on the widely used NUS-WIDE and ICML-Challenge datasets for image-text retrieval demonstrate that TINA better adapts to multilevel semantic relations and content divergence and thus outperforms the state of the art with better semantic coherence.
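
The two aggregation paradigms named in the abstract can be illustrated with a minimal sketch. The record gives no formulas, so everything below is an assumption made for illustration and is not the paper's formulation: the softmax-over-distance membership function, the cluster `centers`, the `temperature` parameter, the product rule used to combine the two memberships, and all function names are hypothetical. The sketch only shows the general difference between averaging local projections before measuring distance and averaging local distances after projecting.

```python
import numpy as np

def soft_membership(x, centers, temperature=1.0):
    """Hypothetical probabilistic membership: softmax over negative
    squared distances from x to K cluster centers (an assumption)."""
    d2 = np.sum((centers - x) ** 2, axis=1)   # (K,)
    logits = -d2 / temperature
    logits -= logits.max()                    # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def project_aggregated(x, projections, membership):
    """Membership-weighted sum of K local linear projections.
    projections: (K, d_out, d_in); membership: (K,)."""
    local = projections @ x                   # (K, d_out)
    return membership @ local                 # (d_out,)

def distance_projection_aggregation(x, y, Px, Py, mx, my):
    """Aggregate the local projections first, then measure one distance."""
    zx = project_aggregated(x, Px, mx)
    zy = project_aggregated(y, Py, my)
    return float(np.linalg.norm(zx - zy))

def distance_distance_aggregation(x, y, Px, Py, mx, my):
    """Measure one distance per local expert, then aggregate the distances.
    The joint weight mx * my (renormalized) is an illustrative choice."""
    local_d = np.linalg.norm(Px @ x - Py @ y, axis=1)  # (K,)
    joint = mx * my
    joint = joint / joint.sum()
    return float(joint @ local_d)
```

With a single local expert (K = 1) the two functions return the same value, which is a quick sanity check; with several experts they generally differ, because a norm of averaged projections is not the average of per-expert norms.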


Pages: 1201 - 1216

Page count: 16
