Based on Spatial and Temporal Implicit Semantic Relational Inference for Cross-Modal Retrieval

Cited by: 0

Authors:

Jin M. [1]

Hu W. [1]

Zhu L. [2]

Wang X. [3]

Hong R. [1]

Affiliations:

[1] School of Computer and Information, Hefei University of Technology, Hefei

[2] School of Electronic and Information Engineering, Tongji University, Shanghai

[3] School of Data Science, University of Science and Technology of China, Hefei

Source:

IEEE Transactions on Circuits and Systems for Video Technology | 2024 / Vol. 34 / No. 11

Keywords:

Computational modeling; cross-modal retrieval; Data models; Feature extraction; semantic alignment; semantic mining; Semantics; Task analysis; temporal space inference; Training; Visualization

DOI:

10.1109/TCSVT.2024.3411298

Abstract:

To meet users’ demands for video retrieval, text-video cross-modal retrieval technology continues to evolve. Methods based on pre-trained models and transfer learning are widely used to design cross-modal retrieval models, and they have significantly improved video retrieval accuracy. However, these methods fall short in modeling the relationships between video frames. As a result, the model cannot fully capture the hidden semantic relationships within video features. To further infer the implicit semantic relationships among video frames, we propose a cross-modal retrieval model based on graph convolutional networks (GCN) and visual semantic inference (GVSI). The GCN establishes relationships between video frame features, which helps mine hidden semantic information across frames. To let text semantic features help the model infer temporal and implicit semantic information between video frames, we introduce a semantic mining and temporal space (SM&TS) inference module. Additionally, we design semantic alignment modules (SA_M) to align the explicit and implicit object features present in both video and text. Finally, we analyze and validate the effectiveness of the model on the MSR-VTT, MSVD, and LSMDC datasets.
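The abstract says only that a GCN links video frame features so that hidden semantics can propagate across frames. It does not give the graph construction or the layer equations. As a rough illustration, the pure-Python sketch below assumes a generic design, not GVSI's actual architecture:

- Frames are graph nodes.
- Edge weights are non-negative cosine similarities, with self-loops included and each row normalized.
- One propagation step computes ReLU(Â·H·W).
- The refined frames are mean-pooled into a video embedding.
- Videos are ranked by cosine similarity to a text embedding.

All function names, the toy data, and the weight matrix `W` are hypothetical. The SM&TS and SA_M modules are not modeled.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def frame_adjacency(frames: Matrix) -> Matrix:
    """Assumed frame graph: non-negative cosine edge weights (self-loops get
    weight 1 for nonzero frames), each row normalized to sum to 1."""
    n = len(frames)
    adj = [[max(0.0, cosine(frames[i], frames[j])) for j in range(n)]
           for i in range(n)]
    return [[w / sum(row) for w in row] if sum(row) > 0 else row for row in adj]


def gcn_layer(frames: Matrix, weight: Matrix) -> Matrix:
    """One propagation step H' = ReLU(A_hat @ H @ W): each frame's new feature
    mixes in the features of the frames it is similar to."""
    a_hat = frame_adjacency(frames)
    h = matmul(matmul(a_hat, frames), weight)
    return [[max(0.0, x) for x in row] for row in h]


def video_embedding(frames: Matrix, weight: Matrix) -> List[float]:
    """Mean-pool the GCN-refined frame features into one video vector."""
    h = gcn_layer(frames, weight)
    return [sum(row[j] for row in h) / len(h) for j in range(len(h[0]))]


def rank_videos(text_emb: List[float], videos: List[Matrix],
                weight: Matrix) -> List[int]:
    """Return video indices sorted by cosine similarity to the text query."""
    scores = [cosine(text_emb, video_embedding(f, weight)) for f in videos]
    return sorted(range(len(videos)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical learned weight W
    videos = [
        [[1.0, 0.0], [0.9, 0.1]],  # toy video 0: frames point along axis x
        [[0.0, 1.0], [0.1, 0.9]],  # toy video 1: frames point along axis y
    ]
    print(rank_videos([1.0, 0.0], videos, identity))  # [0, 1]
```

The sketch uses row normalization rather than the symmetric D^{-1/2}(A+I)D^{-1/2} normalization of the standard GCN formulation, only to keep it short. The abstract does not state which normalization, graph construction, or pooling the paper actually uses.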
