Based on Spatial and Temporal Implicit Semantic Relational Inference for Cross-Modal Retrieval

Cited by: 0

Authors:

Jin M. [1]

Hu W. [1]

Zhu L. [2]

Wang X. [3]

Hong R. [1]

Affiliations:

[1] School of Computer and Information, Hefei University of Technology, Hefei

[2] School of Electronic and Information Engineering, Tongji University, Shanghai

[3] School of Data Science, University of Science and Technology of China, Hefei

Source:

IEEE Transactions on Circuits and Systems for Video Technology | 2024, Vol. 34, No. 11

Keywords:

Computational modeling; cross-modal retrieval; Data models; Feature extraction; semantic alignment; semantic mining; Semantics; Task analysis; temporal space inference; Training; Visualization

DOI:

10.1109/TCSVT.2024.3411298

Abstract:

To meet users' demand for video retrieval, text-video cross-modal retrieval technology continues to evolve. Methods based on pre-trained models and transfer learning are widely used in designing cross-modal retrieval models and have significantly improved video retrieval accuracy. However, these methods pay limited attention to the relationships between video frames, which prevents the model from fully capturing the hidden semantic relationships within video features. To further infer the implicit semantic relationships among video frames, we propose a cross-modal retrieval model based on graph convolutional networks (GCN) and visual semantic inference (GVSI). The GCN establishes relationships between video frame features, facilitating the mining of hidden semantic information across frames. To let text semantic features help the model infer temporal and implicit semantic information between video frames, we introduce a semantic mining and temporal space (SM&TS) inference module. Additionally, we design semantic alignment modules (SA_M) to align the explicit and implicit object features present in both video and text. Finally, we analyze and validate the effectiveness of the model on the MSR-VTT, MSVD, and LSMDC datasets.

50 items in total

[41] Semantic preserving asymmetric discrete hashing for cross-modal retrieval
Yang, Fan
Zhang, Qiao-xi
Ding, Xiao-jian
Ma, Fu-min
Cao, Jie
Tong, De-yu
APPLIED INTELLIGENCE, 2023, 53 (12) : 15352 - 15371
[42] Discrete semantic embedding hashing for scalable cross-modal retrieval
Liu, Junjie
Fei, Lunke
Jia, Wei
Zhao, Shuping
Wen, Jie
Teng, Shaohua
Zhang, Wei
2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021 : 1461 - 1467
[43] Deep supervised multimodal semantic autoencoder for cross-modal retrieval
Tian, Yu
Yang, Wenjing
Liu, Qingsong
Yang, Qiong
COMPUTER ANIMATION AND VIRTUAL WORLDS, 2020, 31 (4-5)
[44] ONION: Online Semantic Autoencoder Hashing for Cross-Modal Retrieval
Zhang, Donglin
Wu, Xiao-Jun
Chen, Guoqing
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2023, 14 (02)
[45] Hierarchical Semantic Structure Preserving Hashing for Cross-Modal Retrieval
Wang, Di
Zhang, Caiping
Wang, Quan
Tian, Yumin
He, Lihuo
Zhao, Lin
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1217 - 1229
[46] Deep semantic hashing with dual attention for cross-modal retrieval
Wu, Jiagao
Weng, Weiwei
Fu, Junxia
Liu, Linfeng
Hu, Bin
NEURAL COMPUTING & APPLICATIONS, 2022, 34 (07) : 5397 - 5416
[47] Deep Semantic Correlation with Adversarial Learning for Cross-Modal Retrieval
Hua, Yan
Du, Jianhe
PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019 : 252 - 255
[48] Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval
Song, Yale
Soleymani, Mohammad
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019 : 1979 - 1988
[49] Semantic Constraints Matrix Factorization Hashing for cross-modal retrieval
Li, Weian
Xiong, Haixia
Ou, Weihua
Gou, Jianping
Deng, Jiaxing
Liang, Linqing
Zhou, Quan
COMPUTERS & ELECTRICAL ENGINEERING, 2022, 100
[50] Discrete Semantic Matrix Factorization Hashing for Cross-Modal Retrieval
Qin, Jianyang
Fei, Lunke
Teng, Shaohua
Zhang, Wei
Liu, Dongning
Zhao, Genping
Yuan, Haoliang
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 1550 - 1557
