Based on Spatial and Temporal Implicit Semantic Relational Inference for Cross-Modal Retrieval

Cited by: 0
Authors
Jin M. [1]
Hu W. [1]
Zhu L. [2]
Wang X. [3]
Hong R. [1]
Affiliations
[1] School of Computer and Information, Hefei University of Technology, Hefei
[2] School of Electronic and Information Engineering, Tongji University, Shanghai
[3] School of Data Science, University of Science and Technology of China, Hefei
Keywords
Computational modeling; cross-modal retrieval; Data models; Feature extraction; semantic alignment; semantic mining; Semantics; Task analysis; temporal space inference; Training; Visualization
DOI
10.1109/TCSVT.2024.3411298
Abstract
To meet users’ demands for video retrieval, text-video cross-modal retrieval technology continues to evolve. Methods based on pre-trained models and transfer learning are widely employed in designing cross-modal retrieval models and have significantly improved the accuracy of video retrieval. However, these methods fall short in modeling the relationships between video frames, which prevents the model from fully establishing the hidden semantic relationships within video features. To further infer the implicit semantic relationships among video frames, we propose a cross-modal retrieval model based on graph convolutional networks (GCN) and visual semantic inference (GVSI). The GCN is used to establish relationships between video frame features, facilitating the mining of hidden semantic information across video frames. To let text semantic features help the model infer temporal and implicit semantic information between video frames, we introduce a semantic mining and temporal space (SM&TS) inference module. Additionally, we design semantic alignment modules (SA_M) to align explicit and implicit object features present in both video and text. Finally, we analyze and validate the effectiveness of the model on the MSR-VTT, MSVD, and LSMDC datasets.
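The abstract states only that a GCN relates video frame features; it does not specify the graph construction or layer design. The following is a minimal sketch of one generic graph-convolution step over frame features, not the authors' GVSI model. The function names, the softmax-over-cosine-similarity adjacency, and the toy feature values are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def frame_adjacency(frames):
    """Assumed graph: row-wise softmax over pairwise cosine similarity."""
    adj = []
    for u in frames:
        weights = [math.exp(cosine(u, v)) for v in frames]
        total = sum(weights)
        adj.append([w / total for w in weights])
    return adj

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(frames, weight):
    """One propagation step: ReLU(A @ H @ W)."""
    adj = frame_adjacency(frames)          # A: frame-to-frame affinities
    mixed = matmul(adj, frames)            # each frame aggregates its neighbours
    projected = matmul(mixed, weight)      # linear projection (learned in practice)
    return [[max(0.0, x) for x in row] for row in projected]

# Toy input: three frames with 2-dimensional features (hypothetical values).
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]        # stand-in for a learned weight matrix
out = gcn_layer(frames, identity)
```

With this adjacency, each output row is a weighted average of all frame features, so similar frames pull each other's representations closer. In a trained model the weight matrix would be learned rather than fixed to the identity.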
Pages: 1 / 1
Related papers
50 in total
  • [21] Multi-Modal Relational Graph for Cross-Modal Video Moment Retrieval
    Zeng, Yawen
    Cao, Da
    Wei, Xiaochi
    Liu, Meng
    Zhao, Zhou
    Qin, Zheng
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2215 - 2224
  • [22] Abstraction and Association: Cross-Modal Retrieval Based on Consistency between Semantic Structures
    Zheng, Qibin
    Ren, Xiaoguang
    Liu, Yi
    Qin, Wei
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [23] Cross-modal video retrieval algorithm based on multi-semantic clues
    Ding L.
    Li Y.
    Yu C.
    Liu Y.
    Wang X.
    Qi S.
    Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2021, 47 (03) : 596 - 604
  • [24] Latent Space Semantic Supervision Based on Knowledge Distillation for Cross-Modal Retrieval
    Zhang, Li
    Wu, Xiangqian
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 7154 - 7164
  • [25] Adversarial Learning-Based Semantic Correlation Representation for Cross-Modal Retrieval
    Zhu, Lei
    Song, Jiayu
    Zhu, Xiaofeng
    Zhang, Chengyuan
    Zhang, Shichao
    Yuan, Xinpan
    IEEE MULTIMEDIA, 2020, 27 (04) : 79 - 90
  • [26] Cross-Modal Retrieval Based on Semantic Auto-Encoder and Hash Learning
    Lu, Zhu
    Fang, Deng
    Kun, Liu
    Tingting, He
    Yuanyuan, Liu
    Data Analysis and Knowledge Discovery, 2021, 5 (12) : 110 - 122
  • [27] Multi-attention based semantic deep hashing for cross-modal retrieval
    Zhu, Liping
    Tian, Gangyi
    Wang, Bingyao
    Wang, Wenjie
    Zhang, Di
    Li, Chengyang
    APPLIED INTELLIGENCE, 2021, 51 (08) : 5927 - 5939
  • [28] Semantic-Consistent and Multilayer Similarity Based Cross-Modal Hashing Retrieval
    Liu, Yuanyuan
    Wang, Xiaoyan
    Zhang, Yuxin
    Zhu, Lu
    Data Analysis and Knowledge Discovery, 2024, 8 (07) : 89 - 102
  • [29] Multi-attention based semantic deep hashing for cross-modal retrieval
    Liping Zhu
    Gangyi Tian
    Bingyao Wang
    Wenjie Wang
    Di Zhang
    Chengyang Li
    Applied Intelligence, 2021, 51 : 5927 - 5939
  • [30] Deep Semantic Correlation Learning based Hashing for Multimedia Cross-Modal Retrieval
    Gong, Xiaolong
    Huang, Linpeng
    Wang, Fuwei
    2018 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2018, : 117 - 126