Based on Spatial and Temporal Implicit Semantic Relational Inference for Cross-Modal Retrieval

Cited by: 0
Authors
Jin M. [1 ]
Hu W. [1 ]
Zhu L. [2 ]
Wang X. [3 ]
Hong R. [1 ]
Affiliations
[1] School of Computer and Information, Hefei University of Technology, Hefei
[2] School of Electronic and Information Engineering, Tongji University, Shanghai
[3] School of Data Science, University of Science and Technology of China, Hefei
Keywords
Computational modeling; cross-modal retrieval; data models; feature extraction; semantic alignment; semantic mining; semantics; task analysis; temporal space inference; training; visualization
DOI
10.1109/TCSVT.2024.3411298
Abstract
To meet users' demands for video retrieval, text-video cross-modal retrieval technology continues to evolve. Methods based on pre-trained models and transfer learning are widely employed in designing cross-modal retrieval models, significantly improving retrieval accuracy. However, these methods fall short in modeling the relationships between video frames, preventing the model from fully establishing the hidden semantic relationships within video features. To further infer the implicit semantic relationships among video frames, we propose a cross-modal retrieval model based on graph convolutional networks (GCN) and visual semantic inference (GVSI). The GCN establishes relationships between video frame features, facilitating the mining of hidden semantic information across video frames. To let text semantic features help the model infer temporal and implicit semantic information between video frames, we introduce a semantic mining and temporal space (SM&TS) inference module. Additionally, we design semantic alignment modules (SA_M) to align the explicit and implicit object features present in both video and text. Finally, we analyze and validate the effectiveness of the model on the MSR-VTT, MSVD, and LSMDC datasets.
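
The abstract describes the GVSI pipeline at a high level: a GCN relates per-frame features so that hidden semantics can be mined across frames. As a rough illustration only, the minimal PyTorch sketch below shows one way a single graph-convolution layer over frame features could look; the class name FrameGCNLayer, the cosine-similarity adjacency, and all dimensions are hypothetical choices for this sketch and are not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FrameGCNLayer(nn.Module):
        # Hypothetical single GCN layer over per-frame video features.
        # The adjacency is built from pairwise cosine similarity of frames,
        # a common choice; the paper's actual graph construction may differ.
        def __init__(self, dim: int):
            super().__init__()
            self.proj = nn.Linear(dim, dim)

        def forward(self, frames: torch.Tensor) -> torch.Tensor:
            # frames: (batch, num_frames, dim)
            normed = F.normalize(frames, dim=-1)
            sim = normed @ normed.transpose(1, 2)    # frame-to-frame similarity
            adj = F.softmax(sim, dim=-1)             # row-normalized adjacency
            return F.relu(self.proj(adj @ frames))   # propagate, then transform

    # Example: 12 frames of 512-d features for a batch of 2 videos.
    x = torch.randn(2, 12, 512)
    out = FrameGCNLayer(512)(x)                      # (2, 12, 512) relation-enhanced features
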
Pages: 1 - 1
Related Papers
50 records in total
  • [31] Spatial-Temporal Graphs for Cross-Modal Text2Video Retrieval
    Song, Xue
    Chen, Jingjing
    Wu, Zuxuan
    Jiang, Yu-Gang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 2914 - 2923
  • [32] Semantics-Aware Spatial-Temporal Binaries for Cross-Modal Video Retrieval
    Qi, Mengshi
    Qin, Jie
    Yang, Yi
    Wang, Yunhong
    Luo, Jiebo
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 2989 - 3004
  • [33] Joint Semantic Preserving Sparse Hashing for Cross-Modal Retrieval
    Hu, Zhikai
    Cheung, Yiu-Ming
    Li, Mengke
    Lan, Weichao
    Zhang, Donglin
    Liu, Qiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (04) : 2989 - 3002
  • [34] Information Aggregation Semantic Adversarial Network for Cross-Modal Retrieval
    Wang, Hongfei
    Feng, Aimin
    Liu, Xuejun
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [35] Deep semantic hashing with dual attention for cross-modal retrieval
    Jiagao Wu
    Weiwei Weng
    Junxia Fu
    Linfeng Liu
    Bin Hu
    Neural Computing and Applications, 2022, 34 : 5397 - 5416
  • [36] Adaptive Marginalized Semantic Hashing for Unpaired Cross-Modal Retrieval
    Luo, Kaiyi
    Zhang, Chao
    Li, Huaxiong
    Jia, Xiuyi
    Chen, Chunlin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9082 - 9095
  • [37] Semantic Boosting Cross-Modal Hashing for efficient multimedia retrieval
    Wang, Ke
    Tang, Jun
    Wang, Nian
    Shao, Ling
    INFORMATION SCIENCES, 2016, 330 : 199 - 210
  • [38] Semantic preserving asymmetric discrete hashing for cross-modal retrieval
    Fan Yang
    Qiao-xi Zhang
    Xiao-jian Ding
    Fu-min Ma
    Jie Cao
    De-yu Tong
    Applied Intelligence, 2023, 53 : 15352 - 15371
  • [39] Deep semantic similarity adversarial hashing for cross-modal retrieval
    Qiang, Haopeng
    Wan, Yuan
    Xiang, Lun
    Meng, Xiaojing
    NEUROCOMPUTING, 2020, 400 : 24 - 33
  • [40] Cross-Modal Image-Text Retrieval with Semantic Consistency
    Chen, Hui
    Ding, Guiguang
    Lin, Zijin
    Zhao, Sicheng
    Han, Jungong
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1749 - 1757