Video Captioning with Visual and Semantic Features

Cited by: 5
Authors:
Lee, Sujin [1]; Kim, Incheol [2]
Affiliations:
[1] Kyonggi Univ, Dept Comp Sci, Grad Sch, Suwon, South Korea
[2] Kyonggi Univ, Dept Comp Sci, Suwon, South Korea
Source:
Keywords:
Attention-Based Caption Generation; Deep Neural Networks; Semantic Feature; Video Captioning
DOI:
10.3745/JIPS.02.0098
CLC Number:
TP [Automation Technology; Computer Technology]
Discipline Code:
0812
Abstract:
Video captioning is the task of extracting features from a video and generating captions from those features. This paper introduces a deep neural network model and a learning method for effective video captioning. The model uses not only visual features but also semantic features that effectively describe the video content. Visual features are extracted with convolutional neural networks such as C3D and ResNet, while semantic features are extracted with a semantic feature extraction network proposed in this paper. An attention-based caption generation network is then proposed to generate captions from the extracted features. The performance and effectiveness of the proposed model are verified through experiments on two large-scale video benchmarks, Microsoft Video Description (MSVD) and Microsoft Research Video-to-Text (MSR-VTT).
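The abstract's core mechanism, attending over per-frame visual features at each decoding step and combining the result with a semantic feature vector, can be sketched in NumPy. This is a minimal illustration of additive (Bahdanau-style) attention, a common choice for such decoders; it is not the paper's exact network, and all names (`attend`, `W_f`, `W_h`, the element-wise semantic gating) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(frame_feats, hidden, W_f, W_h, v):
    """Additive attention: score each frame feature against the decoder state,
    then return the attention-weighted context vector and the weights."""
    scores = np.tanh(frame_feats @ W_f + hidden @ W_h) @ v  # shape (T,)
    alpha = softmax(scores)                                 # attention weights
    context = alpha @ frame_feats                           # weighted sum, (D,)
    return context, alpha

T, D, H = 8, 16, 16                    # frames, feature dim, decoder hidden dim
frame_feats = rng.normal(size=(T, D))  # stand-in for C3D/ResNet frame features
semantic = rng.uniform(size=D)         # stand-in semantic feature vector
hidden = rng.normal(size=H)            # current decoder hidden state
W_f = rng.normal(size=(D, H))
W_h = rng.normal(size=(H, H))
v = rng.normal(size=H)

context, alpha = attend(frame_feats, hidden, W_f, W_h, v)
# One plausible way to inject the semantic feature: gate the visual context
# element-wise before feeding the decoder (an assumption, not the paper's design).
decoder_input = np.concatenate([context * semantic, hidden])
```

At each step the decoder would consume `decoder_input` to emit the next caption word; the weights `alpha` show which frames the model focused on, which is what makes attention-based generation interpretable.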
Pages: 1318-1330 (13 pages)
Related Papers (50 total)
  • [41] A Video Captioning Method by Semantic Topic-Guided Generation
    Ye, Ou
    Wei, Xinli
    Yu, Zhenhua
    Fu, Yan
    Yang, Ying
    CMC-COMPUTERS MATERIALS & CONTINUA, 2024, 78 (01): : 1071 - 1093
  • [42] Video Captioning Based on Channel Soft Attention and Semantic Reconstructor
    Lei, Zhou
    Huang, Yiyong
    FUTURE INTERNET, 2021, 13 (02) : 1 - 18
  • [43] Video Captioning With Attention-Based LSTM and Semantic Consistency
    Gao, Lianli
    Guo, Zhao
    Zhang, Hanwang
    Xu, Xing
    Shen, Heng Tao
    IEEE TRANSACTIONS ON MULTIMEDIA, 2017, 19 (09) : 2045 - 2055
  • [44] Video captioning algorithm based on mixed training and semantic association
    Chen, Shuqin
    Zhong, Xian
    Huang, Wenxin
    Lu, Yansheng
    Journal of Huazhong University of Science and Technology (Natural Science Edition), 2023, 51 (11): : 67 - 74
  • [45] Semantic Enhanced Video Captioning with Multi-feature Fusion
    Niu, Tian-Zi
    Dong, Shan-Shan
    Chen, Zhen-Duo
    Luo, Xin
    Guo, Shanqing
    Huang, Zi
    Xu, Xin-Shun
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (06)
  • [46] Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning
    Zhong, Xian
    Li, Zipeng
    Chen, Shuqin
    Jiang, Kui
    Chen, Chen
    Ye, Mang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023, : 3724 - 3732
  • [47] Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
    Lu, Yifan
    Zhang, Ziqi
    Yuan, Chunfeng
    Li, Peng
    Wang, Yan
    Li, Bing
    Hu, Weiming
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3909 - 3917
  • [48] Learning topic emotion and logical semantic for video paragraph captioning
    Li, Qinyu
    Wang, Hanli
    Yi, Xiaokai
    DISPLAYS, 2024, 83
  • [49] Fused GRU with semantic-temporal attention for video captioning
    Gao, Lianli
    Wang, Xuanhan
    Song, Jingkuan
    Liu, Yang
    NEUROCOMPUTING, 2020, 395 : 222 - 228
  • [50] Visual Relation-Aware Unsupervised Video Captioning
    Ji, Puzhao
    Cao, Meng
    Zou, Yuexian
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2022, PT III, 2022, 13531 : 495 - 507