Video Captioning with Tube Features

Cited by: 0
Authors
Zhao, Bin [1 ,2 ]
Li, Xuelong [3 ]
Lu, Xiaoqiang [3 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Ctr OPT IMagery Anal & Learning OPTIMAL, Xian 710072, Peoples R China
[3] Chinese Acad Sci, Xian Inst Opt & Precis Mech, Xian 710119, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual features play an important role in the video captioning task. Since video content consists mainly of the activities of salient objects, current approaches that focus only on global frame features while paying little attention to salient objects produce captions of limited quality. To tackle this problem, in this paper we design an object-aware feature for video captioning, termed the tube feature. First, Faster R-CNN is employed to extract object regions in frames, and a tube generation method is developed to link regions that appear in different frames but belong to the same object. An encoder-decoder architecture is then constructed for caption generation. Specifically, the encoder is a bi-directional LSTM that captures the dynamic information of each tube, and the decoder is a single LSTM extended with an attention model, which enables our approach to adaptively attend to the most relevant tubes when generating the caption. We evaluate our approach on two benchmark datasets, MSVD and Charades, and the experimental results demonstrate the effectiveness of tube features for video captioning.
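For illustration, the decoder's attention step over encoded tubes described in the abstract can be sketched as below. This is a minimal NumPy sketch assuming additive (Bahdanau-style) attention; the weight names `W_t`, `W_h`, `w` and all dimensions are illustrative assumptions, not taken from the paper, whose exact scoring function may differ.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend_tubes(tube_feats, decoder_state, W_t, W_h, w):
    """Weight per-tube encodings by their relevance to the decoder state.

    tube_feats    : (num_tubes, d_t) one encoded vector per object tube
    decoder_state : (d_h,) hidden state of the caption-generating LSTM
    W_t, W_h, w   : attention parameters (illustrative names, an assumption)
    """
    # Additive scoring: project tubes and decoder state into a shared
    # space, combine with tanh, then reduce to one score per tube.
    scores = np.tanh(tube_feats @ W_t + decoder_state @ W_h) @ w  # (num_tubes,)
    alpha = softmax(scores)        # attention weight per tube, sums to 1
    context = alpha @ tube_feats   # weighted sum of tube features, (d_t,)
    return context, alpha

# Toy demo with random tube encodings and weights.
rng = np.random.default_rng(0)
num_tubes, d_t, d_h, d_a = 4, 8, 6, 5
tube_feats = rng.standard_normal((num_tubes, d_t))
state = rng.standard_normal(d_h)
W_t = rng.standard_normal((d_t, d_a))
W_h = rng.standard_normal((d_h, d_a))
w = rng.standard_normal(d_a)
context, alpha = attend_tubes(tube_feats, state, W_t, W_h, w)
```

At each decoding step, `context` would be fed into the caption LSTM alongside the previous word, letting the model attend to different object tubes for different words.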
Pages: 1177-1183
Number of pages: 7
Related Papers
50 records
  • [31] Weakly Supervised Dense Video Captioning
    Shen, Zhiqiang
    Li, Jianguo
    Su, Zhou
    Li, Minjun
    Chen, Yurong
    Jiang, Yu-Gang
    Xue, Xiangyang
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 5159 - 5167
  • [32] Sequential Memory Modelling for Video Captioning
    Puttaraja
    Nayaka, Chidambara
    Manikesh
    Sharma, Nitin
    Anand, Kumar M.
    [J]. 2022 IEEE 19TH INDIA COUNCIL INTERNATIONAL CONFERENCE, INDICON, 2022,
  • [33] Rethinking Network for Classroom Video Captioning
    Zhu, Mingjian
    Duan, Chenrui
    Yu, Changbin
    [J]. TWELFTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2021, 11719
  • [34] Adaptive Curriculum Learning for Video Captioning
    Li, Shanhao
    Yang, Bang
    Zou, Yuexian
    [J]. IEEE ACCESS, 2022, 10 : 31751 - 31759
  • [35] Understanding temporal structure for video captioning
    Sah, Shagan
    Nguyen, Thang
    Ptucha, Ray
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2020, 23 (01) : 147 - 159
  • [36] Video Captioning Time Stamp Calculation
    Guo, Yun
    [J]. ICSIT 2010: INTERNATIONAL CONFERENCE ON SOCIETY AND INFORMATION TECHNOLOGIES (POST-CONFERENCE EDITION), 2010, : 12 - 16
  • [37] Video Captioning with Transferred Semantic Attributes
    Pan, Yingwei
    Yao, Ting
    Li, Houqiang
    Mei, Tao
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 984 - 992
  • [38] Understanding temporal structure for video captioning
    Shagan Sah
    Thang Nguyen
    Ray Ptucha
    [J]. Pattern Analysis and Applications, 2020, 23 : 147 - 159
  • [39] Dense Video Captioning for Incomplete Videos
    Dang, Xuan
    Wang, Guolong
    Xiong, Kun
    Qin, Zheng
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 665 - 676
  • [40] An Efficient Framework for Dense Video Captioning
    Suin, Maitreya
    Rajagopalan, A. N.
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12039 - 12046