VIDEO CAPTIONING WITH TEMPORAL AND REGION GRAPH CONVOLUTION NETWORK

Cited by: 3
Authors
Xiao, Xinlong [1]
Zhang, Yuejie [1]
Feng, Rui [1]
Zhang, Tao [2]
Gao, Shang [3]
Fan, Weiguo [4]
Affiliations
[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Informat Proc, Shanghai, Peoples R China
[2] Shanghai Univ Finance & Econ, Sch Informat Management & Engn, Shanghai, Peoples R China
[3] Deakin Univ, Sch Informat Technol, Geelong, Vic, Australia
[4] Univ Iowa, Tippie Coll Business, Dept Business Analyt, Iowa City, IA 52242 USA
Funding
National Natural Science Foundation of China;
Keywords
Video Captioning; Graph Convolution Network; Temporal Graph Network; Region Graph Network; Language Generation Model;
DOI
10.1109/icme46284.2020.9102967
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Video captioning aims to generate a natural language description for a given video clip, which carries both spatial and temporal information. To better exploit this spatial-temporal information, we propose a novel video captioning framework with a Temporal Graph Network (TGN) and a Region Graph Network (RGN). TGN focuses on utilizing the sequential information of frames, which most existing methods ignore. RGN is designed to explore the relationships among salient objects. Unlike previous work, we introduce a Graph Convolution Network (GCN) to encode frames with their sequential information and build a region graph to utilize object information. We also adopt a stacked GRU decoder with a coarse-to-fine structure for caption generation. Promising experimental results on two benchmark datasets (MSVD and MSR-VTT) show the effectiveness of our model.
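As a rough illustration of the idea of encoding frames with a GCN over a temporal graph, the sketch below applies one standard graph-convolution step (symmetric normalization with self-loops) to per-frame features. It is not the authors' implementation: the abstract does not say how the temporal graph is built, so the chain graph linking each frame to its neighbors, the function names `chain_adjacency` and `gcn_layer`, and all dimensions are assumptions made for illustration.

```python
import numpy as np

def chain_adjacency(num_frames: int) -> np.ndarray:
    """Adjacency of a chain graph: frame t is linked to frames t-1 and t+1.

    Assumption: the paper's actual temporal graph may be built differently.
    """
    adj = np.zeros((num_frames, num_frames))
    idx = np.arange(num_frames - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    return adj

def gcn_layer(features: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weight, 0.0)

# Hypothetical sizes: 8 frames, 16-dim frame features, 4-dim output.
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))
weight = rng.standard_normal((16, 4)) * 0.1
encoded = gcn_layer(frames, chain_adjacency(8), weight)
print(encoded.shape)  # (8, 4)
```

Each output row mixes a frame's features with those of its temporal neighbors, which is one simple way sequential information can enter the frame encoding. A region graph could reuse `gcn_layer` with an adjacency defined over detected objects instead of frames.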
Pages: 6