Video Summarization With Frame Index Vision Transformer

Cited by: 1
Authors
Hsu, Tzu-Chun [1]
Liao, Yi-Sheng [1]
Huang, Chun-Rong [1]
Affiliations
[1] National Chung Hsing University, Department of Computer Science and Engineering, Taichung 402, Taiwan
DOI
10.23919/MVA51890.2021.9511350
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a novel frame index vision transformer for video summarization. Given training frames, we linearly project the content of each frame to obtain frame embeddings. By combining the frame embeddings with index embeddings and a class embedding, the proposed frame index vision transformer can efficiently and effectively learn the importance of the input frames. As shown in the experimental results, the proposed method outperforms state-of-the-art deep learning methods, including recurrent neural network (RNN) and convolutional neural network (CNN) based methods, on both the SumMe and TVSum datasets. In addition, our method achieves real-time computational efficiency during testing.
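The input construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the embeddings are combined, so the sketch assumes, as in a standard vision transformer, that each index embedding is added to its frame embedding and that the class embedding is prepended as an extra token. The function names (`linear_project`, `build_input_tokens`), the dimensions, and the random values are all hypothetical, and the transformer encoder that would score frame importance is omitted.

```python
import random


def linear_project(frame, weights, bias):
    """Project one flattened frame (length d_in) to an embedding (length d_model)."""
    return [sum(w * x for w, x in zip(row, frame)) + b
            for row, b in zip(weights, bias)]


def build_input_tokens(frames, weights, bias, index_embeddings, class_embedding):
    """Return the class token followed by one token per frame.

    Assumption: each frame token is the projected frame plus the index
    embedding for its position, and the class token gets no index embedding.
    """
    if len(frames) > len(index_embeddings):
        raise ValueError("more frames than index embeddings")
    tokens = [list(class_embedding)]
    for i, frame in enumerate(frames):
        projected = linear_project(frame, weights, bias)
        tokens.append([p + e for p, e in zip(projected, index_embeddings[i])])
    return tokens


# Toy example with hypothetical sizes; real frames would be far larger.
rng = random.Random(0)
d_in, d_model, num_frames = 6, 4, 3
weights = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_model)]
bias = [0.0] * d_model
index_embeddings = [[rng.uniform(-1, 1) for _ in range(d_model)]
                    for _ in range(num_frames)]
class_embedding = [rng.uniform(-1, 1) for _ in range(d_model)]
frames = [[rng.random() for _ in range(d_in)] for _ in range(num_frames)]

tokens = build_input_tokens(frames, weights, bias, index_embeddings, class_embedding)
print(len(tokens), len(tokens[0]))  # prints "4 4": one class token plus three frame tokens
```

In a trained model the projection weights, index embeddings, and class embedding would be learned parameters rather than random values, and the resulting token sequence would be passed to a transformer encoder.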
Pages: 5