Video summarization with u-shaped transformer

Cited by: 8
Authors
Chen, Yaosen [1, 3]
Guo, Bing [1]
Shen, Yan [2]
Zhou, Renshuang [1, 3]
Lu, Weichen [3]
Wang, Wei [1, 3, 4]
Wen, Xuming [3, 4]
Suo, Xinhua [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
[2] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu 610225, Sichuan, Peoples R China
[3] ChengDu Sobey Digital Technol Co Ltd, Media Intelligence Lab, Chengdu 610041, Sichuan, Peoples R China
[4] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video summarization; Transformer; Multi-scale;
DOI
10.1007/s10489-022-03451-1
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, supervised video summarization has made tremendous progress by treating the task as sequence-to-sequence learning. However, traditional recurrent neural networks (RNNs) are limited in modeling long sequences, and using a transformer for sequence modeling requires a large number of parameters. To address this issue, we propose an efficient U-shaped transformer for video summarization, which we call "Uformer". Uformer consists of three key components: embedding, Uformer block, and prediction head. First, in the embedding component, the image feature sequence is extracted by a pre-trained deep convolutional network and then passed through a linear embedding. The differences of the image feature sequence are passed through another linear embedding, and the two are concatenated to form a two-stream embedding feature. Second, we stack multiple transformer layers into a U-shaped block to integrate the representations learned from the previous layers. The multi-scale Uformer can not only learn longer sequence information but also reduce the number of parameters and computations. Finally, the prediction head regresses the localization of the keyframes and learns the corresponding classification scores. Uformer is combined with non-maximum suppression (NMS) as post-processing to obtain the final video summary. We improve the F-score by 3.7 percentage points on the SumMe dataset, from 50.2% to 53.9%, and by 0.9 percentage points on the TVSum dataset, from 62.1% to 63.0%. Our proposed model has 0.85M parameters, only 32.32% of the parameters of DR-DSN.
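The NMS post-processing step mentioned in the abstract can be illustrated with a minimal sketch of greedy non-maximum suppression over 1-D temporal segments. This is not the authors' implementation: the function names `temporal_iou` and `temporal_nms`, the `(start, end)` segment representation, and the IoU threshold of 0.5 are all assumptions made for illustration.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two (start, end) frame intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring segments, dropping any segment
    whose IoU with an already kept segment exceeds iou_threshold.

    The threshold value is a hypothetical default, not taken from the paper.
    Returns the indices of the kept segments, highest score first.
    """
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(temporal_iou(segments[i], segments[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two clusters of overlapping candidate segments (made-up numbers).
    segments = [(0, 10), (2, 12), (20, 30), (21, 29)]
    scores = [0.9, 0.8, 0.6, 0.7]
    print(temporal_nms(segments, scores))  # [0, 3]
```

In this toy input, each cluster of overlapping candidates collapses to its highest-scoring segment, which is the behavior a summarizer would rely on to avoid selecting near-duplicate intervals.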
Pages: 17864 / 17880
Page count: 17
Source
Applied Intelligence, 2022, 52 : 17864 - 17880