Self-Attention Based Video Summarization

Authors
Li, Yiyi [1, 2]
Wang, Jilong [1]
Affiliations
[1] Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing 100084, China
[2] Beijing No.4 High School, Beijing 100034, China
Keywords
Video recording; Learning systems; Reinforcement learning
DOI
10.3724/SP.J.1089.2020.17854
Abstract
Video summarization aims to identify the most representative content in a video. In this paper, we propose a new video summarization method that assigns different importance to different video frames. Specifically, we use bidirectional LSTMs to capture the temporal information of video frames and then employ a self-attention mechanism that attends differently to different frames in order to extract their global features. Finally, we sample an action for each frame from its regression score and optimize the model parameters with a reinforcement learning strategy, in which an action is defined as selecting or not selecting the current frame, the state is defined as the actions taken over the whole video, and the reward is defined as the sum of the representativeness and diversity costs. We conduct experiments on two public video summarization datasets, SumMe and TVSum, and evaluate performance using the F-measure. Experimental results show that the proposed method achieves superior performance compared with state-of-the-art methods.
© 2020, Beijing China Science Journal Publishing Co. Ltd. All rights reserved.
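The pipeline described in the abstract (attention-weighted frame features, a per-frame score, sampled select/skip actions, a reward built from diversity and representativeness, F-measure evaluation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the BiLSTM is omitted and random features stand in for its outputs; self-attention uses the features directly as queries, keys and values with no learned projections; the regression head is a hypothetical linear vector `w` followed by a sigmoid; the update is assumed to be a REINFORCE-style policy gradient. The abstract does not give the reward formulas (it calls them costs), so `diversity_reward` and `representativeness_reward` follow the commonly used diversity-representativeness formulation. `f_measure` uses frame-overlap precision and recall. All function names are hypothetical.

```python
import numpy as np


def self_attention(h):
    """Scaled dot-product self-attention over frame features h of shape (T, d).

    Assumption: h plays the role of BiLSTM outputs and is used directly as
    queries, keys and values (no learned projection matrices).
    """
    scores = h @ h.T / np.sqrt(h.shape[1])       # (T, T) pairwise frame affinities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability before exp
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row is a distribution over frames
    return attn @ h                              # (T, d) globally attended features


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def diversity_reward(x, actions):
    """Mean pairwise cosine dissimilarity among the selected frames."""
    idx = np.flatnonzero(actions)
    if idx.size < 2:
        return 0.0
    f = x[idx] / np.linalg.norm(x[idx], axis=1, keepdims=True)
    off_diag = ~np.eye(idx.size, dtype=bool)
    return float((1.0 - f @ f.T)[off_diag].mean())


def representativeness_reward(x, actions):
    """exp(-mean distance from every frame to its nearest selected frame)."""
    idx = np.flatnonzero(actions)
    if idx.size == 0:
        return 0.0
    dist = np.linalg.norm(x[:, None, :] - x[None, idx, :], axis=2)  # (T, k)
    return float(np.exp(-dist.min(axis=1).mean()))


def policy_step(x, w, rng, baseline=0.0):
    """One REINFORCE step for a hypothetical linear scoring head w of shape (d,).

    Returns (actions, reward, grad_w); adding grad_w to w ascends the expected reward.
    """
    g = self_attention(x)
    p = sigmoid(g @ w)                               # per-frame selection probability
    actions = (rng.random(p.shape) < p).astype(int)  # Bernoulli sample: 1 = select, 0 = skip
    reward = diversity_reward(x, actions) + representativeness_reward(x, actions)
    # For a ~ Bernoulli(sigmoid(z)), d log pi / dz = a - p; chain rule through z = g @ w.
    grad_w = (reward - baseline) * (g.T @ (actions - p))
    return actions, reward, grad_w


def f_measure(pred, gt):
    """Harmonic mean of frame-overlap precision and recall between binary summaries."""
    overlap = np.sum(pred * gt)
    if overlap == 0:
        return 0.0
    precision = overlap / pred.sum()
    recall = overlap / gt.sum()
    return float(2 * precision * recall / (precision + recall))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(20, 8))  # synthetic stand-in: 20 frames, 8-dim features
    w = np.zeros(8)
    baseline = 0.0
    for _ in range(100):
        actions, reward, grad = policy_step(x, w, rng, baseline)
        w += 0.05 * grad
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    print("selected frames:", np.flatnonzero(actions))
```

The usage loop subtracts a moving-average baseline because both reward terms are non-negative: with a zero baseline, every sampled selection is reinforced in proportion to its reward, which makes the gradient estimate noisier.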
Pages: 652-659