VIDEO SUMMARIZATION WITH ANCHORS AND MULTI-HEAD ATTENTION

Authors
Sung, Yi-Lin [1 ]
Hong, Cheng-Yao [1 ]
Hsu, Yen-Chi [1 ]
Liu, Tyng-Luh [1 ]
Affiliations
[1] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
Keywords
Video summarization; multi-head attention; anchors; deep learning
DOI
10.1109/icip40778.2020.9191178
CLC Classification
TB8 [Photographic technology]
Subject Classification
0804
Abstract
Video summarization is the challenging task of automatically generating a representative and engaging highlight from a source video. Previous works explicitly exploit the hierarchical structure of video to train a summarizer, but they often rely on fixed-length segmentation, which breaks the video structure, or require additional training data to learn a segmentation model. In this paper, we propose an Anchor-Based Attention RNN (ABA-RNN) for video summarization. ABA-RNN makes two contributions. First, we obtain frame-level and clip-level features with an anchor-based approach, and by adopting the subtraction scheme of minus-LSTM the model needs only a single RNN layer; multi-head attention further lets the model select suitable segment lengths. Second, no extra video preprocessing is needed to determine shot boundaries, and the architecture is trainable end-to-end. In experiments on the standard SumMe and TVSum datasets, we achieve performance competitive with state-of-the-art results.
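The multi-head attention component mentioned in the abstract can be illustrated with a minimal NumPy sketch: several heads, each with its own learned projections, attend over per-frame features and their outputs are concatenated. This is a generic illustration only; the head count, random projections, and toy feature shapes below are illustrative assumptions and not the paper's actual ABA-RNN parameters or anchor mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(frames, num_heads=4, rng=None):
    """Toy multi-head self-attention over per-frame features.

    frames: (T, D) array of T frame features of dimension D,
            with D divisible by num_heads.
    Each head uses its own (here random, normally learned) projections,
    standing in for heads that could focus on different segment scales.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    T, D = frames.shape
    d = D // num_heads  # per-head dimension
    outputs = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (rng.standard_normal((D, d)) / np.sqrt(D) for _ in range(3))
        Q, K, V = frames @ Wq, frames @ Wk, frames @ Wv
        A = softmax(Q @ K.T / np.sqrt(d))  # (T, T) attention weights
        outputs.append(A @ V)              # per-head context vectors
    # Concatenate head outputs back to the model dimension.
    return np.concatenate(outputs, axis=-1)

feats = np.random.default_rng(1).standard_normal((8, 16))  # 8 frames, 16-dim
out = multi_head_attention(feats)
print(out.shape)  # (8, 16)
```

In a summarization model such a block would sit on top of frame features produced by the RNN, with each head free to emphasize segments of a different length before scoring frames for the summary.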
Pages: 2396-2400 (5 pages)