A Knowledge Augmented and Multimodal-Based Framework for Video Summarization

Times cited: 7
Authors
Xie, Jiehang [1]
Chen, Xuanbai [2]
Lu, Shao-Ping [1]
Yang, Yulu [1]
Affiliations
[1] Nankai Univ, Tianjin, Peoples R China
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Video Summarization; Multimodal Information
DOI
10.1145/3503161.3548089
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Video summarization aims to generate a compact version of a lengthy video that retains its primary content. Humans are generally adept at producing high-quality video summaries because they draw on information from multiple modalities and possess abundant background knowledge about the original video. However, existing methods rarely exploit multimodal information and ignore the impact of external knowledge, which limits the quality of the generated summaries. To address these limitations, this paper proposes a knowledge-augmented, multimodal-based video summarization method, termed KAMV. Specifically, we design a knowledge encoder that combines generation and retrieval to capture descriptive content and latent connections between events and entities from an external knowledge base, providing rich implicit knowledge for a better understanding of the video. Furthermore, to explore the interactions among visual, audio, and implicit-knowledge information and to emphasize the content most relevant to the desired summary, we present a fusion module supervised by this multimodal information. Extensive experiments on four public datasets demonstrate that KAMV outperforms state-of-the-art video summarization approaches.
Pages: 10
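The abstract does not specify how the fusion module weighs the visual, audio, and knowledge streams. As a rough illustration only, the Python sketch below shows one generic way to fuse per-frame modality vectors with softmax attention and turn the fused features into a keyframe selection. It is not KAMV's architecture: the fixed `query` and `scorer` vectors stand in for parameters a real model would learn, and every function name, the 15% default budget, and the top-k selection rule are hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_frame(modalities: Sequence[Vector], query: Vector) -> Vector:
    """Attention-weighted sum of one frame's modality vectors (all of dimension d).

    Hypothetical: `query` plays the role of a learned attention query.
    """
    weights = softmax([dot(m, query) for m in modalities])
    dim = len(query)
    return [sum(w * m[i] for w, m in zip(weights, modalities)) for i in range(dim)]


def frame_scores(
    visual: Sequence[Vector],
    audio: Sequence[Vector],
    knowledge: Sequence[Vector],
    query: Vector,
    scorer: Vector,
) -> List[float]:
    """Fuse the three streams frame by frame and squash each score into (0, 1).

    Hypothetical: `scorer` stands in for a learned linear scoring layer.
    """
    scores = []
    for v, a, k in zip(visual, audio, knowledge):
        fused = fuse_frame([v, a, k], query)
        scores.append(1.0 / (1.0 + math.exp(-dot(fused, scorer))))
    return scores


def select_frames(scores: Sequence[float], ratio: float = 0.15) -> List[int]:
    """Keep the highest-scoring frames, up to `ratio` of the video, in temporal order."""
    budget = max(1, int(len(scores) * ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])
```

For example, with four frames of 2-dimensional features, `select_frames(frame_scores(visual, audio, knowledge, [1.0, 0.0], [1.0, -1.0]), ratio=0.5)` returns the indices of the two highest-scoring frames in temporal order. Frame-level top-k selection keeps the sketch short; selecting whole shots under a summary-length budget is another common choice.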