A Knowledge Augmented and Multimodal-Based Framework for Video Summarization

Cited by: 7
Authors
Xie, Jiehang [1]
Chen, Xuanbai [2]
Lu, Shao-Ping [1]
Yang, Yulu [1]
Affiliations
[1] Nankai Univ, Tianjin, Peoples R China
[2] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
Video Summarization; Multimodal Information
DOI
10.1145/3503161.3548089
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Video summarization aims to generate a compact version of a lengthy video that retains its primary content. Humans generally produce high-quality video summaries because they draw crucial content from multiple dimensions of information and possess abundant background knowledge about the original video. Existing methods, however, rarely consider multichannel information and ignore the impact of external knowledge, which limits the quality of the generated summaries. This paper proposes a knowledge-augmented, multimodal video summarization method, termed KAMV, to address this problem. Specifically, we design a knowledge encoder with a hybrid method that combines generation and retrieval to capture descriptive content and latent connections between events and entities from an external knowledge base, providing rich implicit knowledge for better comprehension of the video. Furthermore, to explore the interactions among visual, audio, and implicit-knowledge information and to emphasize the content most relevant to the desired summary, we present a fusion module supervised by this multimodal information. Extensive experiments on four public datasets demonstrate that KAMV outperforms state-of-the-art video summarization approaches.
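The abstract describes the fusion module only at a high level, so the following is a minimal illustrative sketch rather than the KAMV implementation. It assumes a single scaled dot-product attention step in which per-frame visual features attend over audio and knowledge features, followed by a placeholder scoring rule and top-k frame selection. The function names (`fuse_and_score`, `select_summary`), the feature shapes, the 15% selection ratio, and the mean-plus-sigmoid score are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_and_score(visual, audio, knowledge):
    """Hypothetical fusion step (not the paper's module).

    visual:    (T, d) per-frame visual features
    audio:     (A, d) audio features
    knowledge: (K, d) implicit-knowledge features
    Returns fused (T, d) features and per-frame scores in (0, 1).
    """
    d = visual.shape[1]
    # Audio and knowledge features form the context each frame attends over.
    context = np.concatenate([audio, knowledge], axis=0)          # (A + K, d)
    weights = softmax(visual @ context.T / np.sqrt(d), axis=1)    # (T, A + K)
    # Residual connection keeps the original visual signal.
    fused = visual + weights @ context                            # (T, d)
    # Placeholder for a learned scoring layer: sigmoid of the feature mean.
    scores = 1.0 / (1.0 + np.exp(-fused.mean(axis=1)))            # (T,)
    return fused, scores


def select_summary(scores, ratio=0.15):
    """Pick the top `ratio` fraction of frames, returned in temporal order."""
    k = max(1, int(round(len(scores) * ratio)))
    top = np.argsort(scores)[::-1][:k]
    return np.sort(top)


# Random features stand in for real encoder outputs (an assumption).
rng = np.random.default_rng(0)
visual = rng.normal(size=(20, 8))
audio = rng.normal(size=(20, 8))
knowledge = rng.normal(size=(5, 8))

fused, scores = fuse_and_score(visual, audio, knowledge)
summary_idx = select_summary(scores)
```

In a trained model the scoring rule would be learned from annotated summaries; here it only shows where the per-frame importance scores would come from.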
Pages: 10
Related papers
50 in total
  • [41] Graph-based Multimodal Ranking Models for Multimodal Summarization
    Zhu, Junnan
    Xiang, Lu
    Zhou, Yu
    Zhang, Jiajun
    Zong, Chengqing
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20 (04)
  • [42] A Multimodal Framework for Video Ads Understanding
    Weng, Zejia
    Meng, Lingchen
    Wang, Rui
    Wu, Zuxuan
    Jiang, Yu-Gang
    [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021 : 4843 - 4847
  • [43] A Multimodal Framework for Video Caption Generation
    Bhooshan, Reshmi S.
    Suresh, K.
    [J]. IEEE Access, 2022, 10 : 92166 - 92176
  • [45] Multimodal-based weld reinforcement monitoring system for wire arc additive manufacturing
    Shen, Bin
    Lu, Jun
    Wang, Yiming
    Chen, Dongli
    Han, Jing
    Zhang, Yi
    Zhao, Zhuang
    [J]. JOURNAL OF MATERIALS RESEARCH AND TECHNOLOGY-JMR&T, 2022, 20 : 561 - 571
  • [46] Research on the Application of Multimodal-Based Machine Learning Algorithms to Water Quality Classification
    Xin, Lei
    Mou, Tianyu
    [J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [47] Instructional Video Summarization Using Attentive Knowledge Grounding
    Kim, Kyungho
    Lee, Kyungjae
    Hwang, Seung-won
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2020, PT V, 2021, 12461 : 565 - 569
  • [48] An Effective Video Summarization Framework Toward Handheld Devices
    Zhang, Luming
    Xia, Yingjie
    Mao, Kuang
    Ma, He
    Shan, Zhenyu
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2015, 62 (02) : 1309 - 1316
  • [49] ICAF: Iterative Contrastive Alignment Framework for Multimodal Abstractive Summarization
    Zhang, Zijian
    Shu, Chang
    Chen, Youxin
    Xiao, Jing
    Zhang, Qian
    Zheng, Lu
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [50] Progressive Video Summarization via Multimodal Self-supervised Learning
    Li, Haopeng
    Ke, Qiuhong
    Gong, Mingming
    Drummond, Tom
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023 : 5573 - 5582