A audio-visual model for efficient video summarization

Cited by: 0
Authors
El-Nagar, Gamal [1]
El-Sawy, Ahmed [1]
Rashad, Metwally [1, 2]
Affiliations
[1] Benha Univ, Fac Comp & Artificial Intelligence, Dept Comp Sci, Banha, Egypt
[2] Prince Sattam Bin Abdulaziz Univ, Coll Engn, Dept Comp Engn & Informat, Al Kharj 16273, Saudi Arabia
Keywords
Video Skimming; VGGish; SumMe; TVSum; Visualization of score curves
DOI
10.1016/j.jvcir.2024.104130
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The adage "a picture is worth a thousand words" resonates in the digital video domain, suggesting that a video could be seen as a composition of millions of these words. Videos are composed of countless frames. Video summarization creates cohesive visual units in scenes by condensing shots from segments. Video summarization gains prominence by condensing lengthy videos while retaining crucial content. Despite effective techniques using keyframes or keyshots in video summarization, integrating audio components is imperative. This paper focuses on integrating deep learning techniques to generate dynamic summaries enriched with audio. To address that gap, an efficient model employs audio-visual features, enriching summarization for more robust and informative video summaries. The model selects keyshots based on their significance scores, safeguarding essential content. Assigning these scores to specific video shots is a pivotal yet demanding task for video summarization. The model is evaluated on the benchmark datasets TVSum and SumMe. Experimental outcomes reveal its efficacy, showcasing considerable performance enhancements. On the TVSum and SumMe datasets, the model achieves F-scores of 79.33% and 66.78%, respectively, surpassing previous state-of-the-art techniques.
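The abstract does not give the selection procedure or the exact evaluation protocol, so the following is only a minimal sketch of how keyshot selection from significance scores and an F-score comparison are commonly set up. It assumes a 0/1 knapsack over shots under a length budget (on TVSum and SumMe the budget is often taken as about 15% of the video length) and an F-score over binary per-frame selections. The function names `select_keyshots` and `f_score`, and all numbers in the usage lines, are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def select_keyshots(scores: Sequence[float],
                    lengths: Sequence[int],
                    budget: int) -> List[int]:
    """Pick shot indices maximizing total score with total length <= budget.

    Assumption: a plain 0/1 knapsack; the paper's own selection step may differ.
    """
    n = len(scores)
    # best[i][c] = best total score using the first i shots within capacity c
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], scores[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]  # skip shot i-1 by default
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # take shot i-1
    # Backtrack: a shot was taken exactly when the table value changed.
    chosen: List[int] = []
    c = budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)


def f_score(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Harmonic mean of precision and recall over binary per-frame selections."""
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)
    recall = overlap / sum(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: four shots with made-up scores and lengths (in frames).
shot_scores = [0.9, 0.2, 0.8, 0.4]
shot_lengths = [30, 20, 40, 10]
selected = select_keyshots(shot_scores, shot_lengths, budget=50)
```

With these made-up inputs, the highest-scoring combination that fits the 50-frame budget is shots 0 and 3. A reported dataset-level F-score would normally be aggregated over videos and annotators, which this sketch leaves out.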
Pages: 9