An audio-visual model for efficient video summarization

Cited by: 0
Authors
El-Nagar, Gamal [1]
El-Sawy, Ahmed [1]
Rashad, Metwally [1, 2]
Affiliations
[1] Benha Univ, Fac Comp & Artificial Intelligence, Dept Comp Sci, Banha, Egypt
[2] Prince Sattam Bin Abdulaziz Univ, Coll Engn, Dept Comp Engn & Informat, Al Kharj 16273, Saudi Arabia
Keywords
Video Skimming; VGGish; SumMe; TVSum; Visualization of score curves
DOI
10.1016/j.jvcir.2024.104130
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The adage "a picture is worth a thousand words" resonates in the digital video domain, suggesting that a video could be seen as a composition of millions of these words. Videos are composed of countless frames. Video summarization creates cohesive visual units in scenes by condensing shots from segments, and it gains prominence by shortening lengthy videos while retaining crucial content. Although existing techniques based on keyframes or keyshots are effective, integrating audio components is imperative. This paper focuses on integrating deep learning techniques to generate dynamic summaries enriched with audio. To address that gap, an efficient model employs audio-visual features, enriching summarization for more robust and informative video summaries. The model selects keyshots based on their significance scores, safeguarding essential content. Assigning these scores to specific video shots is a pivotal yet demanding task for video summarization. The model is evaluated on two benchmark datasets, TVSum and SumMe. Experimental outcomes reveal its efficacy, showcasing considerable performance enhancements. On the TVSum and SumMe datasets, F-scores of 79.33% and 66.78%, respectively, are achieved, surpassing previous state-of-the-art techniques.
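The abstract does not describe how keyshots are chosen from the significance scores or how the F-score is computed. The following is only a minimal sketch of one common approach in the TVSum/SumMe literature, not the authors' method: a 0/1 knapsack over shots under a summary-length budget, followed by a frame-level F-score against a reference summary. The function names, the 15% budget, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def select_keyshots(
    shots: Sequence[Tuple[int, int]],
    scores: Sequence[float],
    budget_ratio: float = 0.15,  # assumed budget; not stated in the abstract
) -> List[int]:
    """Pick shot indices maximizing total score within a length budget.

    `shots` holds (start_frame, end_frame) pairs with end exclusive.
    Solved as a 0/1 knapsack with shot length as weight.
    """
    lengths = [end - start for start, end in shots]
    capacity = int(sum(lengths) * budget_ratio)
    n = len(shots)
    # dp[i][c] = best total score using the first i shots within c frames
    dp = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], scores[i - 1]
        for c in range(capacity + 1):
            dp[i][c] = dp[i - 1][c]
            if w <= c and dp[i - 1][c - w] + v > dp[i][c]:
                dp[i][c] = dp[i - 1][c - w] + v
    # Backtrack to recover which shots were taken.
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if dp[i][c] != dp[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)


def f_score(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Frame-level F-score between two binary (0/1) summaries of equal length."""
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)
    recall = overlap / sum(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy example: four shots over 100 frames, 20% budget.
    shots = [(0, 10), (10, 30), (30, 40), (40, 100)]
    scores = [0.9, 0.2, 0.8, 0.5]
    print(select_keyshots(shots, scores, budget_ratio=0.2))
    print(f_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

In this sketch each shot contributes its score as the knapsack value; other choices, such as weighting the score by shot length, are equally plausible and would change which shots are selected.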
Pages: 9