A audio-visual model for efficient video summarization

Authors
El-Nagar, Gamal [1]
El-Sawy, Ahmed [1]
Rashad, Metwally [1,2]
Affiliations
[1] Benha Univ, Fac Comp & Artificial Intelligence, Dept Comp Sci, Banha, Egypt
[2] Prince Sattam Bin Abdulaziz Univ, Coll Engn, Dept Comp Engn & Informat, Al Kharj 16273, Saudi Arabia
Keywords
Video Skimming; VGGish; SumMe; TVSum; Visualization of score curves;
DOI
10.1016/j.jvcir.2024.104130
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The adage "a picture is worth a thousand words" resonates in the digital video domain, suggesting that a video could be seen as a composition of millions of these words. Videos are composed of countless frames. Video summarization creates cohesive visual units in scenes by condensing shots from segments. Video summarization gains prominence by condensing lengthy videos while retaining crucial content. Despite effective techniques using keyframes or keyshots in video summarization, integrating audio components is imperative. This paper focuses on integrating deep learning techniques to generate dynamic summaries enriched with audio. To address that gap, an efficient model employs audio-visual features, enriching summarization for more robust and informative video summaries. The model selects keyshots based on their significance scores, safeguarding essential content. Assigning these scores to specific video shots is a pivotal yet demanding task for video summarization. The model is evaluated on the benchmark datasets TVSum and SumMe. Experimental outcomes reveal its efficacy, showcasing considerable performance enhancements. On the TVSum and SumMe datasets, F-scores of 79.33% and 66.78%, respectively, are achieved, surpassing previous state-of-the-art techniques.
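The abstract describes selecting keyshots by significance score and reporting an F-score. The sketch below illustrates those two ideas only in outline and is not the authors' method: the greedy selection rule, the `budget` parameter, the half-open frame ranges, and all function names are assumptions introduced for illustration. The F-score here is the harmonic mean of frame-level precision and recall between a predicted and a reference summary.

```python
from typing import List, Tuple

# A shot is a half-open frame range [start, end); this layout is an assumption.
Shot = Tuple[int, int]


def select_keyshots(shots: List[Shot], scores: List[float], budget: int) -> List[Shot]:
    """Greedily keep the highest-scoring shots whose total length fits the budget.

    Greedy selection is a stand-in; the abstract does not say how shots are chosen.
    """
    order = sorted(range(len(shots)), key=lambda i: scores[i], reverse=True)
    chosen: List[Shot] = []
    used = 0
    for i in order:
        length = shots[i][1] - shots[i][0]
        if used + length <= budget:
            chosen.append(shots[i])
            used += length
    return sorted(chosen)  # restore temporal order


def to_mask(shots: List[Shot], n_frames: int) -> List[bool]:
    """Expand a list of shots into a per-frame boolean mask."""
    mask = [False] * n_frames
    for start, end in shots:
        for frame in range(start, end):
            mask[frame] = True
    return mask


def f_score(pred: List[bool], ref: List[bool]) -> float:
    """Harmonic mean of frame-level precision and recall."""
    overlap = sum(p and r for p, r in zip(pred, ref))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)
    recall = overlap / sum(ref)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): four shots in a 60-frame video, a 30-frame budget.
shots = [(0, 10), (10, 30), (30, 40), (40, 60)]
scores = [0.2, 0.9, 0.7, 0.4]
summary = select_keyshots(shots, scores, budget=30)
reference = [(20, 40)]
score = f_score(to_mask(summary, 60), to_mask(reference, 60))
```

With this toy data the summary keeps the two highest-scoring shots, covers the whole reference, and adds ten extra frames, so recall is 1 and precision is 2/3.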
Pages: 9