A audio-visual model for efficient video summarization

Cited by: 0

Authors:

El-Nagar, Gamal ^{[1]}

El-Sawy, Ahmed ^{[1]}

Rashad, Metwally ^{[1,2]}

Affiliations:

[1] Benha Univ, Fac Comp & Artificial Intelligence, Dept Comp Sci, Banha, Egypt

[2] Prince Sattam Bin Abdulaziz Univ, Coll Engn, Dept Comp Engn & Informat, Al Kharj 16273, Saudi Arabia

Source:

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION | 2024 / Vol. 100

Keywords:

Video Skimming; VGGish; SumMe; TVSum; Visualization of score curves;

DOI:

10.1016/j.jvcir.2024.104130

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

The adage "a picture is worth a thousand words" resonates in the digital video domain, suggesting that a video could be seen as a composition of millions of these words. Videos are composed of countless frames. Video summarization creates cohesive visual units in scenes by condensing shots from segments. Video summarization gains prominence by condensing lengthy videos while retaining crucial content. Despite effective techniques using keyframes or keyshots in video summarization, integrating audio components is imperative. This paper focuses on integrating deep learning techniques to generate dynamic summaries enriched with audio. To address that gap, an efficient model employs audio-visual features, enriching summarization for more robust and informative video summaries. The model selects keyshots based on their significance scores, safeguarding essential content. Assigning these scores to specific video shots is a pivotal yet demanding task for video summarization. The model is evaluated on two benchmark datasets, TVSum and SumMe. Experimental outcomes reveal its efficacy, showcasing considerable performance enhancements. On the TVSum and SumMe datasets, F-scores of 79.33% and 66.78%, respectively, are achieved, surpassing previous state-of-the-art techniques.

Pages: 9
