Identification of story units in audio-visual sequences by joint audio and video processing

Cited by: 0
Authors
Saraceno, C [1]
Leonardi, R [1]
Affiliation
[1] Univ Brescia, SCL Dept Elect Automat, I-25123 Brescia, Italy
Keywords
DOI
Not available
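Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809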
Abstract
This paper proposes a novel technique that uses joint audio-visual analysis for scene identification and characterization. The paper defines four scene types: dialogues, stories, actions, and generic scenes. It then explains how any audio-visual material can be decomposed into a series of scenes that follow this classification, by properly analyzing and then combining the underlying audio and visual information. A rule-based procedure is defined for this purpose. Before the rule-based decision can take place, a series of low-level pre-processing tasks are suggested to adequately measure audio and visual correlations. For the visual information, similarities between non-consecutive shots are measured using a Learning Vector Quantization approach. A possible implementation strategy for the overall scene identification task is outlined and validated through a series of experimental simulations on real audio-visual data.
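The abstract names Learning Vector Quantization (LVQ) as the tool for comparing shots but gives no details. As a rough illustration only, the following is a minimal sketch of the generic LVQ1 update rule, not the authors' implementation. The function names (`nearest`, `lvq1_train`, `classify`), the learning-rate schedule, and the idea that each feature vector is a per-shot descriptor labelled with a visual class are all assumptions made for this example.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i][0], x)),
    )


def lvq1_train(codebook, samples, lr=0.1, epochs=20):
    """Generic LVQ1 training (hypothetical helper, not from the paper).

    codebook: list of (vector, label) initial codewords
    samples:  list of (vector, label) training vectors
    Returns a new codebook; the input is not modified.
    """
    book = [(list(vec), label) for vec, label in codebook]
    for epoch in range(epochs):
        # Assumed schedule: learning rate decays linearly over the epochs.
        rate = lr * (1 - epoch / epochs)
        for x, label in samples:
            i = nearest(book, x)
            vec, cw_label = book[i]
            # Pull the winning codeword toward x if labels match, push away otherwise.
            sign = 1.0 if cw_label == label else -1.0
            book[i] = ([c + sign * rate * (v - c) for c, v in zip(vec, x)], cw_label)
    return book


def classify(codebook, x):
    """Label of the codeword nearest to x."""
    return codebook[nearest(codebook, x)][1]


# Toy data: 2-D vectors standing in for per-shot visual descriptors (assumed).
initial = [([0.2, 0.2], "A"), ([0.8, 0.8], "B")]
training = [
    ([0.10, 0.10], "A"), ([0.15, 0.05], "A"),
    ([0.90, 0.90], "B"), ([0.85, 0.95], "B"),
]
trained = lvq1_train(initial, training)
```

Under this assumed setup, two non-consecutive shots whose descriptors map to the same codeword label (for example `classify(trained, shot_a) == classify(trained, shot_b)`) could be treated as visually similar; how the paper actually derives its similarity measure from the quantizer is not stated in the abstract.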
Pages: 363-367
Page count: 5