Video concept detection by audio-visual grouplets

Authors
Jiang, Wei [1 ]
Loui, Alexander C. [1 ]
Affiliations
[1] Eastman Kodak Co, Kodak Technol Ctr, Rochester, NY 80550 USA
Keywords
Video concept detection; Audio-visual grouplet;
DOI
10.1007/s13735-012-0020-6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We investigate general concept classification in unconstrained videos by joint audio-visual analysis. An audio-visual grouplet (AVG) representation is proposed based on analyzing the statistical temporal audio-visual interactions. Each AVG contains a set of audio and visual codewords that are grouped together according to their strong temporal correlations in videos, and the AVG carries unique audio-visual cues to represent the video content. By using entire AVGs as building elements, video concepts can be classified more robustly than with traditional vocabularies of discrete audio or visual codewords. Specifically, we conduct coarse-level foreground/background separation in both audio and visual channels, and discover four types of AVGs by exploring mixed-and-matched temporal audio-visual correlations among the following factors: visual foreground, visual background, audio foreground, and audio background. All of these types of AVGs provide discriminative audio-visual patterns for classifying various semantic concepts. To effectively use the AVGs for improved concept classification, a distance metric learning algorithm is further developed. Based on the AVG structure, the algorithm uses an iterative quadratic programming formulation to learn the optimal distances between data points according to the large-margin nearest-neighbor setting. Various types of grouplet-based distances can be computed using individual AVGs, and through our distance metric learning algorithm these grouplet-based distances can be aggregated for final classification. We extensively evaluate our method over the large-scale Columbia consumer video set. Experiments demonstrate that the AVG-based audio-visual representation can achieve consistent and significant performance improvements compared with other state-of-the-art approaches.
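The grouping idea in the abstract (audio and visual codewords collected into one unit when they are strongly correlated over time) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes plain Pearson correlation between per-frame codeword counts as a stand-in for the paper's statistical temporal analysis, and the function name `group_codewords`, the input layout, and the `threshold` parameter are all hypothetical.

```python
import numpy as np


def _standardize(counts):
    """Center each codeword's time series and scale it to unit variance."""
    x = counts.astype(float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant series: avoid division by zero
    return x / std


def group_codewords(visual_counts, audio_counts, threshold=0.8):
    """Pair each visual codeword with its strongly correlated audio codewords.

    visual_counts: array of shape (T, Nv), counts per frame (assumed layout)
    audio_counts:  array of shape (T, Na), counts per frame (assumed layout)
    Returns a list of (visual_index, [audio_indices]) tuples, one per
    visual codeword that has at least one partner above the threshold.
    """
    n_frames = visual_counts.shape[0]
    v = _standardize(visual_counts)
    a = _standardize(audio_counts)
    # Pearson correlation between every visual and every audio codeword
    corr = v.T @ a / n_frames  # shape (Nv, Na)

    groups = []
    for i in range(corr.shape[0]):
        partners = np.flatnonzero(corr[i] >= threshold)
        if partners.size:
            groups.append((i, partners.tolist()))
    return groups


# Toy example: visual word 0 co-occurs with audio word 0 in every frame.
visual = np.array([[1, 0], [0, 0], [1, 1], [0, 1], [1, 0], [0, 0]])
audio = np.array([[2, 0], [0, 1], [2, 0], [0, 1], [2, 0], [0, 1]])
groups = group_codewords(visual, audio)
```

In this toy input only visual codeword 0 gains a partner; a real system would apply such grouping separately to the foreground and background channels the abstract describes.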
Pages: 223-238
Number of pages: 16