An i-vector Representation of Acoustic Environments for Audio-based Video Event Detection on User Generated Content

Cited by: 8
Authors
Elizalde, Benjamin [1]
Lei, Howard [1]
Friedland, Gerald [1]
Affiliations
[1] Int Comp Sci Inst, Berkeley, CA 94704 USA
Keywords
Audio; i-vector; Video Event Detection; User Generated Content
DOI
10.1109/ISM.2013.27
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Audio-based video event detection (VED) on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or birthday party, rather than a sound, such as music, clapping or singing. The difficulty of video content analysis on UGC lies in the acoustic variability and lack of structure of the data. The UGC task has been explored mainly by computer vision, but can benefit from the use of audio. The i-vector system is state-of-the-art in Speaker Verification, and outperforms a conventional Gaussian Mixture Model (GMM)-based approach. The system compensates for undesired acoustic variability and extracts information from the acoustic environment, making it a meaningful choice for detection on UGC. This paper employs the i-vector-based system for audio-based VED on UGC and expands the understanding of the system on the task. It also includes a performance comparison with the conventional GMM-based and state-of-the-art Random Forest (RF)-based systems. The i-vector system aids audio-based event detection by addressing UGC audio characteristics. It outperforms the GMM-based system, and is competitive with the RF-based system in terms of the Missed Detection (MD) rate at 4% and 2.8% False Alarm (FA) rates, and complements the RF-based system by demonstrating a slight improvement in combination over the standalone systems.
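The abstract reports results as a Missed Detection rate at a fixed False Alarm rate. As a rough illustration of that kind of operating point (not the paper's scoring procedure, which this record does not describe), the sketch below picks the lowest score threshold whose false-alarm rate stays within a target and reports the missed-detection rate there. The function name `md_rate_at_fa` and the toy scores are hypothetical.

```python
def md_rate_at_fa(target_scores, nontarget_scores, fa_rate):
    """Return (missed-detection rate, threshold) at a false-alarm budget.

    Hypothetical helper for illustration only. A trial is accepted when
    its score is >= threshold. target_scores come from videos that contain
    the event, nontarget_scores from videos that do not.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every observed score, plus +inf (accept nothing).
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    candidates.append(float("inf"))

    n_tgt = len(target_scores)
    n_non = len(nontarget_scores)

    # Thresholds are visited in ascending order, so the false-alarm rate
    # can only shrink; the first one within budget gives the fewest misses.
    for threshold in candidates:
        fa = sum(s >= threshold for s in nontarget_scores) / n_non
        if fa <= fa_rate:
            md = sum(s < threshold for s in target_scores) / n_tgt
            return md, threshold


# Toy scores (made up): 25 non-event videos, 4 event videos.
nontarget = [i / 100 for i in range(24)] + [0.5]
target = [0.9, 0.45, 0.1, 0.6]

md_at_4pct, thr_at_4pct = md_rate_at_fa(target, nontarget, 0.04)
md_at_0pct, thr_at_0pct = md_rate_at_fa(target, nontarget, 0.0)
```

Tightening the false-alarm budget raises the threshold, so the missed-detection rate can only stay the same or grow; comparing systems at a shared FA rate, as the abstract does, keeps that trade-off fixed.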
Pages: 114-117
Number of pages: 4