Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection

Cited by: 3
Authors
Wang, Yun [1]
Metze, Florian [1]
Affiliation
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
DOI
10.1145/2911996.2912048
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Multimedia event detection (MED) is the task of detecting given events (e.g. parade, birthday party) in a large collection of video clips. While the most useful information comes from visual features and speech recognition, a lot can also be inferred from the non-speech audio content, either alone or in conjunction with visual and speech cues. This paper studies MED with non-speech audio information only. MED is usually performed in two stages. The first stage generates a representation for each clip in the form of either a single vector or a sequence of vectors, often by aggregating frame-level features; the second stage performs binary or multi-class classification to decide whether each target event occurs in each clip. Common classifiers used for the second stage include support vector machines (SVMs), feed-forward deep neural networks (DNNs), and recurrent neural networks (RNNs). In this paper, we propose to classify clips for events using "recurrent SVMs". These models combine the kernel mapping and the large-margin optimization criterion of SVMs, and the ability to process sequences of variable lengths of RNNs. Reinforced with data augmentation, recurrent SVMs have achieved higher mean average precision (MAP) on the TRECVID 2011 MED task than both SVMs and RNNs.
Pages: 265-269
Page count: 5