Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection

Cited by: 3

Authors:

Wang, Yun [1]

Metze, Florian [1]

Affiliations:

[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA

Source:

ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL | 2016

DOI:

10.1145/2911996.2912048

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Multimedia event detection (MED) is the task of detecting given events (e.g., parade, birthday party) in a large collection of video clips. While the most useful information comes from visual features and speech recognition, a lot can also be inferred from the non-speech audio content, either alone or in conjunction with visual and speech cues. This paper studies MED with non-speech audio information only. MED is usually performed in two stages. The first stage generates a representation for each clip in the form of either a single vector or a sequence of vectors, often by aggregating frame-level features; the second stage performs binary or multi-class classification to decide whether each target event occurs in each clip. Common classifiers used for the second stage include support vector machines (SVMs), feed-forward deep neural networks (DNNs), and recurrent neural networks (RNNs). In this paper, we propose to classify clips for events using "recurrent SVMs". These models combine the kernel mapping and the large-margin optimization criterion of SVMs with the ability of RNNs to process sequences of variable length. Reinforced with data augmentation, recurrent SVMs have achieved higher mean average precision (MAP) on the TRECVID 2011 MED task than both SVMs and RNNs.
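The abstract's core idea, scoring a variable-length sequence with a recurrent model and training it under an SVM-style large-margin criterion, can be illustrated with a toy sketch. This is not the authors' model: it assumes a plain tanh RNN, a linear output layer, and a standard hinge loss, and it omits the kernel mapping and data augmentation entirely. All names (`rnn_final_state`, `hinge_loss`, `clip_score`), the dimensions, and the random "clips" are hypothetical.

```python
import math
import random

def rnn_final_state(frames, W_in, W_rec, b):
    """Run a plain tanh RNN over a sequence of frame vectors of any length
    and return the last hidden state (assumed architecture, for illustration)."""
    h = [0.0] * len(b)
    for x in frames:
        # The comprehension reads the previous h and only then rebinds it.
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_in[j], x))
                + sum(w * hk for w, hk in zip(W_rec[j], h))
                + b[j]
            )
            for j in range(len(b))
        ]
    return h

def hinge_loss(score, label):
    """Large-margin (SVM-style) loss for a label in {-1, +1}."""
    return max(0.0, 1.0 - label * score)

def clip_score(frames, W_in, W_rec, b, w_out):
    """Linear score computed from the final hidden state of the RNN."""
    h = rnn_final_state(frames, W_in, W_rec, b)
    return sum(w * hk for w, hk in zip(w_out, h))

# Hypothetical sizes and randomly initialized (untrained) parameters.
random.seed(0)
feat_dim, hid_dim = 4, 8
W_in = [[random.gauss(0, 0.1) for _ in range(feat_dim)] for _ in range(hid_dim)]
W_rec = [[random.gauss(0, 0.1) for _ in range(hid_dim)] for _ in range(hid_dim)]
b = [0.0] * hid_dim
w_out = [random.gauss(0, 0.1) for _ in range(hid_dim)]

# Two made-up clips of different lengths (5 and 12 frames) with labels +1 / -1.
clips = [
    [[random.gauss(0, 1) for _ in range(feat_dim)] for _ in range(5)],
    [[random.gauss(0, 1) for _ in range(feat_dim)] for _ in range(12)],
]
labels = [+1, -1]
losses = [hinge_loss(clip_score(c, W_in, W_rec, b, w_out), y)
          for c, y in zip(clips, labels)]
```

In a trained system the parameters would be updated to reduce the summed hinge loss over all clips; the sketch only evaluates the loss and shows that clips of different lengths pass through the same scoring function.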

Pages: 265-269

Page count: 5

