Multimedia event detection with multimodal feature fusion and temporal concept localization

Cited by: 33

Authors:

Oh, Sangmin [1]

McCloskey, Scott [5]

Kim, Ilseo [2]

Vahdat, Arash [6]

Cannons, Kevin J. [6]

Hajimirsadeghi, Hossein [6]

Mori, Greg [6]

Perera, A. G. Amitha [3]

Pandey, Megha [4]

Corso, Jason J. [7]

Affiliations:

[1] Kitware Inc, Clifton Pk, NY USA

[2] Kitware Inc, Comp Vis Team, Clifton Pk, NY USA

[3] Kitware Inc, Comp Vis, Clifton Pk, NY USA

[4] Kitware Inc, Comp Vis Grp, Clifton Pk, NY USA

[5] Honeywell Labs, Minneapolis, MN USA

[6] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada

[7] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA

Source:

Machine Vision and Applications | 2014 / Vol. 25 / Issue 01

Keywords:

Multimedia; Classification; Machine learning; Fusion;

DOI:

10.1007/s00138-013-0525-x

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a system for multimedia event detection. The developed system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including building, often in an unsupervised manner, mid-level and high-level features upon low-level features to enable semantic understanding. Second, we show a novel Latent SVM model which learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval by its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and our methodology to improve fusion learning under limited training data conditions. Thorough evaluation on the large TRECVID MED 2011 dataset showcases the benefits of the presented system.
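
The abstract names two ideas that a small example may make more concrete: locating the temporal segment that gives the strongest evidence for a concept, and fusing per-modality detection scores into one decision. The sketch below is only an illustration under simple assumptions (one score per segment, a fixed weighted average for fusion). It is not the authors' Latent SVM or their fusion learning algorithm, and the function names `localize_concept` and `fuse_scores` are hypothetical.

```python
from typing import Dict, List, Tuple


def localize_concept(segment_scores: List[float]) -> Tuple[int, float]:
    """Return (index, score) of the highest-scoring temporal segment.

    Assumption: treating the segment index as the latent variable and
    maximizing over it is a generic latent-variable inference step,
    not the specific model described in the paper.
    """
    if not segment_scores:
        raise ValueError("segment_scores must not be empty")
    best = max(range(len(segment_scores)), key=lambda i: segment_scores[i])
    return best, segment_scores[best]


def fuse_scores(modality_scores: Dict[str, float],
                weights: Dict[str, float]) -> float:
    """Late fusion: weighted average of per-modality detection scores.

    Assumption: weights are given and positive; the paper learns its
    fusion, which this fixed average does not attempt to reproduce.
    """
    total = sum(weights[m] for m in modality_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[m] * s for m, s in modality_scores.items()) / total
```

For instance, `localize_concept([0.1, 0.7, 0.3])` selects the second segment (index 1), which is the kind of temporal evidence a retrieval summary could point to.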


Pages: 49-69

Page count: 21

