Multimedia event detection with multimodal feature fusion and temporal concept localization

Cited by: 33
Authors
Oh, Sangmin [1]
McCloskey, Scott [5]
Kim, Ilseo [2]
Vahdat, Arash [6]
Cannons, Kevin J. [6]
Hajimirsadeghi, Hossein [6]
Mori, Greg [6]
Perera, A. G. Amitha [3]
Pandey, Megha [4]
Corso, Jason J. [7]
Affiliations
[1] Kitware Inc, Clifton Pk, NY USA
[2] Kitware Inc, Comp Vis Team, Clifton Pk, NY USA
[3] Kitware Inc, Comp Vis, Clifton Pk, NY USA
[4] Kitware Inc, Comp Vis Grp, Clifton Pk, NY USA
[5] Honeywell Labs, Minneapolis, MN USA
[6] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[7] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Keywords
Multimedia; Classification; Machine learning; Fusion
DOI
10.1007/s00138-013-0525-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a system for multimedia event detection. The developed system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including building, often in an unsupervised manner, mid-level and high-level features upon low-level features to enable semantic understanding. Second, we show a novel Latent SVM model which learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval by its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and our methodology to improve fusion learning under limited training data conditions. Thorough evaluation on the large TRECVID MED 2011 dataset showcases the benefits of the presented system.
Pages: 49-69
Number of pages: 21
Related papers
50 in total
  • [1] Multimedia event detection with multimodal feature fusion and temporal concept localization
    Sangmin Oh
    Scott McCloskey
    Ilseo Kim
    Arash Vahdat
    Kevin J. Cannons
    Hossein Hajimirsadeghi
    Greg Mori
    A. G. Amitha Perera
    Megha Pandey
    Jason J. Corso
    Machine Vision and Applications, 2014, 25 : 49 - 69
  • [2] Multimodal Feature Fusion for Robust Event Detection in Web Videos
    Natarajan, Pradeep
    Wu, Shuang
    Vitaladevuni, Shiv
    Zhuang, Xiaodan
    Tsakalidis, Stavros
    Park, Unsang
    Prasad, Rohit
    Natarajan, Premkumar
    2012 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2012, : 1298 - 1305
  • [3] Double Fusion for Multimedia Event Detection
    Lan, Zhen-zhong
    Bao, Lei
    Yu, Shoou-I
    Liu, Wei
    Hauptmann, Alexander G.
    ADVANCES IN MULTIMEDIA MODELING, 2012, 7131 : 173 - 185
  • [4] Concept Based Hybrid Fusion of Multimodal Event Signals
    Wang, Yuhui
    von der Weth, Christian
    Zhang, Yehong
    Low, Kian Hsiang
    Singh, Vivek K.
    Kankanhalli, Mohan
    PROCEEDINGS OF 2016 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2016, : 14 - 19
  • [5] Event detection using multimodal feature analysis
    Li, ZY
    Tan, YP
    2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS, 2005, : 3845 - 3848
  • [6] Investigating Multimodal Audiovisual Event Detection and Localization
    Vryzas, N.
    Kotsakis, R.
    Dimoulas, C. A.
    Kalliris, G.
    PROCEEDINGS OF AUDIO MOSTLY 2016 - A CONFERENCE ON INTERACTION WITH SOUND IN COOPERATION WITH ACM, 2016, : 97 - 104
  • [7] Multimodal information fusion for video concept detection
    Wu, Y
    Lin, CK
    Chang, EY
    Smith, JR
    ICIP: 2004 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-5, 2004, : 2391 - 2394
  • [8] Efficient Heuristic Methods for Multimodal Fusion and Concept Fusion in Video Concept Detection
    Geng, Jie
    Miao, Zhenjiang
    Zhang, Xiao-Ping
    IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (04) : 498 - 511
  • [9] Multimedia Event Detection using Visual Concept Signatures
    Younessian, Ehsan
    Quinn, Michael
    Mitamura, Teruko
    Hauptmann, Alex
    MULTIMEDIA CONTENT AND MOBILE DEVICES, 2013, 8667
  • [10] Multimedia classification and event detection using double fusion
    Lan, Zhen-zhong
    Bao, Lei
    Yu, Shoou-I
    Liu, Wei
    Hauptmann, Alexander G.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2014, 71 (01) : 333 - 347