Multimedia event detection with multimodal feature fusion and temporal concept localization

Cited by: 33
Authors
Oh, Sangmin [1]
McCloskey, Scott [5]
Kim, Ilseo [2]
Vahdat, Arash [6]
Cannons, Kevin J. [6]
Hajimirsadeghi, Hossein [6]
Mori, Greg [6]
Perera, A. G. Amitha [3]
Pandey, Megha [4]
Corso, Jason J. [7]
Affiliations
[1] Kitware Inc, Clifton Pk, NY USA
[2] Kitware Inc, Comp Vis Team, Clifton Pk, NY USA
[3] Kitware Inc, Comp Vis, Clifton Pk, NY USA
[4] Kitware Inc, Comp Vis Grp, Clifton Pk, NY USA
[5] Honeywell Labs, Minneapolis, MN USA
[6] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[7] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Keywords
Multimedia; Classification; Machine learning; Fusion
DOI
10.1007/s00138-013-0525-x
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a system for multimedia event detection. The system characterizes complex multimedia events using a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including building mid-level and high-level features on top of low-level features, often in an unsupervised manner, to enable semantic understanding. Second, we present a novel Latent SVM model that learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval through its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and our methodology for improving fusion learning when training data is limited. Thorough evaluation on the large TRECVID MED 2011 dataset showcases the benefits of the presented system.
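The abstract does not specify how the per-modality responses are fused, so the following is only a minimal sketch of one generic late-fusion scheme, not the authors' algorithm. It assumes each modality has already produced one detection score per video, normalizes each modality's scores, and combines them with a weighted average. The modality names, scores, and weights are all hypothetical.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Normalize one modality's scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        # A constant score list carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def late_fusion(scores_by_modality, weights):
    """Weighted average of normalized per-modality scores for each video.

    This is a generic illustration; the fusion weights here are fixed by
    hand, whereas a learned fusion would estimate them from training data.
    """
    normalized = {m: zscore(s) for m, s in scores_by_modality.items()}
    n_videos = len(next(iter(normalized.values())))
    total_weight = sum(weights[m] for m in normalized)
    return [
        sum(weights[m] * normalized[m][i] for m in normalized) / total_weight
        for i in range(n_videos)
    ]


# Hypothetical detector scores for three videos from two modalities.
scores = {
    "visual": [0.9, 0.2, 0.4],
    "audio": [0.6, 0.1, 0.8],
}
# Hypothetical hand-picked weights.
weights = {"visual": 0.7, "audio": 0.3}

fused = late_fusion(scores, weights)
# Video indices ordered from most to least likely to contain the event.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(ranking)
```

Normalizing before averaging keeps a modality with a wider score range from dominating the result purely because of its scale.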
Pages: 49-69
Page count: 21