Multimedia event detection with multimodal feature fusion and temporal concept localization

Cited by: 33
Authors
Oh, Sangmin [1]
McCloskey, Scott [5]
Kim, Ilseo [2]
Vahdat, Arash [6]
Cannons, Kevin J. [6]
Hajimirsadeghi, Hossein [6]
Mori, Greg [6]
Perera, A. G. Amitha [3]
Pandey, Megha [4]
Corso, Jason J. [7]
Affiliations
[1] Kitware Inc, Clifton Pk, NY USA
[2] Kitware Inc, Comp Vis Team, Clifton Pk, NY USA
[3] Kitware Inc, Comp Vis, Clifton Pk, NY USA
[4] Kitware Inc, Comp Vis Grp, Clifton Pk, NY USA
[5] Honeywell Labs, Minneapolis, MN USA
[6] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[7] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Keywords
Multimedia; Classification; Machine learning; Fusion
DOI
10.1007/s00138-013-0525-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a system for multimedia event detection. The system characterizes complex multimedia events based on a large array of multimodal features, and classifies unseen videos by effectively fusing diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including building, often in an unsupervised manner, mid-level and high-level features upon low-level features to enable semantic understanding. Second, we show a novel Latent SVM model which learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it enables a unique summary for every retrieval through its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified the video as it did. Finally, we present novel fusion learning algorithms and our methodology for improving fusion learning under limited training data conditions. Thorough evaluation on the large TRECVID MED 2011 dataset showcases the benefits of the presented system.
Pages: 49-69
Page count: 21
Related papers
50 items in total
  • [41] DETECTION IN COMPLEX SCENES USING RGB AND DEPTH MULTIMODAL FEATURE FUSION
    Yan, Shengli
    Rao, Yuan
    Hou, Wenhui
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 2495 - 2499
  • [42] UNSUPERVISED FEATURE EXTRACTION FOR MULTIMEDIA EVENT DETECTION AND RANKING USING AUDIO CONTENT
    Amid, Ehsan
    Mesaros, Annamaria
    Palomaki, Kalle J.
    Laaksonen, Jorma
    Kurimo, Mikko
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [43] Multimedia Event Detection Using Segment-Based Approach for Motion Feature
    Sang Phan
    Thanh Duc Ngo
    Vu Lam
    Son Tran
    Duy-Dinh Le
    Duc Anh Duong
    Satoh, Shin'ichi
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2014, 74 (01): : 19 - 31
  • [45] CONNECTIONIST TEMPORAL LOCALIZATION FOR SOUND EVENT DETECTION WITH SEQUENTIAL LABELING
    Wang, Yun
    Metze, Florian
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 745 - 749
  • [46] Spatial-Temporal Feature Fusion for Human Fall Detection
    Ma, Xin
    Wang, Haibo
    Xue, Bingxia
    Li, Yibin
    COMPUTER VISION, CCCV 2015, PT I, 2015, 546 : 438 - 447
  • [47] Multimodal discovering and fusion for semantics multimedia analysis
    Hong, He
    ALPIT 2007: PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY, 2007, : 155 - 158
  • [48] Soccer Event Detection via Collaborative Multimodal Feature Analysis and Candidate Ranking
    Halin, Alfian Abdul
    Rajeswari, Mandava
    Abbasnejad, Mohammad
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2013, 10 (05) : 493 - 502
  • [49] Multimodal motor imagery decoding method based on temporal spatial feature alignment and fusion
    Zhang, Yukun
    Qiu, Shuang
    He, Huiguang
    JOURNAL OF NEURAL ENGINEERING, 2023, 20 (02)
  • [50] MULTIMEDIA EVENT DETECTION VIA DEEP SPATIAL-TEMPORAL NEURAL NETWORKS
    Hou, Jingyi
    Wu, Xinxiao
    Yu, Feiwu
    Jia, Yunde
    2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2016,