Multimedia event detection with multimodal feature fusion and temporal concept localization

Cited by: 33
Authors
Oh, Sangmin [1]
McCloskey, Scott [5]
Kim, Ilseo [2]
Vahdat, Arash [6]
Cannons, Kevin J. [6]
Hajimirsadeghi, Hossein [6]
Mori, Greg [6]
Perera, A. G. Amitha [3]
Pandey, Megha [4]
Corso, Jason J. [7]
Affiliations
[1] Kitware Inc, Clifton Pk, NY USA
[2] Kitware Inc, Comp Vis Team, Clifton Pk, NY USA
[3] Kitware Inc, Comp Vis, Clifton Pk, NY USA
[4] Kitware Inc, Comp Vis Grp, Clifton Pk, NY USA
[5] Honeywell Labs, Minneapolis, MN USA
[6] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
[7] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Keywords
Multimedia; Classification; Machine learning; Fusion
DOI
10.1007/s00138-013-0525-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a system for multimedia event detection. The system characterizes complex multimedia events using a large array of multimodal features and classifies unseen videos by effectively fusing their diverse responses. We present three major technical innovations. First, we explore novel visual and audio features across multiple semantic granularities, including mid-level and high-level features built, often in an unsupervised manner, upon low-level features to enable semantic understanding. Second, we present a novel Latent SVM model that learns and localizes discriminative high-level concepts in cluttered video sequences. In addition to improving detection accuracy beyond existing approaches, it produces a unique summary for every retrieval through its use of high-level concepts and temporal evidence localization. The resulting summary provides some transparency into why the system classified a video as it did. Finally, we present novel fusion learning algorithms and a methodology for improving fusion learning under limited training data. A thorough evaluation on the large TRECVID MED 2011 dataset demonstrates the benefits of the presented system.
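The abstract describes a Latent SVM that scores a video while localizing its most discriminative temporal evidence. The snippet below is a minimal sketch of that idea under stated assumptions, not the paper's model: it assumes a linear scorer, treats the latent variable as the start frame of a fixed-length window, and uses the mean of per-frame concept-detector responses inside the window as the feature map. The names `latent_score`, `window`, and `bias` are hypothetical. Inference then reduces to a max over windows, and the winning window is the localized evidence. Training is not shown.

```python
from typing import Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def latent_score(
    frames: Sequence[Sequence[float]],
    w: Sequence[float],
    window: int,
    bias: float = 0.0,
) -> Tuple[float, int]:
    """Return (max_h w . phi(x, h) + bias, best h).

    Hypothetical choices for illustration only:
    - h is the start index of a contiguous window of `window` frames;
    - phi(x, h) is the mean of the per-frame concept responses in that window.
    """
    if window <= 0 or window > len(frames):
        raise ValueError("window must be in [1, len(frames)]")
    dim = len(frames[0])
    best_score, best_start = float("-inf"), -1
    # Exhaustively search the latent variable (window start position).
    for h in range(len(frames) - window + 1):
        seg = frames[h:h + window]
        phi = [sum(f[d] for f in seg) / window for d in range(dim)]
        s = dot(w, phi) + bias
        if s > best_score:
            best_score, best_start = s, h
    return best_score, best_start


if __name__ == "__main__":
    # Five frames, each with responses from two hypothetical concept detectors.
    frames = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.1, 0.0]]
    score, start = latent_score(frames, w=[1.0, 1.0], window=2)
    print(f"score={score:.2f}, evidence frames {start}..{start + 1}")
```

Because the maximizing window is returned alongside the score, a detection comes with the frames that justified it, which loosely mirrors how the abstract describes temporal localization supporting a per-retrieval summary.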
Pages: 49-69
Number of pages: 21