Semantic Model Vectors for Complex Video Event Recognition

Cited by: 84
Authors
Merler, Michele [1]
Huang, Bert [2]
Xie, Lexing [3]
Hua, Gang [4]
Natsev, Apostol [5]
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] Univ Maryland, Dept Comp Sci, College Pk, MD 20740 USA
[3] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 0200, Australia
[4] Stevens Inst Technol, Dept Comp Sci, Hoboken, NJ 07030 USA
[5] IBM TJ Watson Res Ctr, Hawthorne, NY 10532 USA
Keywords
Complex video events; event recognition; high-level descriptor; SCENE
DOI
10.1109/TMM.2011.2168948
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We propose semantic model vectors, an intermediate-level semantic representation, as a basis for modeling and detecting complex events in unconstrained real-world videos, such as those from YouTube. The semantic model vectors are extracted using a set of discriminative semantic classifiers, each being an ensemble of SVM models trained from thousands of labeled web images, for a total of 280 generic concepts. Our study reveals that the proposed semantic model vectors representation outperforms, and is complementary to, other low-level visual descriptors for video event modeling. We hence present an end-to-end video event detection system, which combines semantic model vectors with other static or dynamic visual descriptors, extracted at the frame, segment, or full clip level. We perform a comprehensive empirical study on the 2010 TRECVID Multimedia Event Detection task (http://www.nist.gov/itl/iad/mig/med10.cfm), which validates the semantic model vectors representation not only as the best individual descriptor, outperforming state-of-the-art global and local static features as well as spatio-temporal HOG and HOF descriptors, but also as the most compact. We also study early and late feature fusion across the various approaches, leading to a 15% performance boost and an overall system performance of 0.46 mean average precision. To promote further research in this direction, we have made our semantic model vectors for the TRECVID MED 2010 set publicly available for the community to use (http://www1.cs.columbia.edu/~mmerler/SMV.html).
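As a rough illustration of the ideas named in the abstract, the following Python sketch builds a semantic model vector from a bank of concept classifiers and shows one simple form each of early and late fusion. It is not the authors' implementation: the toy classifiers, the average pooling of frame-level vectors into a clip-level descriptor, and the weighted-average late fusion rule are all assumptions made for the example. In the paper each classifier is an ensemble of SVM models and there are 280 concepts.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical type: a concept classifier maps a frame's low-level
# feature vector to a single confidence score for one concept.
ConceptClassifier = Callable[[Sequence[float]], float]


def semantic_model_vector(frame: Sequence[float],
                          classifiers: Sequence[ConceptClassifier]) -> List[float]:
    """Return one score per concept classifier for a single frame."""
    return [clf(frame) for clf in classifiers]


def clip_descriptor(frames: Sequence[Sequence[float]],
                    classifiers: Sequence[ConceptClassifier]) -> List[float]:
    """Pool frame-level vectors into one clip-level vector.

    Average pooling is an assumption of this sketch.
    """
    if not frames:
        raise ValueError("a clip needs at least one frame")
    vectors = [semantic_model_vector(f, classifiers) for f in frames]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def early_fusion(descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Early fusion: concatenate descriptors before training one model."""
    fused: List[float] = []
    for descriptor in descriptors:
        fused.extend(descriptor)
    return fused


def late_fusion(scores: Sequence[float],
                weights: Optional[Sequence[float]] = None) -> float:
    """Late fusion: combine per-descriptor detection scores.

    A weighted average is used here; it is one possible rule, assumed
    for illustration.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores) or sum(weights) == 0:
        raise ValueError("weights must match scores and not sum to zero")
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Toy stand-ins for trained concept classifiers (hypothetical).
toy_classifiers: List[ConceptClassifier] = [
    lambda f: f[0],
    lambda f: f[1],
    lambda f: (f[0] + f[1]) / 2,
]

clip = [[0.0, 1.0], [1.0, 1.0]]           # two frames, two features each
smv = clip_descriptor(clip, toy_classifiers)
combined = early_fusion([smv, [0.25]])    # append another descriptor
score = late_fusion([0.5, 1.0], weights=[1.0, 3.0])
```

The descriptor length equals the number of concepts, which is why such a representation can stay compact compared with high-dimensional low-level features.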
Pages: 88-101
Page count: 14