A component-based video content representation for action recognition

Cited: 9
Authors
Adeli, Vida [1 ]
Fazl-Ersi, Ehsan [1 ]
Harati, Ahad [1 ]
Affiliations
[1] Ferdowsi Univ Mashhad, Dept Comp Engn, Mashhad 9177948944, Razavi Khorasan, Iran
Keywords
Actionness likelihood; Action recognition; Action components; LSTM; Three-stream convolutional neural network; MOTION REPRESENTATION;
DOI
10.1016/j.imavis.2019.08.009
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates the challenging problem of action recognition in videos and proposes a new component-based approach for video content representation. Although satisfactory performance for action recognition has already been obtained in certain scenarios, many existing solutions require fully-annotated video datasets in which the region of activity in each frame is specified by a bounding box. Another group of methods requires auxiliary techniques to extract human-related areas in the video frames before actions can be accurately recognized. In this paper, a Weakly-Supervised Learning (WSL) framework is introduced that eliminates the need for per-frame annotations and learns video representations that improve recognition accuracy while also highlighting the activity-related regions within each frame. To this end, two new representation ideas are proposed: one focuses on representing the main components of an action, i.e. actionness regions, and the other focuses on encoding the background context to capture general and holistic cues. A three-stream CNN is developed that takes the two proposed representations and combines them with a motion-encoding stream. Temporal cues in each of the three streams are modeled through LSTMs, and fully-connected neural network layers fuse the streams to produce the final video representation. Experimental results on four challenging datasets demonstrate that the proposed Component-based Multi-stream CNN model (CM-CNN), trained in a WSL setting, outperforms the state-of-the-art in action recognition, including fully-supervised approaches. (C) 2019 Elsevier B.V. All rights reserved.
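The abstract describes a three-stream architecture: each stream applies a CNN per frame, an LSTM models temporal cues within each stream, and fully-connected layers fuse the streams. A minimal schematic sketch of that pipeline is given below; this is not the authors' code, and the stream names, backbone sizes, and input shapes are illustrative assumptions.

```python
# Hedged sketch of a three-stream CNN + LSTM + FC-fusion pipeline,
# following the structure described in the abstract. All layer sizes
# are illustrative, not the paper's actual configuration.
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """One stream: a small per-frame CNN followed by an LSTM over time."""
    def __init__(self, in_channels=3, feat_dim=64, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)

    def forward(self, clip):              # clip: (B, T, C, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)      # last hidden state summarizes time
        return h[-1]                      # (B, hidden)

class ThreeStreamNet(nn.Module):
    """Actionness, context, and motion streams fused by FC layers."""
    def __init__(self, n_classes=10, hidden=128):
        super().__init__()
        self.actionness = StreamEncoder()                 # action components
        self.context = StreamEncoder()                    # background context
        self.motion = StreamEncoder(in_channels=2)        # e.g. optical flow
        self.fuse = nn.Sequential(
            nn.Linear(3 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, rgb_action, rgb_context, flow):
        z = torch.cat([self.actionness(rgb_action),
                       self.context(rgb_context),
                       self.motion(flow)], dim=1)
        return self.fuse(z)               # class logits, (B, n_classes)

net = ThreeStreamNet()
logits = net(torch.randn(2, 8, 3, 32, 32),   # actionness clip
             torch.randn(2, 8, 3, 32, 32),   # context clip
             torch.randn(2, 8, 2, 32, 32))   # 2-channel motion clip
print(logits.shape)  # torch.Size([2, 10])
```

The design choice mirrored here is late fusion: each stream is summarized independently by its LSTM's final hidden state, and only those fixed-size summaries are concatenated and passed through the fully-connected fusion layers.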
Pages: 15
Related papers
50 records
  • [11] A Robust and Efficient Video Representation for Action Recognition
    Wang, Heng
    Oneata, Dan
    Verbeek, Jakob
    Schmid, Cordelia
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2016, 119 (03) : 219 - 238
  • [12] Exploring Multimodal Video Representation for Action Recognition
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 1924 - 1931
  • [13] Hybrid Component-Based Face Recognition System
    Dargham, Jamal Ahmad
    Chekima, Ali
    Hamdan, Munira
    DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 2012, 151 : 573 - 580
  • [14] Enhanced Hybrid Component-Based Face Recognition
    Gumede, Andile M.
    Viriri, Serestina
    Gwetu, Mandlenkosi V.
    COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2017, PT I, 2017, 10448 : 257 - 265
  • [15] Component-Based Recognition of Faces and Facial Expressions
    Taheri, Sima
    Patel, Vishal M.
    Chellappa, Rama
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2013, 4 (04) : 360 - 371
  • [16] Component-based content linking beyond the application
    Meinecke, Johannes
    Majer, Frederic
    Gaedke, Martin
    WEB ENGINEERING, PROCEEDINGS, 2007, 4607 : 427 - +
  • [17] Component-based video player model for network computers
    Liu, Fa-Gui
    Huang, Kai-Yao
    Zhang, Hui
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2005, 33 (07): : 24 - 27
  • [18] Action recognition based on spatio-temporal information and nonnegative component representation
    Wang J.
    Zhang X.
    Zhang P.
    Jiang L.
    Luo L.
    Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition), 2016, 46 (04): : 675 - 680
  • [19] TOWARDS TEMPORAL ADAPTIVE REPRESENTATION FOR VIDEO ACTION RECOGNITION
    Cai, Junjie
    Yu, Jie
    Imai, Francisco
    Tian, Qi
    2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 4155 - 4159
  • [20] Spatiotemporal Saliency Representation Learning for Video Action Recognition
    Kong, Yongqiang
    Wang, Yunhong
    Li, Annan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1515 - 1528