Learning Bag of Spatio-Temporal Features for Human Interaction Recognition

Cited by: 3
Authors
Slimani, Khadidja Nour El Houda [1]
Benezeth, Yannick [2]
Souami, Feryel [1]
Affiliations
[1] Univ Sci & Technol Houari Boumediene, LRIA, BP 32 El Alia, Algiers 16111, Algeria
[2] Univ Burgundy Franche Comte, ImViA EA 7535, F-21000 Dijon, France
Source
TWELFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2019) | 2020, Vol. 11433
Keywords
Human interaction; Edge-based region; MSER; Bag of Visual Words; 3D-SIFT; Sum of Histograms; SVM; VIDEOS;
DOI
10.1117/12.2559268
Chinese Library Classification
O43 [Optics];
Subject classification codes
070207 ; 0803 ;
Abstract
The Bag of Visual Words (BoVW) model has achieved impressive performance on human activity recognition. However, because it ignores the spatiotemporal distribution of visual words, it is extremely difficult to capture the high-level semantic meaning behind video features with this method, which prevents localization of the interactions within a video. In this paper, we propose a supervised learning framework that automatically recognizes high-level human interactions based on a bag of spatiotemporal visual features. First, a representative baseline keyframe that captures the major body parts of the interacting persons is selected, and the bounding boxes containing the persons are extracted to parse the poses of all persons in the interaction. Based on this keyframe, features are detected for each interacting person by combining edge features with Maximally Stable Extremal Regions (MSER) features, and are then tracked backward and forward over the entire video sequence. From the feature tracks, 3D XYT spatiotemporal volumes are generated for each interacting target. Then, the K-means algorithm is used to build a codebook of visual features, and each interaction is represented by the sum of the occurrence frequencies of visual words across the interacting persons. Extensive experimental evaluations on the UT-Interaction dataset demonstrate that our method recognizes ongoing interactions from videos with a simple implementation.
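
The codebook and sum-of-histograms steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D vectors stand in for real spatiotemporal descriptors, the function names (`kmeans`, `bovw_histogram`, `interaction_histogram`) are hypothetical, and the per-person normalization is an assumption. Feature detection, tracking, and the final classifier are omitted.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, centers: List[Vector]) -> int:
    """Index of the visual word (cluster center) closest to v."""
    return min(range(len(centers)), key=lambda k: sq_dist(v, centers[k]))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means; returns k cluster centers (the codebook)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous center if a cluster is empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers


def bovw_histogram(descriptors: List[Vector], centers: List[Vector]) -> List[float]:
    """Frequency of each visual word among one person's descriptors.

    Normalizing to relative frequencies is an assumption of this sketch.
    """
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(d, centers)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def interaction_histogram(person_a: List[Vector], person_b: List[Vector],
                          centers: List[Vector]) -> List[float]:
    """Represent an interaction as the sum of both persons' histograms."""
    ha = bovw_histogram(person_a, centers)
    hb = bovw_histogram(person_b, centers)
    return [a + b for a, b in zip(ha, hb)]


# Toy descriptors (assumed data): two well-separated groups of 2-D points.
person_a = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1)]
person_b = [(5.1, 5.0), (4.9, 5.0)]

codebook = kmeans(person_a + person_b, k=2)
interaction = interaction_histogram(person_a, person_b, codebook)
```

In a full pipeline, a vector like `interaction` would be computed for every training video and passed to a supervised classifier; the record's keywords mention an SVM for that role.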
Pages: 8