Learning Bag of Spatio-Temporal Features for Human Interaction Recognition

Cited by: 3
Authors
Slimani, Khadidja Nour El Houda [1]
Benezeth, Yannick [2]
Souami, Feryel [1]
Affiliations
[1] Univ Sci & Technol Houari Boumediene, LRIA, BP 32 El Alia, Algiers 16111, Algeria
[2] Univ Burgundy Franche Comte, ImViA EA 7535, F-21000 Dijon, France
Source
TWELFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2019) | 2020 / Vol. 11433
Keywords
Human interaction; Edge-based region; MSER; Bag of Visual Words; 3D-SIFT; Sum of Histograms; SVM; VIDEOS;
DOI
10.1117/12.2559268
Chinese Library Classification number
O43 [Optics];
Discipline classification code
070207 ; 0803 ;
Abstract
The Bag of Visual Words (BoVW) model has achieved impressive performance in human activity recognition. However, because it ignores the spatiotemporal distribution of visual words, it is extremely difficult to capture the high-level semantic meaning behind video features with this method, which prevents localization of the interactions within a video. In this paper, we propose a supervised learning framework that automatically recognizes high-level human interactions based on a bag of spatiotemporal visual features. First, a representative baseline keyframe that captures the major body parts of the interacting persons is selected, and the bounding boxes containing the persons are extracted to parse the poses of all persons in the interaction. On this keyframe, features are detected for each interacting person by combining edge features with Maximally Stable Extremal Regions (MSER) features, and are then tracked backward and forward over the entire video sequence. From the feature tracks, 3D XYT spatiotemporal volumes are generated for each interacting target. The K-means algorithm is then used to build a codebook of visual features, and the interaction is represented by the sum of the visual-word occurrence frequencies of the interacting persons. Extensive experimental evaluations on the UT-Interaction dataset demonstrate that our method can recognize ongoing interactions in videos with a simple implementation.
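The codebook and sum-of-histograms steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the descriptors are random placeholders standing in for features extracted from each person's spatiotemporal volume, the L1 normalisation is an assumption, and the keyframe selection, MSER/edge detection, tracking, and SVM classification stages are omitted.

```python
import numpy as np


def assign(points, centroids):
    """Index of the nearest centroid (visual word) for each descriptor."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook from k distinct input descriptors.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def word_histogram(descriptors, centroids):
    """Count how often each visual word occurs among the descriptors."""
    return np.bincount(assign(descriptors, centroids), minlength=len(centroids))


def interaction_histogram(person_a, person_b, centroids):
    """Sum the two persons' word histograms, then L1-normalise (assumed)."""
    total = word_histogram(person_a, centroids) + word_histogram(person_b, centroids)
    return total / total.sum()


# Placeholder descriptors: 60 for person A, 40 for person B, 8 dimensions each.
rng = np.random.default_rng(42)
person_a = rng.normal(size=(60, 8))
person_b = rng.normal(size=(40, 8))

codebook = kmeans(np.vstack([person_a, person_b]), k=5)
hist = interaction_histogram(person_a, person_b, codebook)
print(hist)
```

In a full pipeline, one such vector per video would be the input to the classifier; the codebook size `k=5` here is chosen only to keep the example small.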
Pages: 8
Related papers
50 in total
  • [21] A fast human action recognition network based on spatio-temporal features
    Xu, Jie
    Song, Rui
    Wei, Haoliang
    Guo, Jinhong
    Zhou, Yifei
    Huang, Xiwei
    Neurocomputing, 2021, 441 : 350 - 358
  • [22] Human Action Recognition by SOM Considering the Probability of Spatio-temporal Features
    Ji, Yanli
    Shimada, Atsushi
    Taniguchi, Rin-ichiro
    NEURAL INFORMATION PROCESSING: MODELS AND APPLICATIONS, PT II, 2010, 6444 : 391 - 398
  • [23] Human Action Recognition in Video by Fusion of Structural and Spatio-temporal Features
    Borzeshi, Ehsan Zare
    Concha, Oscar Perez
    Piccardi, Massimo
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2012, 7626 : 474 - 482
  • [24] Role of Spatio-Temporal Feature Position in Recognition of Human Vehicle Interaction
    Ali, Qurat ul Ain
    Yousaf, Muhammad Haroon
    PROCEEDINGS OF TENCON 2018 - 2018 IEEE REGION 10 CONFERENCE, 2018, : 0471 - 0476
  • [25] LEARNING SPATIO-TEMPORAL DEPENDENCIES FOR ACTION RECOGNITION
    Cai, Qiao
    Yin, Yafeng
    Man, Hong
    2013 20TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2013), 2013, : 3740 - 3744
  • [26] Human Activity Recognition Based on Transfer Learning with Spatio-Temporal Representations
    Zebhi, Saeedeh
    Almodarresi, S. M. T.
    Abootalebi, Vahid
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2021, 18 (06) : 839 - 845
  • [27] STIT: Spatio-Temporal Interaction Transformers for Human-Object Interaction Recognition in Videos
    Almushyti, Muna
    Li, Frederick W. B.
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3287 - 3294
  • [28] Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition
    Wang, Ning
    Zhu, Guangming
    Zhang, Liang
    Shen, Peiyi
    Li, Hongsheng
    Hua, Cong
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4985 - 4993
  • [29] Abnormal Activity Recognition Using Spatio-Temporal Features
    Chathuramali, K. G. Manosha
    Ramasinghe, Sameera
    Rodrigo, Ranga
    2014 7TH INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION FOR SUSTAINABILITY (ICIAFS), 2014,
  • [30] Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) Model for Human Action Recognition
    Nazir, Saima
    Yousaf, Muhammad Haroon
    Nebel, Jean-Christophe
    Velastin, Sergio A.
    SENSORS, 2019, 19 (12)