A Spatio-Temporal CRF for Human Interaction Understanding

Cited by: 30

Authors:

Wang, Zhenhua [1]

Liu, Sheng [2]

Zhang, Jianhua [3]

Chen, Shengyong [2, 4]

Guan, Qiu [2]

Affiliations:

[1] Zhejiang Univ Technol, Sch Comp Sci, Hangzhou 310014, Zhejiang, Peoples R China

[2] Zhejiang Univ Technol, Dept Comp Sci, Hangzhou 310014, Zhejiang, Peoples R China

[3] Zhejiang Univ Technol, Coll Comp Sci, Hangzhou 310014, Zhejiang, Peoples R China

[4] Tianjin Univ Technol, Tianjin 300384, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2017, Vol. 27, No. 8

Funding:

National Natural Science Foundation of China

Keywords:

Conditional random fields (CRFs); human action recognition (HAR); interaction; video understanding; ACTION RECOGNITION;

DOI:

10.1109/TCSVT.2016.2539699

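Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809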

Abstract:

A better understanding of human interactions in videos can be achieved by simultaneously considering the coarse interactions between people, the action of each individual, and the activity of all people as a whole. We divide the recognition task into two stages. The first stage discriminates interactions from noninteractions and recognizes actions and activities based on local image information; in the second stage, actions and activities are recognized globally based on the local recognition results. A conditional random field (CRF) is designed to model human interactions in the spatio-temporal space. Unlike most existing global models, which cover either action variables or activity variables only, our model covers both by considering the interactions between the different types of variables. The graph structure of the CRF is predicted by a model learned from training data, in contrast to traditional graph construction methods, which typically rely on human heuristics. We learn the parameters of the CRF via a structured support vector machine. We propose an efficient inference algorithm to estimate labels in long videos containing many people. Our model offers both a semantic-level understanding of human interactions in videos and competitive action and activity recognition performance.

Pages: 1647-1660

Page count: 14
