A Spatio-Temporal CRF for Human Interaction Understanding

Cited by: 30

Authors:

Wang, Zhenhua [1]

Liu, Sheng [2]

Zhang, Jianhua [3]

Chen, Shengyong [2,4]

Guan, Qiu [2]

Affiliations:

[1] Zhejiang Univ Technol, Sch Comp Sci, Hangzhou 310014, Zhejiang, Peoples R China

[2] Zhejiang Univ Technol, Dept Comp Sci, Hangzhou 310014, Zhejiang, Peoples R China

[3] Zhejiang Univ Technol, Coll Comp Sci, Hangzhou 310014, Zhejiang, Peoples R China

[4] Tianjin Univ Technol, Tianjin 300384, Peoples R China

Source:

IEEE Transactions on Circuits and Systems for Video Technology | 2017 / Vol. 27 / No. 08

Funding:

National Natural Science Foundation of China;

Keywords:

Conditional random fields (CRFs); human action recognition (HAR); interaction; video understanding; ACTION RECOGNITION;

DOI:

10.1109/TCSVT.2016.2539699

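Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: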

0808 ; 0809 ;

Abstract:

A better understanding of human interactions in videos can be achieved by simultaneously considering the coarse interactions between people, the action of each individual, and the activity of all people as a whole. We divide the recognition task into two stages. The first stage discriminates interactions and noninteractions, actions and activities based on local image information, while during the second stage, actions and activities are recognized in a global manner based on the local recognition results. A conditional random field (CRF) is designed to model human interactions in the spatio-temporal space. Different from most existing global models which cover either action or activity variables only, our model covers them both by considering the interactions between different types of variables. The graph structure of the CRF is predicted by a model learned from training data, which is different from traditional graph construction methods that typically rely on human heuristics. We learn the parameters of the CRF via structured support vector machine. We propose an efficient inference algorithm to tackle the estimation of labels in long videos containing many people. Our model admits both semantic-level understanding of human interactions in videos and competitive action and activity recognition performance.


Pages: 1647-1660

Page count: 14
