Layout-induced Video Representation for Recognizing Agent-in-Place Actions

Cited by: 3

Authors:

Yu, Ruichi [1,2]

Wang, Hongcheng [2]

Li, Ang [1]

Zheng, Jingxiao [1]

Morariu, Vlad I. [1,3]

Davis, Larry S. [1]

Affiliations:

[1] Univ Maryland, College Pk, MD 20742 USA

[2] Comcast Appl Res, Washington, DC 20549 USA

[3] Adobe Res, San Jose, CA USA

Source:

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019

Keywords:

DOI:

10.1109/ICCV.2019.00135

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We address scene layout modeling for recognizing agent-in-place actions, which are actions associated with agents who perform them and the places where they occur, in the context of outdoor home surveillance. We introduce a novel representation to model the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training scenes to unseen scenes in the test set. This Layout-Induced Video Representation (LIVR) abstracts away low-level appearance variance and encodes geometric and topological relationships of places to explicitly model scene layout. LIVR partitions the semantic features of a scene into different places to force the network to learn generic place-based feature descriptions which are independent of specific scene layouts; then, LIVR dynamically aggregates features based on connectivities of places in each specific scene to model its layout. We introduce a new Agent-in-Place Action (APA) dataset to show that our method allows neural network models to generalize significantly better to unseen scenes.

Citation

Pages: 1262-1272

Page count: 11