Layout-induced Video Representation for Recognizing Agent-in-Place Actions

Cited by: 3
Authors
Yu, Ruichi [1, 2]
Wang, Hongcheng [2]
Li, Ang [1]
Zheng, Jingxiao [1]
Morariu, Vlad I. [1, 3]
Davis, Larry S. [1]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] Comcast Appl Res, Washington, DC 20549 USA
[3] Adobe Res, San Jose, CA USA
DOI
10.1109/ICCV.2019.00135
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address scene layout modeling for recognizing agent-in-place actions, which are actions associated with agents who perform them and the places where they occur, in the context of outdoor home surveillance. We introduce a novel representation to model the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training scenes to unseen scenes in the test set. This Layout-Induced Video Representation (LIVR) abstracts away low-level appearance variance and encodes geometric and topological relationships of places to explicitly model scene layout. LIVR partitions the semantic features of a scene into different places to force the network to learn generic place-based feature descriptions which are independent of specific scene layouts; then, LIVR dynamically aggregates features based on connectivities of places in each specific scene to model its layout. We introduce a new Agent-in-Place Action (APA) dataset to show that our method allows neural network models to generalize significantly better to unseen scenes.
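The two steps named in the abstract, partitioning features by place and then aggregating them according to place connectivity, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the masked average pooling, and the row-normalized adjacency averaging are all assumptions chosen for illustration, and the record does not specify how LIVR actually performs either step.

```python
import numpy as np


def place_descriptors(features, place_masks):
    """Hypothetical helper: one pooled descriptor per place.

    features:    (C, H, W) feature map for a frame.
    place_masks: (P, H, W) binary masks, one per place (e.g. street, porch).
    Returns a (P, C) array of masked-average descriptors.
    """
    masks = place_masks.astype(features.dtype)
    # Sum the features that fall inside each place mask -> (P, C).
    summed = np.einsum("phw,chw->pc", masks, features)
    # Pixel count per place; clamp to 1 so empty masks yield zeros.
    area = masks.sum(axis=(1, 2))[:, None]
    return summed / np.maximum(area, 1.0)


def aggregate_by_connectivity(descriptors, connectivity):
    """Hypothetical helper: mix descriptors of connected places.

    descriptors:  (P, C) place descriptors.
    connectivity: (P, P) binary adjacency of places in this scene.
    Each place is averaged with its connected places (assumed scheme).
    """
    adj = connectivity.astype(descriptors.dtype).copy()
    np.fill_diagonal(adj, 1.0)              # every place keeps its own features
    adj /= adj.sum(axis=1, keepdims=True)   # row-normalize to an average
    return adj @ descriptors
```

Because the connectivity matrix is an input rather than a learned constant, the same pooling code can in principle be reused across scenes with different layouts, which is the generalization property the abstract emphasizes.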
Pages: 1262-1272
Page count: 11