Contextual Action Recognition with R*CNN

Cited by: 255
Authors
Gkioxari, Georgia [1]
Girshick, Ross
Malik, Jitendra [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
DOI
10.1109/ICCV.2015.129
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There are multiple cues in an image which reveal what action a person is performing. For example, a jogger has a pose that is characteristic of jogging, but the scene (e.g. road, trail) and the presence of other joggers can be an additional source of information. In this work, we exploit the simple observation that actions are accompanied by contextual cues to build a strong action recognition system. We adapt RCNN to use more than one region for classification while still maintaining the ability to localize the action. We call our system R*CNN. The action-specific models and the feature maps are trained jointly, allowing for action-specific representations to emerge. R*CNN achieves 90.2% mean AP on the PASCAL VOC Action dataset, outperforming all other approaches in the field by a significant margin. Last, we show that R*CNN is not limited to action recognition. In particular, R*CNN can also be used to tackle fine-grained tasks such as attribute classification. We validate this claim by reporting state-of-the-art performance on the Berkeley Attributes of People dataset.
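The abstract does not spell out how the extra regions enter the classifier. As a minimal sketch, assuming each action's score is the primary (person) region's score plus the score of the single most informative secondary (context) region, followed by a softmax over actions, the idea might look like the following. All names, feature vectors, and weights here are hypothetical toy values, not the authors' implementation.

```python
import math


def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def action_scores(primary_feat, secondary_feats, w_primary, w_secondary):
    """Score each action as primary-region score + best secondary-region score.

    Assumption: one weight vector per action for the primary region and one
    for secondary regions; the context term is a max over candidate regions.
    """
    scores = []
    for wp, ws in zip(w_primary, w_secondary):
        primary = dot(wp, primary_feat)
        context = max(dot(ws, s) for s in secondary_feats)
        scores.append(primary + context)
    return scores


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: 2 actions, 2-dimensional features (hypothetical values).
primary_feat = [1.0, 0.0]                    # feature of the person box
secondary_feats = [[0.0, 1.0], [1.0, 1.0]]   # features of candidate context boxes
w_primary = [[1.0, 0.0], [0.0, 1.0]]         # per-action primary weights
w_secondary = [[0.5, 0.5], [0.0, -1.0]]      # per-action secondary weights

scores = action_scores(primary_feat, secondary_feats, w_primary, w_secondary)
probs = softmax(scores)
```

In this toy setup the first action receives the higher probability because both its primary region and its best context region score well. Taking a max over secondary regions is one way to let the model pick whichever context box is most useful for a given action.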
Pages: 1080-1088
Page count: 9