Joint learning of object and action detectors

Cited by: 24
Authors
Kalogeiton, Vicky [1, 2]
Weinzaepfel, Philippe [3]
Ferrari, Vittorio [2]
Schmid, Cordelia [1]
Affiliations
[1] Univ Grenoble Alpes, INRIA, CNRS, Grenoble INP, LJK, F-38000 Grenoble, France
[2] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[3] Naver Labs Europe, Meylan, France
DOI
10.1109/ICCV.2017.219
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
While most existing approaches for detection in videos focus on objects or human actions separately, we aim at jointly detecting objects performing actions, such as cat eating or dog jumping. We introduce an end-to-end multi-task objective that jointly learns object-action relationships. We compare it with different training objectives, validate its effectiveness for detecting object-action pairs in videos, and show that both object detection and action detection benefit from this joint learning. Moreover, the proposed architecture can be used for zero-shot learning of actions: our multi-task objective leverages the commonalities of an action performed by different objects, e.g. dog and cat jumping, which makes it possible to detect actions of an object without training on those object-action pairs. In experiments on the A2D dataset [50], we obtain state-of-the-art results on segmentation of object-action pairs. Finally, we apply our multi-task architecture to detect visual relationships between objects in images of the VRD dataset [24].
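The abstract does not spell out the objective, so the following is only a minimal sketch of one way a multi-task objective over objects and actions could look. It assumes two independent softmax heads, a loss that sums their cross-entropies, and a pair score formed as the product of the two probabilities. The function names, the class lists, and the logit values are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multitask_loss(object_logits, action_logits, object_idx, action_idx):
    """Assumed multi-task loss: object cross-entropy plus action cross-entropy."""
    p_obj = softmax(object_logits)
    p_act = softmax(action_logits)
    return -math.log(p_obj[object_idx]) - math.log(p_act[action_idx])

def pair_scores(object_logits, action_logits):
    """Assumed pair score: p(object) * p(action) for every combination.

    Because the action head is shared across objects, a pair that never
    appeared in training still receives a score.
    """
    p_obj = softmax(object_logits)
    p_act = softmax(action_logits)
    return [[po * pa for pa in p_act] for po in p_obj]

# Hypothetical class lists and logits for a single detection.
OBJECTS = ["cat", "dog"]
ACTIONS = ["eating", "jumping"]
object_logits = [0.2, 2.0]
action_logits = [0.1, 1.5]

scores = pair_scores(object_logits, action_logits)
best = max(
    ((o, a) for o in range(len(OBJECTS)) for a in range(len(ACTIONS))),
    key=lambda oa: scores[oa[0]][oa[1]],
)
best_pair = (OBJECTS[best[0]], ACTIONS[best[1]])
loss = multitask_loss(object_logits, action_logits, object_idx=1, action_idx=1)
```

In this sketch the action head never sees the object label, which is what lets an action learned from one object transfer to another; a joint softmax over all object-action pairs would not have that property.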
Pages: 2001-2010
Page count: 10