Learning human activities and object affordances from RGB-D videos

Cited by: 436
Authors
Koppula, Hema Swetha [1 ]
Gupta, Rudhir [1 ]
Saxena, Ashutosh [1 ]
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Source
Keywords
3D perception; human activity detection; object affordance; supervised learning; spatio-temporal context; personal robots; tracking
DOI
10.1177/0278364913478446
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
Understanding human activities and object affordances are two very important skills, especially for personal robots that operate in human environments. In this work, we consider the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human and, more importantly, of their interactions with the objects in the form of associated affordances. Given an RGB-D video, we jointly model the human activities and object affordances as a Markov random field, where the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural support vector machine (SSVM) approach, where labelings over various alternate temporal segmentations are considered as latent variables. We tested our method on a challenging dataset comprising 120 activity videos collected from four subjects, and obtained an accuracy of 79.4% for affordance, 63.4% for sub-activity, and 75.0% for high-level activity labeling. We then demonstrate the use of such descriptive labeling in performing assistive tasks by a PR2 robot.
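As a rough illustration of the kind of model the abstract describes, the sketch below scores joint labelings of a tiny pairwise Markov random field with one sub-activity node and one object node, then picks the highest-scoring labeling by brute force. It is not the authors' implementation: the function names, the label names, and every numeric score are hypothetical, and the exhaustive search stands in for whatever inference and SSVM training procedure the paper actually uses.

```python
from itertools import product


def score_labeling(labels, node_scores, edge_scores, edges):
    """Sum the node and edge scores for one joint labeling."""
    total = sum(node_scores[n][labels[n]] for n in labels)
    total += sum(edge_scores[(a, b)][(labels[a], labels[b])] for a, b in edges)
    return total


def best_labeling(node_scores, edge_scores, edges):
    """Exhaustive search over all labelings; only feasible for toy graphs."""
    nodes = list(node_scores)
    best, best_score = None, float("-inf")
    for combo in product(*(list(node_scores[n]) for n in nodes)):
        labels = dict(zip(nodes, combo))
        s = score_labeling(labels, node_scores, edge_scores, edges)
        if s > best_score:
            best, best_score = labels, s
    return best, best_score


# Toy graph (all labels and numbers are made up for illustration).
node_scores = {
    "sub_activity": {"reaching": 0.2, "drinking": 0.5},
    "object": {"reachable": 0.4, "drinkable": 0.3},
}
edges = [("sub_activity", "object")]
edge_scores = {
    ("sub_activity", "object"): {
        ("reaching", "reachable"): 1.0,
        ("reaching", "drinkable"): -0.5,
        ("drinking", "reachable"): -0.5,
        ("drinking", "drinkable"): 1.0,
    }
}

labels, score = best_labeling(node_scores, edge_scores, edges)
print(labels, score)
```

In a structured-SVM setting, the node and edge scores would typically be linear functions of learned weights and observed features rather than fixed constants as here.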
Pages: 951-970
Number of pages: 20