Learning human activities and object affordances from RGB-D videos

Cited by: 436
Authors
Koppula, Hema Swetha [1 ]
Gupta, Rudhir [1 ]
Saxena, Ashutosh [1 ]
Affiliations
[1] Cornell Univ, Dept Comp Sci, Ithaca, NY 14853 USA
Keywords
3D perception; human activity detection; object affordance; supervised learning; spatio-temporal context; personal robots; tracking
DOI
10.1177/0278364913478446
CLC Classification Number
TP24 [Robotics]
Subject Classification Number
080202; 1405
Abstract
Understanding human activities and object affordances are two very important skills, especially for personal robots operating in human environments. In this work, we consider the problem of extracting a descriptive labeling of the sequence of sub-activities being performed by a human and, more importantly, of their interactions with objects in the form of associated affordances. Given an RGB-D video, we jointly model the human activities and object affordances as a Markov random field, where the nodes represent objects and sub-activities, and the edges represent the relationships between object affordances, their relations with sub-activities, and their evolution over time. We formulate the learning problem using a structural support vector machine (SSVM) approach, where labelings over various alternate temporal segmentations are considered as latent variables. We tested our method on a challenging dataset comprising 120 activity videos collected from 4 subjects, and obtained an accuracy of 79.4% for affordance, 63.4% for sub-activity, and 75.0% for high-level activity labeling. We then demonstrate the use of such descriptive labeling in performing assistive tasks by a PR2 robot.
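To make the joint model concrete, the following is a minimal sketch (not the authors' code) of the core idea: object-affordance nodes and a sub-activity node are scored jointly, with node potentials and affordance/sub-activity edge potentials, and MAP inference picks the highest-scoring labeling. The label sets, objects, and all potential values below are illustrative assumptions; the paper's actual model uses learned SSVM weights over rich spatio-temporal features and latent temporal segmentations.

```python
import itertools

# Illustrative label sets (assumed, not the paper's full vocabularies).
AFFORDANCES = ["reachable", "movable", "drinkable"]
SUB_ACTIVITIES = ["reaching", "moving", "drinking"]

# Toy potentials standing in for learned weights w . phi(x, y):
# node scores for (object, affordance) pairs, and edge scores for the
# compatibility between an object's affordance and the sub-activity.
NODE_SCORE = {
    ("cup", "drinkable"): 2.0,
    ("cup", "movable"): 1.0,
    ("cup", "reachable"): 0.5,
}
EDGE_SCORE = {
    ("drinkable", "drinking"): 2.0,
    ("movable", "moving"): 1.5,
    ("reachable", "reaching"): 1.0,
}

def labeling_score(objects, affordances, sub_activity):
    """Sum of node potentials plus object-activity edge potentials."""
    score = 0.0
    for obj, aff in zip(objects, affordances):
        score += NODE_SCORE.get((obj, aff), 0.0)
        score += EDGE_SCORE.get((aff, sub_activity), 0.0)
    return score

def argmax_labeling(objects):
    """Exhaustive MAP inference over the toy label space (fine at this scale;
    the real model uses approximate inference over a temporal graph)."""
    best = None
    for affs in itertools.product(AFFORDANCES, repeat=len(objects)):
        for act in SUB_ACTIVITIES:
            s = labeling_score(objects, affs, act)
            if best is None or s > best[0]:
                best = (s, affs, act)
    return best
```

With these toy potentials, `argmax_labeling(["cup"])` selects the mutually compatible pair `("drinkable", "drinking")`, illustrating how joint inference couples affordance and sub-activity labels.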
Pages: 951-970 (20 pages)