Multimodal Learning of Keypoint Predictive Models for Visual Object Manipulation

Cited by: 0
Authors
Bechtle, Sarah [1]
Das, Neha [2]
Meier, Franziska [3]
Affiliations
[1] DeepMind, London N1C 4AG, England
[2] Meta AI Res, Menlo Pk, CA 94025 USA
[3] Tech Univ Munich, D-80333 Munich, Germany
Keywords
Visualization; Predictive models; Kinematics; Task analysis; Detectors; Proprioception; Training; Keypoint representations; manipulation; multimodal learning; BODY SCHEMA; INTEGRATION
DOI
10.1109/TRO.2022.3204509
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Humans have impressive generalization capabilities when it comes to manipulating objects and tools in completely novel environments. These capabilities are, at least partially, a result of humans having internal models of their bodies and of any grasped object. How to learn such body representations for robots remains an open problem. In this work, we present a self-supervised learning approach that extends a robot's kinematic model for object manipulation from visual latent representations. Our framework comprises two components. First, we present our multimodal keypoint detector: a neural network autoencoder architecture that fuses proprioception and vision during learning to predict visual keypoints on an object. Second, we show how to learn an extension of the robot's kinematic chain by regressing virtual joints from the visual keypoints predicted by the multimodal keypoint detector. Our evaluation shows that our approach learns to consistently predict visual keypoints on objects in the manipulator's hand and thus facilitates learning an extended kinematic chain that includes the object grasped in various configurations, from a few seconds of visual data. Finally, we show that this extended kinematic chain lends itself to object manipulation tasks such as placing a grasped object, and we present experiments in simulation and on hardware.
Pages: 1212-1224
Page count: 13
Related papers
50 in total
  • [1] Learning Object Models for Whole Body Manipulation
    Stilman, Mike
    Nishiwaki, Koichi
    Kagami, Satoshi
    [J]. HUMANOIDS: 2007 7TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS, 2007, : 174 - +
  • [2] Interactive object recognition by keypoint models
    Hardt, M
    Geisler, J
    [J]. SIGNAL PROCESSING, SENSOR FUSION, AND TARGET RECOGNITION VIII, 1999, 3720 : 160 - 168
  • [3] Spatial Keypoint Representation for Visual Object Retrieval
    Nowak, Tomasz
    Najgebauer, Patryk
    Romanowski, Jakub
    Gabryel, Marcin
    Korytkowski, Marcin
    Scherer, Rafal
    Kostadinov, Dimce
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2014, PT II, 2014, 8468 : 639 - 650
  • [4] KETO: Learning Keypoint Representations for Tool Manipulation
    Qin, Zengyi
    Fang, Kuan
    Zhu, Yuke
    Li Fei-Fei
    Savarese, Silvio
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2020, : 7278 - 7285
  • [5] Learning Object Models For Non-prehensile Manipulation
    Sanan, Siddharth
    Bretan, Mason
    Heck, Larry
    [J]. 2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019, : 4784 - 4789
  • [6] Integrating visual perception and manipulation for autonomous learning of object representations
    Schiebener, David
    Morimoto, Jun
    Asfour, Tamim
    Ude, Ales
    [J]. ADAPTIVE BEHAVIOR, 2013, 21 (05) : 328 - 345
  • [7] Learning Foresightful Dense Visual Affordance for Deformable Object Manipulation
    Wu, Ruihai
    Ning, Chuanruo
    Dong, Hao
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10913 - 10922
  • [8] Unsupervised Learning of Visual Object Recognition Models
    Navarrete, Dulce J.
    Morales, Eduardo F.
    Enrique Sucar, Luis
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2012, 2012, 7637 : 511 - 520
  • [9] Multimodal Integration Learning of Object Manipulation Behaviors using Deep Neural Networks
    Noda, Kuniaki
    Arie, Hiroaki
    Suga, Yuki
    Ogata, Tetsuya
    [J]. 2013 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2013, : 1728 - 1733
  • [10] Learning Semantic Keypoint Representations for Door Opening Manipulation
    Wang, Jiayu
    Lin, Shize
    Hu, Chuxiong
    Zhu, Yu
    Zhu, Limin
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5 (04) : 6980 - 6987