Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations

Times cited: 1
Authors
Heravi, Negin [1, 2]
Wahid, Ayzaan [3]
Lynch, Corey [3]
Florence, Pete [3]
Armstrong, Travis [3]
Tompson, Jonathan [3]
Sermanet, Pierre [3]
Bohg, Jeannette [2]
Dwibedi, Debidatta [3]
Affiliations
[1] Google, Mountain View, CA 94043 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Google, Robot, Mountain View, CA 94043 USA
DOI
10.1109/ICRA48891.2023.10160888
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Perceptual understanding of the scene and the relationships between its different components is important for successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most current methods learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large, labeled datasets for each task, which are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object-agnostic, and we demonstrate that the resulting representations are insufficient for general-purpose robotics tasks because they fail to capture the complexity of scenes with many components. In this paper, we show the effectiveness of using object-aware representation learning techniques for robotic tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction. We show that our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques as well as methods trained on raw RGB images. Our results show a 20% increase in performance in low-data regimes (1000 trajectories) in policy training using implicit behavioral cloning (IBC). Furthermore, our method outperforms the baselines for the task of object localization in multi-object scenes. Further qualitative results are available at https://sites.google.com/view/slots4robots.
Pages: 9515-9522
Number of pages: 8
Related papers
50 in total
  • [41] Object-aware Image Compression with Adversarial Learning
    Du, Yunfei
    Zhao, Nan
    Duan, Yiping
    Han, Chaoyi
    2019 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA (ICCC), 2019,
  • [42] SFNet: Learning Object-aware Semantic Correspondence
    Lee, Junghyup
    Kim, Dohyung
    Ponce, Jean
    Ham, Bumsub
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 2273 - 2282
  • [43] An Object-Aware Hardware Transactional Memory System
    Khan, Behram
    Horsnell, Matthew
    Rogers, Ian
    Lujan, Mikel
    Dinn, Andrew
    Watson, Ian
    HPCC 2008: 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2008, : 93 - 102
  • [44] Scalable Object-Aware Hardware Transactional Memory
    Khan, Behram
    Horsnell, Matthew
    Lujan, Mikel
    Watson, Ian
    EURO-PAR 2010 PARALLEL PROCESSING, PT I, 2010, 6271 : 268 - 279
  • [45] OBJECT-AWARE SALIENCY DETECTION FOR CONSUMER IMAGES
    Tang, Hao
    2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012, : 1097 - 1100
  • [46] Object-Aware Impedance Control for Human-Robot Collaborative Task With Online Object Parameter Estimation
    Park, Jinseong
    Shin, Yong-Sik
    Kim, Sanghyun
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024,
  • [47] Asynchronous rate control for multi-object videos
    Sun, Y
    Ahmad, I
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2005, 15 (08) : 1007 - 1018
  • [48] DIOR: DIstill Observations to Representations for Multi-Object Tracking and Segmentation
    Cai, Jiarui
    Wang, Yizhou
    Hsu, Hung-Min
    Zhang, Haotian
    Hwang, Jenq-Neng
    2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022, : 520 - 529
  • [49] Disparity contour grouping for multi-object segmentation in dynamically textured scenes
    Sun, Wei
    Spackman, Stephen P.
    VISAPP 2007: PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOLUME IU/MTSV, 2007, : 347 - +
  • [50] Multi-Object Navigation with dynamically learned neural implicit representations
    Marza, Pierre
    Matignon, Laetitia
    Simonin, Olivier
    Wolf, Christian
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 10970 - 10981