Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations

Cited by: 1
Authors
Heravi, Negin [1, 2]
Wahid, Ayzaan [3]
Lynch, Corey [3]
Florence, Pete [3]
Armstrong, Travis [3]
Tompson, Jonathan [3]
Sermanet, Pierre [3]
Bohg, Jeannette [2]
Dwibedi, Debidatta [3]
Affiliations
[1] Google, Mountain View, CA 94043 USA
[2] Stanford Univ, Stanford, CA 94305 USA
[3] Google, Robot, Mountain View, CA 94043 USA
DOI
10.1109/ICRA48891.2023.10160888
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Perceptual understanding of a scene and of the relationships between its components is important for the successful completion of robotic tasks. Representation learning has been shown to be a powerful technique for this, but most current methodologies learn task-specific representations that do not necessarily transfer well to other tasks. Furthermore, representations learned by supervised methods require large labeled datasets for each task, which are expensive to collect in the real world. Using self-supervised learning to obtain representations from unlabeled data can mitigate this problem. However, current self-supervised representation learning methods are mostly object-agnostic, and we demonstrate that the resulting representations are insufficient for general-purpose robotics tasks because they fail to capture the complexity of scenes with many components. In this paper, we show the effectiveness of using object-aware representation learning techniques for robotic tasks. Our self-supervised representations are learned by observing the agent freely interacting with different parts of the environment and are queried in two different settings: (i) policy learning and (ii) object location prediction. We show that our model learns control policies in a sample-efficient manner and outperforms state-of-the-art object-agnostic techniques as well as methods trained on raw RGB images. In low-data regimes (1000 trajectories), our results show a 20% increase in performance for policy training with implicit behavioral cloning (IBC). Furthermore, our method outperforms the baselines on the task of object localization in multi-object scenes. Further qualitative results are available at https://sites.google.com/view/slots4robots.
Pages: 9515-9522
Page count: 8