Connecting Touch and Vision via Cross-Modal Prediction

Cited by: 60
Authors
Li, Yunzhu [1]
Zhu, Jun-Yan [1]
Tedrake, Russ [1]
Torralba, Antonio [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
DOI
10.1109/CVPR.2019.01086
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104; 0812; 0835; 1405
Abstract
Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results regarding different system designs, and we visualize the learned representations of our model.
Pages: 10601-10610
Page count: 10
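
The abstract describes a conditional adversarial model whose generator is conditioned on the scale and location of the touch. Below is a minimal PyTorch sketch of that idea; it is not the authors' released code, and every class name, layer size, and the two-channel location/scale encoding are illustrative assumptions.

    # Illustrative sketch only: maps a tactile image plus a 2-channel map
    # encoding the touch's location and scale (an assumed encoding) to a
    # visual image patch, in the spirit of the conditional adversarial
    # model described in the abstract.
    import torch
    import torch.nn as nn

    class TouchToVisionGenerator(nn.Module):
        def __init__(self, tactile_ch=3, cond_ch=2, out_ch=3, base=64):
            super().__init__()
            # Encoder over the tactile frame concatenated with the
            # location/scale conditioning map.
            self.enc = nn.Sequential(
                nn.Conv2d(tactile_ch + cond_ch, base, 4, 2, 1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(base, base * 2, 4, 2, 1),
                nn.BatchNorm2d(base * 2),
                nn.LeakyReLU(0.2),
            )
            # Decoder back up to a visual image patch in [-1, 1].
            self.dec = nn.Sequential(
                nn.ConvTranspose2d(base * 2, base, 4, 2, 1),
                nn.BatchNorm2d(base),
                nn.ReLU(),
                nn.ConvTranspose2d(base, out_ch, 4, 2, 1),
                nn.Tanh(),
            )

        def forward(self, tactile, cond_map):
            x = torch.cat([tactile, cond_map], dim=1)
            return self.dec(self.enc(x))

    class PatchDiscriminator(nn.Module):
        # Conditional discriminator: judges a visual image jointly with
        # the tactile input and conditioning map it should correspond to.
        def __init__(self, in_ch=3 + 2 + 3, base=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_ch, base, 4, 2, 1),
                nn.LeakyReLU(0.2),
                nn.Conv2d(base, 1, 4, 1, 1),  # per-patch real/fake logits
            )

        def forward(self, tactile, cond_map, visual):
            return self.net(torch.cat([tactile, cond_map, visual], dim=1))

    # Smoke test on a small batch of 64x64 frames.
    g = TouchToVisionGenerator()
    d = PatchDiscriminator()
    tactile = torch.randn(2, 3, 64, 64)
    cond = torch.randn(2, 2, 64, 64)
    fake = g(tactile, cond)          # -> (2, 3, 64, 64)
    logits = d(tactile, cond, fake)  # -> (2, 1, 31, 31)
    print(fake.shape, logits.shape)

In the paper's setting these two networks would be trained against each other with a standard conditional GAN objective; the touch-to-vision direction shown here is symmetric with vision-to-touch, swapping the roles of the two modalities.
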
Related papers
50 entries total
  • [41] Cross-modal interactions between olfaction and vision when grasping
    Castiello, Umberto
    Zucco, Gesualdo M.
    Parma, Valentina
    Ansuini, Caterina
    Tirindelli, Roberto
    CHEMICAL SENSES, 2006, 31(07): 665-671
  • [42] Transformer-Exclusive Cross-Modal Representation for Vision and Language
    Shin, Andrew
    Narihira, Takuya
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021: 2719-2725
  • [43] Cross-modal psychological refractory period in vision, audition, and haptics
    Rau, Pei-Luen Patrick
    Zheng, Jian
    ATTENTION, PERCEPTION, & PSYCHOPHYSICS, 2020, 82: 1573-1585
  • [44] Vision-and-Dialog Navigation by Fusing Cross-modal features
    Nie, Hongxu
    Dong, Min
    Bi, Sheng
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [45] Cross-modal exogenous attention and distance effects in vision and hearing
    Schmitt, M
    Postma, A
    de Haan, EHF
    EUROPEAN JOURNAL OF COGNITIVE PSYCHOLOGY, 2001, 13(03): 343-368
  • [46] Seeing by Touching: Cross-Modal Matching For Tactile and Vision Measurements
    Liu, Huaping
    Sun, Fuchun
    Fang, Bin
    2017 2ND INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS AND MECHATRONICS (ICARM), 2017: 257-263
  • [47] Editorial: Cross-Modal Learning: Adaptivity, Prediction and Interaction
    Zhang, Jianwei
    Wermter, Stefan
    Sun, Fuchun
    Zhang, Changshui
    Engel, Andreas K.
    Roeder, Brigitte
    Fu, Xiaolan
    Xue, Gui
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [48] Shared Cross-Modal Trajectory Prediction for Autonomous Driving
    Choi, Chiho
    Choi, Joon Hee
    Li, Jiachen
    Malla, Srikanth
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 244-253
  • [49] Cross-modal prediction in audio-visual communication
    Rao, RR
    Chen, TH
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996: 2056-2059
  • [50] Exploring Cross-Modal Training via Touch to Learn a Mid-Air Marking Menu Gesture Set
    Henderson, Jay
    Mizobuchi, Sachi
    Li, Wei
    Lank, Edward
    PROCEEDINGS OF THE 21ST INTERNATIONAL CONFERENCE ON HUMAN-COMPUTER INTERACTION WITH MOBILE DEVICES AND SERVICES (MOBILEHCI'19), 2019