Connecting Touch and Vision via Cross-Modal Prediction

Cited by: 60
Authors:
Li, Yunzhu [1 ]
Zhu, Jun-Yan [1 ]
Tedrake, Russ [1 ]
Torralba, Antonio [1 ]
Affiliation:
[1] MIT, CSAIL, Cambridge, MA 02139, USA
DOI: 10.1109/CVPR.2019.01086
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract:
Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results regarding different system designs, as well as visualizing the learned representations of our model.
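The abstract describes a conditional adversarial model that incorporates the scale and location of the touch. As an illustrative sketch only (not the authors' released code), the following PyTorch snippet shows one common way to implement that kind of conditioning: the touch scale/location is rasterized into a binary mask and concatenated with the RGB frame as an extra input channel to an encoder-decoder generator. All class and variable names here are hypothetical.

    # Minimal sketch of touch-conditioned image-to-image generation (illustrative
    # only; not the authors' implementation). The touch scale/location enters the
    # generator as an extra mask channel concatenated with the visual input.
    import torch
    import torch.nn as nn

    class TouchConditionedGenerator(nn.Module):
        def __init__(self, img_channels=3, mask_channels=1, out_channels=3):
            super().__init__()
            c = img_channels + mask_channels  # RGB frame + touch-location mask
            self.encoder = nn.Sequential(
                nn.Conv2d(c, 64, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 4, stride=2, padding=1),
                nn.BatchNorm2d(128),
                nn.ReLU(inplace=True),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
                nn.BatchNorm2d(64),
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(64, out_channels, 4, stride=2, padding=1),
                nn.Tanh(),  # outputs in [-1, 1], the usual scaling for GAN images
            )

        def forward(self, image, touch_mask):
            # touch_mask marks where, and over how large an area, the sensor touches
            x = torch.cat([image, touch_mask], dim=1)
            return self.decoder(self.encoder(x))

    # Usage: predict a plausible tactile image for one 256x256 visual frame.
    g = TouchConditionedGenerator()
    frame = torch.randn(1, 3, 256, 256)
    mask = torch.zeros(1, 1, 256, 256)
    mask[:, :, 100:140, 100:140] = 1.0   # hypothetical touch region and scale
    tactile_pred = g(frame, mask)        # shape: (1, 3, 256, 256)

A full model would pair such a generator with a discriminator for adversarial training; this sketch covers only the scale/location conditioning mechanism named in the abstract.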
Pages: 10601–10610 (10 pages)
Related Papers (50 in total)
  • [31] Unimodal and cross-modal prediction is enhanced in musicians. Vassena, Eliana; Kochman, Katty; Latomme, Julie; Verguts, Tom. Scientific Reports, 2016, 6.
  • [32] An event-related brain potential study of cross-modal links in spatial attention between vision and touch. Eimer, M.; Driver, J. Psychophysiology, 2000, 37(05): 697-705.
  • [33] Uncertainty-Aware Multi-modal Learning via Cross-Modal Random Network Prediction. Wang, Hu; Zhang, Jianpeng; Chen, Yuanhong; Ma, Congbo; Avery, Jodie; Hull, Louise; Carneiro, Gustavo. Computer Vision, ECCV 2022, Pt XXXVII, 2022, 13697: 200-217.
  • [34] Cross-Modal Object Detection Via UAV. Li, Ang; Ni, Shouxiang; Chen, Yanan; Chen, Jianxin; Wei, Xin; Zhou, Liang; Guizani, Mohsen. IEEE Transactions on Vehicular Technology, 2023, 72(08): 10894-10905.
  • [35] Cross-modal associations involving colour and touch: Does hue matter? Jraissati, Yasmina; Wright, Oliver. Progress in Colour Studies: Cognition, Language and Beyond, 2018: 147-161.
  • [36] VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix. Wang, Teng; Jiang, Wenhao; Lu, Zhichao; Zheng, Feng; Cheng, Ran; Yin, Chengguo; Luo, Ping. International Conference on Machine Learning, Vol 162, 2022.
  • [37] Transformer vision-language tracking via proxy token guided cross-modal fusion. Zhao, Haojie; Wang, Xiao; Wang, Dong; Lu, Huchuan; Ruan, Xiang. Pattern Recognition Letters, 2023, 168: 10-16.
  • [38] ViSTA: Vision and Scene Text Aggregation for Cross-Modal Retrieval. Cheng, Mengjun; Sun, Yipeng; Wang, Longchao; Zhu, Xiongwei; Yao, Kun; Chen, Jie; Song, Guoli; Han, Junyu; Liu, Jingtuo; Ding, Errui; Wang, Jingdong. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 5174-5183.
  • [39] Cross-modal psychological refractory period in vision, audition, and haptics. Rau, Pei-Luen Patrick; Zheng, Jian. Attention, Perception & Psychophysics, 2020, 82(04): 1573-1585.
  • [40] Cross-modal correspondence between vision and olfaction: The color of smells. Gilbert, A.N.; Martin, R.; Kemp, S.E. American Journal of Psychology, 1996, 109(03): 335-351.