Connecting Touch and Vision via Cross-Modal Prediction

Cited by: 60
Authors
Li, Yunzhu [1]
Zhu, Jun-Yan [1]
Tedrake, Russ [1]
Torralba, Antonio [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
DOI
10.1109/CVPR.2019.01086
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results regarding different system designs, as well as visualizations of the learned representations of our model.
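As an illustrative aside, the "conditional adversarial model that incorporates the scale and location information of the touch" described above can be pictured as a generator conditioned on a low-dimensional touch descriptor. The sketch below is a minimal, assumption-laden PyTorch rendering of that idea; the class name TouchConditionedGenerator, the layer sizes, the (scale, x, y) descriptor, and the broadcast-and-concatenate conditioning are all illustrative choices, not the paper's actual architecture.

    # Minimal sketch (assumption): a generator that maps a visual frame plus a
    # touch scale/location vector to a tactile image. Layer sizes and the
    # conditioning scheme are illustrative guesses, not the paper's model.
    import torch
    import torch.nn as nn

    class TouchConditionedGenerator(nn.Module):
        def __init__(self, cond_dim=3):  # cond_dim: (scale, x, y) of the touch
            super().__init__()
            # Encode the visual frame into a spatial feature map.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            )
            # Decode to a 3-channel tactile image; the touch condition is
            # broadcast spatially and concatenated onto the feature channels.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128 + cond_dim, 64, 4, stride=2, padding=1),
                nn.ReLU(),
                nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
                nn.Tanh(),
            )

        def forward(self, vision, touch_cond):
            feats = self.encoder(vision)              # (B, 128, H/4, W/4)
            b, _, h, w = feats.shape
            cond = touch_cond.view(b, -1, 1, 1).expand(-1, -1, h, w)
            return self.decoder(torch.cat([feats, cond], dim=1))

    # Usage: one 256x256 frame, one hypothetical touch at (x=0.3, y=0.7), scale 0.5.
    g = TouchConditionedGenerator()
    out = g(torch.randn(1, 3, 256, 256), torch.tensor([[0.5, 0.3, 0.7]]))
    print(out.shape)  # torch.Size([1, 3, 256, 256])

Broadcasting the condition over the spatial feature map is one common way to inject low-dimensional conditioning into a convolutional generator; the full adversarial model described in the abstract would also pair a generator with a discriminator and operate on image sequences, both omitted here.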
Pages: 10601 - 10610
Number of pages: 10
Related Papers (showing items 21-30 of 50)
  • [21] Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information
    Li, Jialu
    Tan, Hao
    Bansal, Mohit
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1041 - 1050
  • [22] Cross-Modal Correspondence Among Vision, Audition, and Touch in Natural Objects: An Investigation of the Perceptual Properties of Wood
    Kanaya, Shoko
    Kariya, Kenji
    Fujisaki, Waka
    PERCEPTION, 2016, 45 (10) : 1099 - 1114
  • [23] Cross-modal localization via sparsity
    Kidron, Einat
    Schechner, Yoav Y.
    Elad, Michael
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (04) : 1390 - 1404
  • [24] Cross-modal decoupling in temporal attention between audition and touch
    Mühlberg, Stefanie
    Soto-Faraco, Salvador
    PSYCHOLOGICAL RESEARCH-PSYCHOLOGISCHE FORSCHUNG, 2019, 83 (08) : 1626 - 1639
  • [26] A cross-modal aftereffect reveals merging of proprioception and vision
    Bertamini, M.
    Thraves, E.
    Bruno, N.
    PERCEPTION, 2007, 36 (09) : 1404 - 1404
  • [27] Cross-modal adapter for vision-language retrieval
    Jiang, Haojun
    Zhang, Jianke
    Huang, Rui
    Ge, Chunjiang
    Ni, Zanlin
    Song, Shiji
    Huang, Gao
    PATTERN RECOGNITION, 2025, 159
  • [28] Cross-modal Map Learning for Vision and Language Navigation
    Georgakis, Georgios
    Schmeckpeper, Karl
    Wanchoo, Karan
    Dan, Soham
    Miltsakaki, Eleni
    Roth, Dan
    Daniilidis, Kostas
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15439 - 15449
  • [29] Cross-Modal Image Registration via Rasterized Parameter Prediction for Object Tracking
    Zhang, Qing
    Xiang, Wei
    APPLIED SCIENCES-BASEL, 2023, 13 (09)
  • [30] Unimodal and cross-modal prediction is enhanced in musicians
    Vassena, Eliana
    Kochman, Katty
    Latomme, Julie
    Verguts, Tom
    SCIENTIFIC REPORTS, 2016, 6