Connecting Touch and Vision via Cross-Modal Prediction

Cited by: 60
Authors
Li, Yunzhu [1]
Zhu, Jun-Yan [1]
Tedrake, Russ [1]
Torralba, Antonio [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
DOI
10.1109/CVPR.2019.01086
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Humans perceive the world using multi-modal sensory inputs such as vision, audition, and touch. In this work, we investigate the cross-modal connection between vision and touch. The main challenge in this cross-domain modeling task lies in the significant scale discrepancy between the two: while our eyes perceive an entire visual scene at once, humans can only feel a small region of an object at any given moment. To connect vision and touch, we introduce new tasks of synthesizing plausible tactile signals from visual inputs as well as imagining how we interact with objects given tactile data as input. To accomplish our goals, we first equip robots with both visual and tactile sensors and collect a large-scale dataset of corresponding vision and tactile image sequences. To close the scale gap, we present a new conditional adversarial model that incorporates the scale and location information of the touch. Human perceptual studies demonstrate that our model can produce realistic visual images from tactile data and vice versa. Finally, we present both qualitative and quantitative experimental results regarding different system designs, as well as visualizations of the learned representations of our model.
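As an illustrative aside, the "conditional adversarial model that incorporates the scale and location information of the touch" described above can be pictured as a generator conditioned on a low-dimensional touch descriptor. The sketch below is a minimal, assumption-laden PyTorch rendering of that idea; the class name TouchConditionedGenerator, the layer sizes, the (scale, x, y) descriptor, and the broadcast-and-concatenate conditioning are all illustrative choices, not the paper's actual architecture.

    # Minimal sketch (assumption): a generator that maps a visual frame plus a
    # touch scale/location vector to a tactile image. Layer sizes and the
    # conditioning scheme are illustrative guesses, not the paper's model.
    import torch
    import torch.nn as nn

    class TouchConditionedGenerator(nn.Module):
        def __init__(self, cond_dim=3):  # cond_dim: (scale, x, y) of the touch
            super().__init__()
            # Encode the visual frame into a spatial feature map.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            )
            # Decode to a 3-channel tactile image; the touch condition is
            # broadcast spatially and concatenated onto the feature channels.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128 + cond_dim, 64, 4, stride=2, padding=1),
                nn.ReLU(),
                nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
                nn.Tanh(),
            )

        def forward(self, vision, touch_cond):
            feats = self.encoder(vision)              # (B, 128, H/4, W/4)
            b, _, h, w = feats.shape
            cond = touch_cond.view(b, -1, 1, 1).expand(-1, -1, h, w)
            return self.decoder(torch.cat([feats, cond], dim=1))

    # Usage: one 256x256 frame, one hypothetical touch at (x=0.3, y=0.7), scale 0.5.
    g = TouchConditionedGenerator()
    out = g(torch.randn(1, 3, 256, 256), torch.tensor([[0.5, 0.3, 0.7]]))
    print(out.shape)  # torch.Size([1, 3, 256, 256])

Broadcasting the condition over the spatial feature map is one common way to inject low-dimensional conditioning into a convolutional generator; the full adversarial model described in the abstract would also pair a generator with a discriminator and operate on image sequences, both omitted here.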
Pages: 10601 - 10610
Number of pages: 10
Related Papers (showing items 21-30 of 50)
  • [21] Improving Cross-Modal Alignment in Vision Language Navigation via Syntactic Information
    Li, Jialu
    Tan, Hao
    Bansal, Mohit
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1041 - 1050
  • [22] Cross-Modal Correspondence Among Vision, Audition, and Touch in Natural Objects: An Investigation of the Perceptual Properties of Wood
    Kanaya, Shoko
    Kariya, Kenji
    Fujisaki, Waka
    PERCEPTION, 2016, 45 (10) : 1099 - 1114
  • [23] Cross-modal localization via sparsity
    Kidron, Einat
    Schechner, Yoav Y.
    Elad, Michael
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2007, 55 (04) : 1390 - 1404
  • [24] Cross-modal decoupling in temporal attention between audition and touch
    Mühlberg, Stefanie
    Soto-Faraco, Salvador
    PSYCHOLOGICAL RESEARCH-PSYCHOLOGISCHE FORSCHUNG, 2019, 83 (08) : 1626 - 1639
  • [26] A cross-modal aftereffect reveals merging of proprioception and vision
    Bertamini, M.
    Thraves, E.
    Bruno, N.
    PERCEPTION, 2007, 36 (09) : 1404 - 1404
  • [27] Cross-modal adapter for vision-language retrieval
    Jiang, Haojun
    Zhang, Jianke
    Huang, Rui
    Ge, Chunjiang
    Ni, Zanlin
    Song, Shiji
    Huang, Gao
    PATTERN RECOGNITION, 2025, 159
  • [28] Cross-modal Map Learning for Vision and Language Navigation
    Georgakis, Georgios
    Schmeckpeper, Karl
    Wanchoo, Karan
    Dan, Soham
    Miltsakaki, Eleni
    Roth, Dan
    Daniilidis, Kostas
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15439 - 15449
  • [29] Cross-Modal Image Registration via Rasterized Parameter Prediction for Object Tracking
    Zhang, Qing
    Xiang, Wei
    APPLIED SCIENCES-BASEL, 2023, 13 (09)
  • [30] Unimodal and cross-modal prediction is enhanced in musicians
    Vassena, Eliana
    Kochman, Katty
    Latomme, Julie
    Verguts, Tom
    SCIENTIFIC REPORTS, 2016, 6