Visual-Haptic-Kinesthetic Object Recognition with Multimodal Transformer

Cited by: 1

Authors:

Zhou, Xinyuan [1]

Lan, Shiyong [1]

Wang, Wenwu [2]

Li, Xinyang [1]

Zhou, Siyuan [1]

Yang, Hongyu [1]

Affiliations:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China

[2] Univ Surrey, Guildford GU2 7XH, Surrey, England

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VII | 2023, Vol. 14260

Keywords:

Object Recognition; Multimodal Deep Learning; Multimodal Fusion; Attention Mechanism; TACTILE FUSION; NETWORK;

DOI:

10.1007/978-3-031-44195-0_20

Chinese Library Classification:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Humans recognize objects by combining multi-sensory information in a coordinated fashion. However, visual-based and haptic-based object recognition remain two separate research directions in robotics. Visual images and haptic time series have different properties, which makes them difficult for robots to fuse for object recognition as humans do. In this work, we propose an architecture that fuses visual, haptic, and kinesthetic data for object recognition, based on multimodal Convolutional Recurrent Neural Networks with a Transformer. We use Convolutional Neural Networks (CNNs) to learn spatial representations, Recurrent Neural Networks (RNNs) to model temporal relationships, and the Transformer's self-attention and cross-attention structures to focus on global and cross-modal information. We propose two fusion methods and conduct experiments on the multimodal AU dataset. The results show that our model achieves higher accuracy than the latest multimodal object recognition methods. We also conduct an ablation study on the individual input components to demonstrate the importance of multimodal information in object recognition. The code will be available at https://github.com/SYLan2019/VHKOR.
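The cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain scaled dot-product attention with no learned projections, and the function names and toy feature values are hypothetical. It assumes features from one modality (for example, CNN outputs for an image) act as queries, and features from another modality (for example, RNN states over a haptic time series) act as keys and values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: list of d-dim vectors from one modality (e.g. visual features)
    keys, values: lists of vectors from another modality (e.g. haptic steps)
    Returns one fused vector per query, plus the attention weights.
    """
    d = len(queries[0])
    out_dim = len(values[0])
    fused, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        fused.append([sum(wj * v[i] for wj, v in zip(w, values))
                      for i in range(out_dim)])
        weights.append(w)
    return fused, weights


# Hypothetical toy features: 2 visual tokens, 3 haptic time steps, d = 2.
visual = [[1.0, 0.0], [0.0, 1.0]]
haptic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = cross_attention(visual, haptic, haptic)
```

Self-attention is the special case in which queries, keys, and values all come from the same modality, for example `cross_attention(haptic, haptic, haptic)`.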


Pages: 233-245

Page count: 13
