Visual-Haptic-Kinesthetic Object Recognition with Multimodal Transformer

Cited by: 1
Authors
Zhou, Xinyuan [1]
Lan, Shiyong [1]
Wang, Wenwu [2]
Li, Xinyang [1]
Zhou, Siyuan [1]
Yang, Hongyu [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Univ Surrey, Guildford GU2 7XH, Surrey, England
Keywords
Object Recognition; Multimodal Deep Learning; Multimodal Fusion; Attention Mechanism; Tactile Fusion; Network
DOI
10.1007/978-3-031-44195-0_20
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Humans recognize objects by combining multi-sensory information in a coordinated fashion. However, vision-based and haptic-based object recognition remain two separate research directions in robotics. Visual images and haptic time series have different properties, which makes it difficult for robots to fuse them for object recognition as humans do. In this work, we propose an architecture that fuses visual, haptic and kinesthetic data for object recognition, based on a multimodal Convolutional Recurrent Neural Network with Transformer. We use Convolutional Neural Networks (CNNs) to learn spatial representations, Recurrent Neural Networks (RNNs) to model temporal relationships, and the Transformer's self-attention and cross-attention structures to focus on global and cross-modal information. We propose two fusion methods and conduct experiments on the multimodal AU dataset. The results show that our model achieves higher accuracy than the latest multimodal object recognition methods. We also conduct an ablation study on the individual input components to demonstrate the importance of multimodal information in object recognition. The code will be available at https://github.com/SYLan2019/VHKOR.
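The cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only plain scaled dot-product attention in which features from one modality act as queries over features from another. The function names, the toy feature values, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key/value rows.

    Learned projection matrices are omitted here (an assumption for brevity).
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(feature size)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted average of the value rows
        fused.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return fused


# Hypothetical toy features: 2 visual tokens (standing in for CNN outputs)
# and 3 haptic time steps (standing in for RNN outputs).
visual_tokens = [[1.0, 0.0], [0.0, 1.0]]
haptic_steps = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Visual tokens query the haptic sequence, giving one fused vector per visual token.
fused = cross_attention(visual_tokens, haptic_steps, haptic_steps)
```

Swapping the roles of the two inputs (haptic steps as queries, visual tokens as keys and values) gives the opposite attention direction; how the paper's two fusion methods combine such outputs is not specified in this record.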
Pages: 233-245
Page count: 13