Visual-Haptic-Kinesthetic Object Recognition with Multimodal Transformer

Cited by: 1
Authors
Zhou, Xinyuan [1]
Lan, Shiyong [1]
Wang, Wenwu [2]
Li, Xinyang [1]
Zhou, Siyuan [1]
Yang, Hongyu [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Univ Surrey, Guildford GU2 7XH, Surrey, England
Keywords
Object Recognition; Multimodal Deep Learning; Multimodal Fusion; Attention Mechanism; Tactile Fusion; Network
DOI
10.1007/978-3-031-44195-0_20
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Humans recognize objects by combining multi-sensory information in a coordinated fashion. However, vision-based and haptic-based object recognition remain two separate research directions in robotics. Visual images and haptic time series have different properties, which makes them difficult for robots to fuse for object recognition as humans do. In this work, we propose an architecture that fuses visual, haptic and kinesthetic data for object recognition, based on a multimodal Convolutional Recurrent Neural Network with a Transformer. We use Convolutional Neural Networks (CNNs) to learn spatial representations, Recurrent Neural Networks (RNNs) to model temporal relationships, and the Transformer's self-attention and cross-attention structures to focus on global and cross-modal information. We propose two fusion methods and conduct experiments on the multimodal AU dataset. The results show that our model achieves higher accuracy than the latest multimodal object recognition methods. We also conduct an ablation study on the individual input components to demonstrate the importance of multimodal information in object recognition. The code will be available at https://github.com/SYLan2019/VHKOR.
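The cross-attention idea mentioned in the abstract can be illustrated with a minimal sketch of generic scaled dot-product attention, in which feature vectors from one modality act as queries over the time steps of another. This is not the authors' implementation: the function names, the toy feature values, and the absence of learned projection matrices are all assumptions made to keep the example small and dependency-free.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention without learned projections.

    queries: list of d-dimensional vectors from one modality.
    keys, values: lists of vectors from another modality (same length).
    Returns (outputs, weights), one row per query.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy features: two visual tokens and three haptic time steps.
visual = [[1.0, 0.0], [0.0, 1.0]]
haptic = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Visual tokens attend over haptic steps (cross-attention).
fused, attn = cross_attention(visual, haptic, haptic)
```

Passing the same sequence as queries, keys and values, for example `cross_attention(haptic, haptic, haptic)`, gives the self-attention case.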
Pages: 233-245
Page count: 13