ModDrop: Adaptive Multi-Modal Gesture Recognition

Cited: 193
Authors
Neverova, Natalia [1 ]
Wolf, Christian [1 ]
Taylor, Graham [2 ]
Nebout, Florian [3 ]
Affiliations
[1] INSA Lyon, LIRIS, UMR5205, F-69621 Villeurbanne, France
[2] Univ Guelph, Sch Engn, Guelph, ON N1G 2W1, Canada
[3] Awabot, Villeurbanne, Rhone Alpes, France
Keywords
Gesture recognition; convolutional neural networks; multi-modal learning; deep learning; POSE; MODELS;
DOI
10.1109/TPAMI.2015.2461544
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, allowing it to produce meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.
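The core idea summarised in the abstract can be illustrated compactly. Below is a minimal sketch, not the authors' implementation, of ModDrop-style fusion training in PyTorch: each modality passes through its own encoder, and during training each modality channel is independently zeroed out with a small probability, forcing the fused classifier to cope with missing inputs. The encoder sizes, drop probability, and the 21-class output (assumed here as the 20 ChaLearn gesture classes plus a no-gesture class) are illustrative assumptions.

import torch
import torch.nn as nn

class ModDropFusion(nn.Module):
    """Multi-modal fusion with random per-modality dropping (illustrative sketch)."""
    def __init__(self, modality_dims, hidden_dim=128, num_classes=21, p_drop=0.1):
        super().__init__()
        self.p_drop = p_drop  # probability of dropping an entire modality channel
        # One small encoder per modality; in the paper, per-modality networks are
        # carefully initialized individually before fusion training begins.
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden_dim), nn.ReLU()) for d in modality_dims]
        )
        self.classifier = nn.Linear(hidden_dim * len(modality_dims), num_classes)

    def forward(self, inputs):
        feats = []
        for enc, x in zip(self.encoders, inputs):
            h = enc(x)
            if self.training:
                # Per-sample Bernoulli mask: with probability p_drop the whole
                # modality is zeroed, so the classifier learns to predict from
                # whatever channels remain.
                keep = (torch.rand(x.size(0), 1, device=x.device) > self.p_drop).float()
                h = h * keep
            feats.append(h)
        return self.classifier(torch.cat(feats, dim=1))

# Toy usage: three modalities of different dimensionality (e.g. depth, skeleton, audio)
model = ModDropFusion(modality_dims=[64, 32, 16])
batch = [torch.randn(8, d) for d in (64, 32, 16)]
logits_train = model(batch)   # training mode: modalities randomly dropped
model.eval()
logits_eval = model(batch)    # evaluation: all available modalities are used

At test time the mask is disabled, so any subset of modalities actually present can be fed in (with missing channels zeroed), which mirrors the robustness property described in the abstract.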
Pages: 1692-1706
Number of pages: 15
Related Papers
50 records in total
  • [41] Mudra: A Multi-Modal Smartwatch Interactive System with Hand Gesture Recognition and User Identification
    Guo, Kaiwen
    Zhou, Hao
    Tian, Ye
    Zhou, Wangqiu
    Ji, Yusheng
    Li, Xiang-Yang
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2022), 2022, : 100 - 109
  • [42] Erratum to: Multi-modal Gesture Recognition using Integrated Model of Motion, Audio and Video
    GOUTSU Yusuke
    KOBAYASHI Takaki
    OBARA Junya
    KUSAJIMA Ikuo
    TAKEICHI Kazunari
    TAKANO Wataru
    NAKAMURA Yoshihiko
    [J]. Chinese Journal of Mechanical Engineering, 2017, 30 : 1473 - 1473
  • [43] An enhanced artificial neural network for hand gesture recognition using multi-modal features
    Uke, Shailaja N.
    Zade, Amol V.
    [J]. COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING-IMAGING AND VISUALIZATION, 2023, 11 (06): : 2278 - 2289
  • [44] TMMF: Temporal Multi-Modal Fusion for Single-Stage Continuous Gesture Recognition
    Gammulle, Harshala
    Denman, Simon
    Sridharan, Sridha
    Fookes, Clinton
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 7689 - 7701
  • [45] Adaptive Automatic Object Recognition in Single and Multi-Modal Sensor Data
    Khuon, Timothy
    Rand, Robert
    Truslow, Eric
    [J]. 2014 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR), 2014,
  • [46] Multi-Modal Knowledge Distillation for Domain-Adaptive Action Recognition
    Zhu, Xiaoyu
    Liu, Wenhe
    de Melo, Celso M.
    Hauptmann, Alexander
    [J]. SYNTHETIC DATA FOR ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING: TOOLS, TECHNIQUES, AND APPLICATIONS II, 2024, 13035
  • [47] Unequal adaptive visual recognition by learning from multi-modal data
    Cai, Ziyun
    Zhang, Tengfei
    Jing, Xiao-Yuan
    Shao, Ling
    [J]. INFORMATION SCIENCES, 2022, 600 : 1 - 21
  • [48] Nonparametric Gesture Labeling from Multi-modal Data
    Chang, Ju Yong
    [J]. COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I, 2015, 8925 : 503 - 517
  • [49] Innovative Multi-Modal Control for Surveillance Spider Robot: An Integration of Voice and Hand Gesture Recognition
    Dang Khoa Phan
    Phuong-Nam Tran
    Nhat Truong Pham
    Tra Huong Thi Le
    Duc Ngoc Minh Dang
    [J]. PROCEEDINGS OF THE 2024 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION TECHNOLOGY, ICIIT 2024, 2024, : 141 - 148
  • [50] Dynamic Hand Gesture Recognition from Multi-modal Streams Using Deep Neural Network
    Thanh-Hai Tran
    Hoang-Nhat Tran
    Huong-Giang Doan
    [J]. MULTI-DISCIPLINARY TRENDS IN ARTIFICIAL INTELLIGENCE, 2019, 11909 : 156 - 167