ModDrop: Adaptive Multi-Modal Gesture Recognition

Cited: 193
Authors
Neverova, Natalia [1 ]
Wolf, Christian [1 ]
Taylor, Graham [2 ]
Nebout, Florian [3 ]
Affiliations
[1] INSA Lyon, LIRIS, UMR5205, F-69621 Villeurbanne, France
[2] Univ Guelph, Sch Engn, Guelph, ON N1G 2W1, Canada
[3] Awabot, Villeurbanne, Rhone Alpes, France
Keywords
Gesture recognition; convolutional neural networks; multi-modal learning; deep learning; POSE; MODELS;
DOI
10.1109/TPAMI.2015.2461544
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, allowing it to produce meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.
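The core ModDrop idea in the abstract (independently dropping whole modality channels during fusion training so the network learns cross-modality correlations without over-relying on any single channel) can be illustrated with a minimal, framework-free sketch. The function name, signature, and keep probability below are illustrative assumptions, not taken from the paper:

```python
import random

def moddrop(modalities, p_keep=0.8, rng=None):
    """Sketch of ModDrop-style channel dropping.

    modalities: list of per-modality feature vectors (lists of floats),
                e.g. [depth_features, skeleton_features, audio_features].
    p_keep:     probability that each modality survives this training step.
    rng:        optional random.Random instance for reproducibility.

    Each modality is either kept intact or replaced by an all-zero vector,
    independently of the others, which forces the fusion layers to cope
    with any subset of available channels.
    """
    rng = rng or random.Random()
    out = []
    for x in modalities:
        if rng.random() < p_keep:
            out.append(list(x))          # modality kept intact
        else:
            out.append([0.0] * len(x))   # modality dropped: zeroed input
    return out
```

At test time all modalities would be passed through unchanged (p_keep=1.0); channels that are genuinely missing at deployment simply arrive as zeros, the same condition the network saw during training.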
Pages: 1692 - 1706
Page count: 15
Related papers
50 records in total
  • [31] Multi-modal gesture recognition with voting-based dynamic time warping
    Kuang, Yiqun
    Cheng, Hong
    Hao, Jiasheng
    Xie, Ruimeng
    Cui, Fang
    [J]. INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2019, 16 (06):
  • [32] Multi-modal Gesture Recognition using Integrated Model of Motion, Audio and Video
    Goutsu, Yusuke
    Kobayashi, Takaki
    Obara, Junya
    Kusajima, Ikuo
    Takeichi, Kazunari
    Takano, Wataru
    Nakamura, Yoshihiko
    [J]. CHINESE JOURNAL OF MECHANICAL ENGINEERING, 2015, 28 (04) - 665
  • [33] Multi-modal user interface combining eye tracking and hand gesture recognition
    Hansol Kim
    Kun Ha Suh
    Eui Chul Lee
    [J]. Journal on Multimodal User Interfaces, 2017, 11 : 241 - 250
  • [34] Adaptive information fusion network for multi-modal personality recognition
    Bao, Yongtang
    Liu, Xiang
    Qi, Yue
    Liu, Ruijun
    Li, Haojie
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2024, 35 (03)
  • [35] AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition
    Panda, Rameswar
    Chen, Chun-Fu
    Fan, Quanfu
    Sun, Ximeng
    Saenko, Kate
    Oliva, Aude
    Feris, Rogerio
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 7556 - 7565
  • [36] Multi-Modal Face Recognition
    Shen, Haihong
    Ma, Liqun
    Zhang, Qishan
    [J]. 2ND IEEE INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER CONTROL (ICACC 2010), VOL. 5, 2010, : 612 - 616
  • [37] Multi-Modal Face Recognition
    Shen, Haihong
    Ma, Liqun
    Zhang, Qishan
    [J]. 2010 8TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2010, : 720 - 723
  • [38] Pen-based gesture recognition in multi-modal human-computer interaction
    Wang, Y.J.
    Yuan, B.Z.
    [J]. Beifang Jiaotong Daxue Xuebao/Journal of Northern Jiaotong University, 2001, 25 (02):
  • [39] Modality-convolutions: Multi-modal Gesture Recognition Based on Convolutional Neural Network
    Huo, Da
    Chen, Yufeng
    Li, Fengxia
    Lei, Zhengchao
    [J]. 2017 12TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND EDUCATION (ICCSE 2017), 2017, : 349 - 353
  • [40] A Multi-modal Gesture Recognition System Using Audio, Video, and Skeletal Joint Data
    Nandakumar, Karthik
    Wah, Wan Kong
    Alice, Chan Siu Man
    Terence, Ng Wen Zheng
    Gang, Wang Jian
    Yun, Yau Wei
    [J]. ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 475 - 482