ModDrop: Adaptive Multi-Modal Gesture Recognition

Cited by: 193
Authors
Neverova, Natalia [1]
Wolf, Christian [1]
Taylor, Graham [2]
Nebout, Florian [3]
Affiliations
[1] INSA Lyon, LIRIS, UMR5205, F-69621 Villeurbanne, France
[2] Univ Guelph, Sch Engn, Guelph, ON N1G 2W1, Canada
[3] Awabot, Villeurbanne, Rhone Alpes, France
Keywords
Gesture recognition; convolutional neural networks; multi-modal learning; deep learning; POSE; MODELS;
DOI
10.1109/TPAMI.2015.2461544
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, so that it produces meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.
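The "random dropping of separate channels" described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `drop_modalities` and `fuse`, the drop probability `p_drop`, the choice to represent a dropped channel as zeros, and the independent per-modality sampling are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional


def drop_modalities(
    features: Dict[str, List[float]],
    p_drop: float,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[float]]:
    """Zero out each modality independently with probability p_drop.

    Assumption: a "dropped" channel is replaced by zeros of the same length,
    so the fused input keeps a fixed size during training.
    """
    if not 0.0 <= p_drop <= 1.0:
        raise ValueError("p_drop must be in [0, 1]")
    rng = rng or random.Random()
    out: Dict[str, List[float]] = {}
    for name, values in features.items():
        dropped = rng.random() < p_drop  # one Bernoulli draw per modality
        out[name] = [0.0] * len(values) if dropped else list(values)
    return out


def fuse(features: Dict[str, List[float]]) -> List[float]:
    """Toy fusion step: concatenate modality features in sorted-name order."""
    fused: List[float] = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused


if __name__ == "__main__":
    # Hypothetical per-modality feature vectors for one training sample.
    sample = {"video": [0.2, 0.7], "depth": [0.5], "audio": [0.1, 0.9, 0.3]}
    noisy = drop_modalities(sample, p_drop=0.5, rng=random.Random(0))
    print(fuse(noisy))
```

Applied only at training time, a step like this would expose the fused layers to samples with missing channels, which is the property the abstract credits for robustness to missing signals.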
Pages: 1692-1706
Page count: 15