ModDrop: Adaptive Multi-Modal Gesture Recognition

Cited by: 193
|
Authors
Neverova, Natalia [1 ]
Wolf, Christian [1 ]
Taylor, Graham [2 ]
Nebout, Florian [3 ]
Affiliations
[1] INSA Lyon, LIRIS, UMR5205, F-69621 Villeurbanne, France
[2] Univ Guelph, Sch Engn, Guelph, ON N1G 2W1, Canada
[3] Awabot, Villeurbanne, Rhone Alpes, France
Keywords
Gesture recognition; convolutional neural networks; multi-modal learning; deep learning; POSE; MODELS;
DOI
10.1109/TPAMI.2015.2461544
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, producing meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio.
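The core ModDrop idea described in the abstract, randomly dropping whole modality channels during fusion training so the network learns cross-modality correlations yet stays robust to missing inputs, can be sketched as follows. This is a minimal NumPy sketch: the function name `moddrop`, the single shared `drop_prob`, and the keep-at-least-one-channel guard are illustrative assumptions, not the paper's exact per-modality formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def moddrop(modalities, drop_prob=0.5, training=True):
    """Randomly zero out entire modality channels during fusion training.

    modalities: list of per-modality feature arrays (shapes are arbitrary).
    Each channel is kept with probability 1 - drop_prob; at least one
    modality is always retained so the fused input is never all-zero.
    At test time all modalities pass through unchanged.
    """
    if not training:
        return modalities
    keep = rng.random(len(modalities)) >= drop_prob
    if not keep.any():
        # Guard (an assumption here): force one random channel to survive.
        keep[rng.integers(len(modalities))] = True
    return [m if k else np.zeros_like(m) for m, k in zip(modalities, keep)]
```

In a fusion network, the surviving (non-zeroed) representations would then be concatenated and passed to the shared layers, so the classifier sees many random subsets of modalities over the course of training.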
Pages: 1692 - 1706
Page count: 15
Related Papers
50 items total
  • [21] Multi-modal user interface combining eye tracking and hand gesture recognition
    Kim, Hansol
    Suh, Kun Ha
    Lee, Eui Chul
    [J]. JOURNAL ON MULTIMODAL USER INTERFACES, 2017, 11 (03) : 241 - 250
  • [22] Multi-modal Gesture Recognition using Integrated Model of Motion, Audio and Video
    Goutsu, Yusuke
    Kobayashi, Takaki
    Obara, Junya
    Kusajima, Ikuo
    Takeichi, Kazunari
    Takano, Wataru
    Nakamura, Yoshihiko
    [J]. CHINESE JOURNAL OF MECHANICAL ENGINEERING, 2015, 28 (04) : 657 - 665
  • [23] A Multi-modal Gesture Recognition System in a Human-Robot Interaction Scenario
    Li, Zhi
    Jarvis, Ray
    [J]. 2009 IEEE INTERNATIONAL WORKSHOP ON ROBOTIC AND SENSORS ENVIRONMENTS (ROSE 2009), 2009, : 41 - 46
  • [25] Multi-modal fusion for robust hand gesture recognition based on heterogeneous networks
    Zou, Yongxiang
    Cheng, Long
    Han, Lijun
    Li, Zhengwei
    [J]. SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2023, 66 (11) : 3219 - 3230
  • [28] Multi-modal Gesture Recognition Using Skeletal Joints and Motion Trail Model
    Liang, Bin
    Zheng, Lihong
    [J]. COMPUTER VISION - ECCV 2014 WORKSHOPS, PT I, 2015, 8925 : 623 - 638