A Multi Modal Approach to Gesture Recognition from Audio and Video Data

Cited by: 9
Authors
Bayer, Immanuel [1 ]
Silbermann, Thierry [1 ]
Institution
[1] Univ Konstanz, D-78457 Constance, Germany
Keywords
Multi-modal interaction; speech and gesture recognition; fusion
DOI
10.1145/2522848.2532592
Chinese Library Classification
TP301 [Theory, Methods];
Subject Classification Code
081202
Abstract
In this paper we describe our approach to the multi-modal gesture recognition challenge organized by ChaLearn in conjunction with the ICMI 2013 conference. The competition's task was to learn a vocabulary of 20 types of Italian gestures performed by different people and to detect them in sequences. We develop an algorithm to find the gesture intervals in the audio data, extract audio features from those intervals, and train two different models. We engineer features from the skeleton data and use the gesture intervals in the training data to train a model that we then apply to the test sequences using a sliding window. We combine the models through weighted averaging, and find that combining information from two different sources in this way significantly boosts the models' performance.
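The fusion step described in the abstract can be sketched as late fusion by weighted averaging of the per-class probabilities produced by the audio and skeleton models. This is a minimal illustration, not the authors' code; the weight value and function names are assumptions.

```python
import numpy as np

def weighted_average_fusion(p_audio, p_skeleton, w_audio=0.5):
    """Late fusion: weighted average of per-class probabilities.

    p_audio, p_skeleton: length-n_classes probability vectors from the
    audio and skeleton models. w_audio is the fusion weight, a
    hypothetical value -- the paper does not state it in the abstract.
    Returns the fused class prediction and the fused distribution.
    """
    p_audio = np.asarray(p_audio, dtype=float)
    p_skeleton = np.asarray(p_skeleton, dtype=float)
    fused = w_audio * p_audio + (1.0 - w_audio) * p_skeleton
    return int(np.argmax(fused)), fused

# Example: the audio model prefers class 2, the skeleton model class 0;
# with w_audio=0.6 the fused prediction follows the audio model.
label, fused = weighted_average_fusion([0.2, 0.1, 0.7],
                                       [0.5, 0.3, 0.2],
                                       w_audio=0.6)
```

In practice the weight would be tuned on a validation split; the averaged vector remains a valid probability distribution since the weights sum to one.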
Pages: 461-465
Page count: 5
Related Papers
50 in total
  • [31] Multi-Modal Multi-Action Video Recognition
    Shi, Zhensheng
    Liang, Ju
    Li, Qianqian
    Zheng, Haiyong
    Gu, Zhaorui
    Dong, Junyu
    Zheng, Bing
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13658 - 13667
  • [32] A Multi-Modal Egocentric Activity Recognition Approach towards Video Domain Generalization
    Papadakis, Antonios
    Spyrou, Evaggelos
    SENSORS, 2024, 24 (08)
  • [33] Analysis of Deep Fusion Strategies for Multi-modal Gesture Recognition
    Roitberg, Alina
    Pollert, Tim
    Haurilet, Monica
    Martin, Manuel
    Stiefelhagen, Rainer
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 198 - 206
  • [34] Convolutional Transformer Fusion Blocks for Multi-Modal Gesture Recognition
    Hampiholi, Basavaraj
    Jarvers, Christian
    Mader, Wolfgang
    Neumann, Heiko
    IEEE ACCESS, 2023, 11 : 34094 - 34103
  • [35] Bayesian Co-Boosting for Multi-modal Gesture Recognition
    Wu, Jiaxiang
    Cheng, Jian
    JOURNAL OF MACHINE LEARNING RESEARCH, 2014, 15 : 3013 - 3036
  • [36] Multi-modal Gesture Recognition Challenge 2013: Dataset and Results
    Escalera, Sergio
    Gonzalez, Jordi
    Baro, Xavier
    Reyes, Miguel
    Lopes, Oscar
    Guyon, Isabelle
    Athitsos, Vassilis
    Escalante, Hugo J.
    ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 445 - 452
  • [37] Ubiquitous Emotion Recognition Using Audio and Video Data
    Jannat, Rahatul
    Tynes, Iyonna
    LaLime, Lott
    Adorno, Juan
    Canavan, Shaun
    PROCEEDINGS OF THE 2018 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2018 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC'18 ADJUNCT), 2018, : 956 - 959
  • [39] Multi Speaker Detection and Tracking using Audio and Video Sensors with Gesture Analysis
    Hariharan, Balaji
    Hari, S.
    Gopalakrishnan, Uma
    2013 TENTH INTERNATIONAL CONFERENCE ON WIRELESS AND OPTICAL COMMUNICATIONS NETWORKS (WOCN), 2013,
  • [40] Multi-modal Laughter Recognition in Video Conversations
    Escalera, Sergio
    Puertas, Eloi
    Radeva, Petia
    Pujol, Oriol
    2009 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPR WORKSHOPS 2009), VOLS 1 AND 2, 2009, : 869 - 874