A Multi Modal Approach to Gesture Recognition from Audio and Video Data

Cited by: 9

Authors:

Bayer, Immanuel [1]

Silbermann, Thierry [1]

Affiliations:

[1] Univ Konstanz, D-78457 Constance, Germany

Source:

ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION | 2013

Keywords:

Multi-modal interaction; speech and gesture recognition; fusion

DOI:

10.1145/2522848.2532592

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

In this paper we describe our approach to the Multi-modal Gesture Recognition Challenge organized by ChaLearn in conjunction with the ICMI 2013 conference. The competition's task was to learn a vocabulary of 20 types of Italian gestures performed by different persons and to detect them in sequences. We develop an algorithm to find the gesture intervals in the audio data, extract audio features from those intervals, and train two different models. We engineer features from the skeleton data and use the gesture intervals in the training data to train a model that we afterwards apply to the test sequences using a sliding window. We combine the models through weighted averaging. We find that combining information from two different sources in this way boosts the models' performance significantly.

Citation

Pages: 461-465

Page count: 5

50 items in total

[21] An Interface for Audio Control Using Gesture Recognition and IMU Data
Vimos, Victor H.
Valdivieso Caraguay, Angel Leonardo
Barona Lopez, Lorena Isabel
Pozo Espin, David
Benalcazar, Marco E.
TRENDS IN ARTIFICIAL INTELLIGENCE AND COMPUTER ENGINEERING (ICAETT 2021), 2022, 407 : 168 - 180
[22] A Unified Framework for Multi-Modal Isolated Gesture Recognition
Duan, Jiali
Wan, Jun
Zhou, Shuai
Guo, Xiaoyuan
Li, Stan Z.
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2018, 14 (01)
[23] Gesture recognition based on multi-modal feature weight
Duan, Haojie
Sun, Ying
Cheng, Wentao
Jiang, Du
Yun, Juntong
Liu, Ying
Liu, Yibo
Zhou, Dalin
Concurrency and Computation: Practice and Experience, 2021, 33 (05)
[24] Gesture recognition based on multi-modal feature weight
Duan, Haojie
Sun, Ying
Cheng, Wentao
Jiang, Du
Yun, Juntong
Liu, Ying
Liu, Yibo
Zhou, Dalin
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (05)
[25] Fusion of audio and video information for multi modal person authentication
Duc, B
Bigun, ES
Bigun, J
Maitre, G
Fischer, S
PATTERN RECOGNITION LETTERS, 1997, 18 (09) : 835 - 843
[26] Video and audio are images: A cross-modal mixer for original data on video-audio retrieval
Yuan, Zichen
Shen, Qi
Zheng, Bingyi
Liu, Yuting
Jiang, Linying
Guo, Guibing
KNOWLEDGE-BASED SYSTEMS, 2024, 299
[27] GESTURE RECOGNITION USING VIDEO AND FLOOR PRESSURE DATA
Qian, Gang
Peng, Bo
Zhang, Jiqing
2012 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP 2012), 2012 : 173 - 176
[28] Gesture Based Audio/Video Player
Vadgama, Indrajeet
Khot, Yash
Thaker, Yash
Jouras, Pranali
Mane, Yogita
ADVANCES IN OPTICAL SCIENCE AND ENGINEERING, 2017, 194 : 369 - 378
[29] Implementation of Multi-modal Speech Emotion Recognition Using Text Data and Audio Signals
Adesola, Falade
Adeyinka, Omirinlewo
Kayode, Akindeji
Ayodele, Adebiyi
2023 International Conference on Science, Engineering and Business for Sustainable Development Goals, SEB-SDG 2023, 2023
[30] Static Hand Gesture Recognition from a Video
Rokade, Rajeshree S.
Doye, Dharmpal
INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2011), 2011, 8285
