A Multi Modal Approach to Gesture Recognition from Audio and Video Data

Cited by: 9
Authors
Bayer, Immanuel [1]
Silbermann, Thierry [1]
Affiliation
[1] Univ Konstanz, D-78457 Constance, Germany
Keywords
Multi-modal interaction; speech and gesture recognition; fusion
DOI
10.1145/2522848.2532592
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
In this paper we describe our approach to the Multi-modal Gesture Recognition Challenge organized by ChaLearn in conjunction with the ICMI 2013 conference. The competition's task was to learn a vocabulary of 20 types of Italian gestures performed by different people and to detect them in sequences. We develop an algorithm to find the gesture intervals in the audio data, extract audio features from those intervals, and train two different models on them. We engineer features from the skeleton data and use the gesture intervals in the training data to train a model, which we then apply to the test sequences using a sliding window. We combine the models through weighted averaging. We find that combining information from these two sources in this way boosts the models' performance significantly.
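Two of the steps named in the abstract, applying a trained model to a test sequence with a sliding window and combining model outputs by weighted averaging, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `sliding_windows` and `fuse_weighted`, the window and step sizes, and the fusion weight `w_audio` are hypothetical, and the sketch assumes each model emits a per-class probability vector over the same gesture vocabulary.

```python
import numpy as np


def sliding_windows(n_frames, window, step):
    """Yield (start, end) frame indices of fixed-length windows over a sequence.

    Hypothetical helper: window length and step are free parameters here,
    not values taken from the paper.
    """
    for start in range(0, n_frames - window + 1, step):
        yield start, start + window


def fuse_weighted(audio_probs, skeleton_probs, w_audio=0.5):
    """Late fusion: weighted average of two per-class probability vectors.

    Assumes both models score the same ordered set of gesture classes.
    The weight w_audio is a hypothetical parameter in [0, 1]; the skeleton
    model receives the complementary weight 1 - w_audio.
    """
    audio_probs = np.asarray(audio_probs, dtype=float)
    skeleton_probs = np.asarray(skeleton_probs, dtype=float)
    if audio_probs.shape != skeleton_probs.shape:
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= w_audio <= 1.0:
        raise ValueError("w_audio must lie in [0, 1]")
    return w_audio * audio_probs + (1.0 - w_audio) * skeleton_probs


if __name__ == "__main__":
    # Windows of 4 frames, advancing 2 frames at a time, over a 10-frame sequence.
    print(list(sliding_windows(10, window=4, step=2)))

    # Toy 3-class scores (the challenge vocabulary has 20 classes).
    audio = [0.6, 0.3, 0.1]
    skeleton = [0.2, 0.7, 0.1]
    for w in (0.5, 0.7):
        fused = fuse_weighted(audio, skeleton, w_audio=w)
        print(w, fused, "predicted class:", int(np.argmax(fused)))
```

With equal weights the skeleton model's preference for class 1 wins; shifting the weight toward audio (`w_audio=0.7`) flips the decision to class 0. This shows why the fusion weight matters and would in practice be chosen on held-out data. Because a convex combination of probability vectors is itself a probability vector, the fused scores can be thresholded or ranked exactly like either model's output.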
Pages: 461–465
Number of pages: 5