A Multi Modal Approach to Gesture Recognition from Audio and Video Data

Cited by: 9
Authors
Bayer, Immanuel [1 ]
Silbermann, Thierry [1 ]
Institution
[1] Univ Konstanz, D-78457 Constance, Germany
Keywords
Multi-modal interaction; speech and gesture recognition; fusion
DOI
10.1145/2522848.2532592
Chinese Library Classification
TP301 [Theory, Methods];
Subject Classification Code
081202
Abstract
In this paper we describe our approach to the multi-modal gesture recognition challenge organized by ChaLearn in conjunction with the ICMI 2013 conference. The competition's task was to learn a vocabulary of 20 types of Italian gestures performed by different people and to detect them in sequences. We develop an algorithm to find the gesture intervals in the audio data, extract audio features from those intervals, and train two different models. We engineer features from the skeleton data and use the gesture intervals in the training data to train a model that we then apply to the test sequences using a sliding window. We combine the models through weighted averaging, and find that combining information from two different sources in this way significantly boosts the models' performance.
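The fusion step described in the abstract can be sketched as late fusion by weighted averaging of the per-class probabilities produced by the audio and skeleton models. This is a minimal illustration, not the authors' code; the weight value and function names are assumptions.

```python
import numpy as np

def weighted_average_fusion(p_audio, p_skeleton, w_audio=0.5):
    """Late fusion: weighted average of per-class probabilities.

    p_audio, p_skeleton: length-n_classes probability vectors from the
    audio and skeleton models. w_audio is the fusion weight, a
    hypothetical value -- the paper does not state it in the abstract.
    Returns the fused class prediction and the fused distribution.
    """
    p_audio = np.asarray(p_audio, dtype=float)
    p_skeleton = np.asarray(p_skeleton, dtype=float)
    fused = w_audio * p_audio + (1.0 - w_audio) * p_skeleton
    return int(np.argmax(fused)), fused

# Example: the audio model prefers class 2, the skeleton model class 0;
# with w_audio=0.6 the fused prediction follows the audio model.
label, fused = weighted_average_fusion([0.2, 0.1, 0.7],
                                       [0.5, 0.3, 0.2],
                                       w_audio=0.6)
```

In practice the weight would be tuned on a validation split; the averaged vector remains a valid probability distribution since the weights sum to one.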
Pages: 461-465
Page count: 5
Related Papers
50 in total
  • [31] Multi-Modal Multi-Action Video Recognition
    Shi, Zhensheng
    Liang, Ju
    Li, Qianqian
    Zheng, Haiyong
    Gu, Zhaorui
    Dong, Junyu
    Zheng, Bing
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13658 - 13667
  • [32] A Multi-Modal Egocentric Activity Recognition Approach towards Video Domain Generalization
    Papadakis, Antonios
    Spyrou, Evaggelos
    SENSORS, 2024, 24 (08)
  • [33] Analysis of Deep Fusion Strategies for Multi-modal Gesture Recognition
    Roitberg, Alina
    Pollert, Tim
    Haurilet, Monica
    Martin, Manuel
    Stiefelhagen, Rainer
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 198 - 206
  • [34] Convolutional Transformer Fusion Blocks for Multi-Modal Gesture Recognition
    Hampiholi, Basavaraj
    Jarvers, Christian
    Mader, Wolfgang
    Neumann, Heiko
    IEEE ACCESS, 2023, 11 : 34094 - 34103
  • [35] Bayesian Co-Boosting for Multi-modal Gesture Recognition
    Wu, Jiaxiang
    Cheng, Jian
    JOURNAL OF MACHINE LEARNING RESEARCH, 2014, 15 : 3013 - 3036
  • [36] Multi-modal Gesture Recognition Challenge 2013: Dataset and Results
    Escalera, Sergio
    Gonzalez, Jordi
    Baro, Xavier
    Reyes, Miguel
    Lopes, Oscar
    Guyon, Isabelle
    Athitsos, Vassilis
    Escalante, Hugo J.
    ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013, : 445 - 452
  • [37] Ubiquitous Emotion Recognition Using Audio and Video Data
    Jannat, Rahatul
    Tynes, Iyonna
    LaLime, Lott
    Adorno, Juan
    Canavan, Shaun
    PROCEEDINGS OF THE 2018 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2018 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC'18 ADJUNCT), 2018, : 956 - 959
  • [39] Multi Speaker Detection and Tracking using Audio and Video Sensors with Gesture Analysis
    Hariharan, Balaji
    Hari, S.
    Gopalakrishnan, Uma
    2013 TENTH INTERNATIONAL CONFERENCE ON WIRELESS AND OPTICAL COMMUNICATIONS NETWORKS (WOCN), 2013,
  • [40] Multi-modal Laughter Recognition in Video Conversations
    Escalera, Sergio
    Puertas, Eloi
    Radeva, Petia
    Pujol, Oriol
    2009 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPR WORKSHOPS 2009), VOLS 1 AND 2, 2009, : 869 - 874