BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion

Citations: 0
Authors
Deb, Ahana [1 ]
Nag, Sayan [2 ]
Mahapatra, Ayan [1 ]
Chattopadhyay, Soumitri [1 ]
Marik, Aritra [1 ]
Gayen, Pijush Kanti [1 ]
Sanyal, Shankha [1 ]
Banerjee, Archi [3 ]
Karmakar, Samir [1 ]
Affiliations
[1] Jadavpur Univ, Kolkata, India
[2] Univ Toronto, Toronto, ON, Canada
[3] IIT Kharagpur, Kharagpur, W Bengal, India
Source
INTERSPEECH 2023
Keywords
speech act; multimodal fusion; transformer; low-resource language; EMOTION; EXPRESSION; FEATURES
DOI
10.21437/Interspeech.2023-1146
Chinese Library Classification (CLC)
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Spoken languages often utilise intonation, rhythm, intensity, and structure to communicate intention, which can be interpreted differently depending on how an utterance is delivered. These speech acts form the foundation of communication and are unique in expression to each language. Recent attention-based models, which learn powerful representations from multilingual datasets, have performed well on speech tasks and are well suited to modelling specific tasks in low-resource languages. Here, we develop a novel multimodal approach that combines two models, wav2vec2.0 for audio and MarianMT for text translation, via multimodal attention fusion to predict speech acts in our prepared Bengali speech corpus. We also show that our model BeAts (Bengali speech acts recognition using Multimodal Attention Fusion) significantly outperforms both a unimodal baseline using only speech data and a simpler bimodal fusion using both speech and text data. Project page: https://soumitri2001.github.io/BeAts
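The abstract does not give architectural details, but the general idea of cross-modal attention fusion between an audio encoder and a text encoder can be sketched as below. All dimensions, head counts, the pooling scheme, and the class count here are illustrative assumptions, not the paper's actual values; the random tensors stand in for wav2vec2.0 frame features and MarianMT encoder token features.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Illustrative cross-attention fusion of audio and text features.

    Each modality queries the other via multi-head attention; the pooled,
    concatenated representations feed a speech-act classifier head.
    """

    def __init__(self, audio_dim=768, text_dim=512, fused_dim=256, num_classes=4):
        super().__init__()
        # Project both modalities into a shared fusion space.
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.text_proj = nn.Linear(text_dim, fused_dim)
        self.audio_to_text = nn.MultiheadAttention(fused_dim, num_heads=4, batch_first=True)
        self.text_to_audio = nn.MultiheadAttention(fused_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(2 * fused_dim, num_classes)

    def forward(self, audio_feats, text_feats):
        a = self.audio_proj(audio_feats)   # (B, T_audio, fused_dim)
        t = self.text_proj(text_feats)     # (B, T_text, fused_dim)
        # Audio frames attend over text tokens, and vice versa.
        a_ctx, _ = self.audio_to_text(query=a, key=t, value=t)
        t_ctx, _ = self.text_to_audio(query=t, key=a, value=a)
        # Mean-pool each modality over time, concatenate, classify.
        fused = torch.cat([a_ctx.mean(dim=1), t_ctx.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# Stand-ins: 100 wav2vec2.0-style frames (dim 768) and 20 MarianMT-style
# encoder tokens (dim 512) for a batch of 2 utterances.
audio = torch.randn(2, 100, 768)
text = torch.randn(2, 20, 512)
logits = CrossModalAttentionFusion()(audio, text)
print(logits.shape)  # torch.Size([2, 4])
```

In this sketch the bidirectional cross-attention lets prosodic cues condition the text representation and vice versa, which is what distinguishes attention fusion from the simpler bimodal concatenation baseline the abstract compares against.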
Pages: 3392-3396
Page count: 5
Related papers
(50 records in total)
  • [1] MSER: Multimodal speech emotion recognition using cross-attention with deep fusion
    Khan, Mustaqeem
    Gueaieb, Wail
    El Saddik, Abdulmotaleb
    Kwon, Soonil
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245
  • [2] Automatic Speech Recognition of Bengali Using Kaldi
    Guchhait, Subhadeep
    Hans, Arnold Sachith A.
    Augustine, Jacob
    PROCEEDINGS OF SECOND INTERNATIONAL CONFERENCE ON SUSTAINABLE EXPERT SYSTEMS (ICSES 2021), 2022, 351 : 153 - 166
  • [3] CONTINUOUS VISUAL SPEECH RECOGNITION FOR MULTIMODAL FUSION
    Benhaim, Eric
    Sahbi, Hichem
    Vitte, Guillaume
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [4] Bengali Speech Emotion Recognition
    Mohanta, Abhijit
    Sharma, Uzzal
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 2812 - 2814
  • [5] Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text
    Lee, Yoonhyung
    Yoon, Seunghyun
    Jung, Kyomin
    INTERSPEECH 2020, 2020, : 2717 - 2721
  • [6] Speech emotion recognition using multimodal feature fusion with machine learning approach
    Panda, Sandeep Kumar
    Jena, Ajay Kumar
    Panda, Mohit Ranjan
    Panda, Susmita
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (27) : 42763 - 42781
  • [7] Alzheimer's Dementia Recognition Using Multimodal Fusion of Speech and Text Embeddings
    Pandey, Sandeep Kumar
    Shekhawat, Hanumant Singh
    Bhasin, Shalendar
    Jasuja, Ravi
    Prasanna, S. R. M.
    INTELLIGENT HUMAN COMPUTER INTERACTION, IHCI 2021, 2022, 13184 : 718 - 728
  • [8] Interpretable multimodal emotion recognition using hybrid fusion of speech and image data
    Kumar, Puneet
    Malik, Sarthak
    Raman, Balasubramanian
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (10) : 28373 - 28394