BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion

Cited by: 0
Authors
Deb, Ahana [1]
Nag, Sayan [2]
Mahapatra, Ayan [1]
Chattopadhyay, Soumitri [1]
Marik, Aritra [1]
Gayen, Pijush Kanti [1]
Sanyal, Shankha [1]
Banerjee, Archi [3]
Karmakar, Samir [1]
Affiliations
[1] Jadavpur Univ, Kolkata, India
[2] Univ Toronto, Toronto, ON, Canada
[3] IIT Kharagpur, Kharagpur, W Bengal, India
Source
Keywords
speech act; multimodal fusion; transformer; low-resource language; emotion; expression; features
DOI
10.21437/Interspeech.2023-1146
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Spoken languages often utilise intonation, rhythm, intensity, and structure to communicate intention, and the same utterance can be interpreted differently depending on the rhythm with which it is spoken. These speech acts provide the foundation of communication and are expressed in ways unique to each language. Recent attention-based models, which can learn powerful representations from multilingual datasets, have performed well in speech tasks and are well suited to modelling specific tasks in low-resource languages. Here, we develop a novel multimodal approach that combines two models, wav2vec2.0 for audio and MarianMT for text translation, using multimodal attention fusion to predict speech acts in our prepared Bengali speech corpus. We also show that our model BeAts (Bengali speech acts recognition using Multimodal Attention Fusion) significantly outperforms both the unimodal baseline using only speech data and a simpler bimodal fusion using both speech and text data. Project page: https://soumitri2001.github.io/BeAts
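The abstract names multimodal attention fusion but does not spell out the mechanism. The following is a minimal sketch of one common form of it, scaled dot-product cross-attention, and is not the authors' implementation. Everything here is an assumption made for illustration: the function names (`cross_attention`, `mean_pool`, `fuse`), the choice to let text tokens attend over audio frames, the shared feature dimension, and the toy vectors that stand in for wav2vec2.0 and MarianMT encoder outputs.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query attends over all keys.

    Assumes queries and keys share the same feature dimension.
    """
    scale = math.sqrt(len(keys[0]))
    attended = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return attended


def mean_pool(rows: Matrix) -> Vector:
    """Average a sequence of vectors into a single vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fuse(audio: Matrix, text: Matrix) -> Vector:
    """Hypothetical fusion: text tokens attend over audio frames, then the
    pooled attended features are concatenated with the pooled text features."""
    text_over_audio = cross_attention(text, audio, audio)
    return mean_pool(text_over_audio) + mean_pool(text)


# Toy stand-ins for encoder outputs (assumed shapes, not real model features).
audio_frames = [[0.1, 0.3, 0.5, 0.7], [0.2, 0.1, 0.0, 0.4], [0.9, 0.8, 0.1, 0.2]]
text_tokens = [[0.5, 0.5, 0.1, 0.0], [0.3, 0.9, 0.2, 0.6]]

fused = fuse(audio_frames, text_tokens)
print(len(fused))  # 8: four attended-audio dims followed by four text dims
```

In a full system the fused vector would typically feed a small classification head over the speech-act labels; a simpler bimodal baseline would concatenate the two pooled vectors without the attention step.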
Pages: 3392-3396
Number of pages: 5