Hierarchical Fusion for Online Multimodal Dialog Act Classification

Authors
Miah, Md Messal Monem [1]
Pyarelal, Adarsh [2]
Huang, Ruihong [1]
Affiliations
[1] Texas A&M Univ, College Stn, TX 77843 USA
[2] Univ Arizona, Tucson, AZ 85721 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.
Pages: 7532-7545
Number of pages: 14