Hierarchical Fusion for Online Multimodal Dialog Act Classification

Citations: 0
Authors
Miah, Md Messal Monem [1 ]
Pyarelal, Adarsh [2 ]
Huang, Ruihong [1 ]
Affiliations
[1] Texas A&M Univ, College Stn, TX 77843 USA
[2] Univ Arizona, Tucson, AZ 85721 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.
Pages: 7532 - 7545
Page count: 14
Related papers
50 in total
  • [41] CLASSIFICATION OF BREAST CANCER IN MRI WITH MULTIMODAL FUSION
    Morais, Margarida
    Calisto, Francisco Maria
    Santiago, Carlos
    Aleluia, Clara
    Nascimento, Jacinto C.
    2023 IEEE 20TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, ISBI, 2023,
  • [42] Colposcopic multimodal fusion for the classification of cervical lesions
    Fan, Yinuo
    Ma, Huizhan
    Fu, Yuanbin
    Liang, Xiaoyun
    Yu, Hui
    Liu, Yuzhen
    PHYSICS IN MEDICINE AND BIOLOGY, 2022, 67 (13):
  • [43] Interpretation on Deep Multimodal Fusion for Diagnostic Classification
    Xin, Bowen
    Huang, Jing
    Zhou, Yun
    Lu, Jie
    Wang, Xiuying
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [44] Emotion Classification of Text Based Conversations Through Dialog Act Modeling
    Gorer, Binnur
    2014 22ND SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2014, : 2221 - 2224
  • [45] OPTIMIZING NEURAL NETWORK HYPERPARAMETERS WITH GAUSSIAN PROCESSES FOR DIALOG ACT CLASSIFICATION
    Dernoncourt, Franck
    Lee, Ji Young
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016, : 406 - 413
  • [46] Dialog Act Classification using Acoustic and Discourse Information of MapTask Data
    Julia, Fatema N.
    Iftekharuddin, Khan M.
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 1472 - 1479
  • [47] Graph-based multimodal fusion with metric learning for multimodal classification
    Angelou, Michalis
    Solachidis, Vassilis
    Vretos, Nicholas
    Daras, Petros
    PATTERN RECOGNITION, 2019, 95 : 296 - 307
  • [48] FUSION - An Online Method for Multistream Classification
    Haque, Ahsanul
    Wang, Zhuoyi
    Chandra, Swarup
    Dong, Bo
    Khan, Latifur
    Hamlen, Kevin W.
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 919 - 928
  • [49] Combining lexical, syntactic and prosodic cues for improved online dialog act tagging
    Sridhar, Vivek Kumar Rangarajan
    Bangalore, Srinivas
    Narayanan, Shrikanth
    COMPUTER SPEECH AND LANGUAGE, 2009, 23 (04): 407 - 422
  • [50] Multimodal Physiological Signals Fusion for Online Emotion Recognition
    Pan, Tongjie
    Ye, Yalan
    Cai, Hecheng
    Huang, Shudong
    Yang, Yang
    Wang, Guoqing
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5879 - 5888