Hierarchical Fusion for Online Multimodal Dialog Act Classification

Authors
Miah, Md Messal Monem [1]
Pyarelal, Adarsh [2]
Huang, Ruihong [1]
Affiliations
[1] Texas A&M Univ, College Stn, TX 77843 USA
[2] Univ Arizona, Tucson, AZ 85721 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.
Pages: 7532-7545
Page count: 14