Hierarchical Fusion for Online Multimodal Dialog Act Classification

Times cited: 0
Authors
Miah, Md Messal Monem [1]
Pyarelal, Adarsh [2]
Huang, Ruihong [1]
Affiliations
[1] Texas A&M Univ, College Stn, TX 77843 USA
[2] Univ Arizona, Tucson, AZ 85721 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.
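The abstract does not spell out the architecture, so the following is only a minimal pure-Python sketch of the general idea it describes, not the authors' model. It assumes that text-token embeddings and audio-frame embeddings have already been produced by some encoders in a shared dimension, omits all learned projections (identity query/key/value maps), and uses hypothetical function names. Granular fusion is illustrated as token-level cross-attention (text tokens query audio frames), and the online dialog step lets the current utterance attend only over itself and past utterances, never future ones.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query attends over all keys.

    Assumption: learned Q/K/V projections are omitted (identity) for brevity.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        # Weighted sum of the value vectors, one output vector per query.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(dim)])
    return outputs


def fuse_utterance(text_tokens: List[Vector],
                   audio_frames: List[Vector]) -> List[Vector]:
    """Hypothetical token-level fusion: each text token attends over the
    audio frames, and the attended audio is added back residually."""
    attended = cross_attention(text_tokens, audio_frames, audio_frames)
    return [[t + a for t, a in zip(tok, att)]
            for tok, att in zip(text_tokens, attended)]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average token vectors into a single utterance vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def online_dialog_context(history: List[Vector], current: Vector) -> Vector:
    """Hypothetical online dialog step: the current utterance attends over
    itself and past utterances only, since future turns are unavailable."""
    context = history + [current]
    return cross_attention([current], context, context)[0]


if __name__ == "__main__":
    text = [[1.0, 0.0], [0.0, 1.0]]                # two ASR tokens (toy values)
    audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # three audio frames (toy values)
    utterance = mean_pool(fuse_utterance(text, audio))
    past = [[0.2, 0.8]]                            # one earlier utterance vector
    print(online_dialog_context(past, utterance))
```

The resulting context vector would then feed a dialog act classifier; the toy numbers above are illustrative only and say nothing about the paper's reported results.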
Pages: 7532-7545
Number of pages: 14