Hierarchical Fusion for Online Multimodal Dialog Act Classification

Cited by: 0

Authors:

Miah, Md Messal Monem [1]

Pyarelal, Adarsh [2]

Huang, Ruihong [1]

Affiliations:

[1] Texas A&M Univ, College Stn, TX 77843 USA

[2] Univ Arizona, Tucson, AZ 85721 USA

Source:

FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023 | 2023

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) code:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a framework for online multimodal dialog act (DA) classification based on raw audio and ASR-generated transcriptions of current and past utterances. Existing multimodal DA classification approaches are limited by ineffective audio modeling and late-stage fusion. We showcase significant improvements in multimodal DA classification by integrating modalities at a more granular level and incorporating recent advancements in large language and audio models for audio feature extraction. We further investigate the effectiveness of self-attention and cross-attention mechanisms in modeling utterances and dialogs for DA classification. We achieve a substantial increase of 3 percentage points in the F1 score relative to current state-of-the-art models on two prominent DA classification datasets, MRDA and EMOTyDA.


Pages: 7532-7545

Page count: 14
