Multi-modal embeddings using multi-task learning for emotion recognition

Cited by: 8
Authors
Khare, Aparna [1 ]
Parthasarathy, Srinivas [1 ]
Sundaram, Shiva [1 ]
Affiliations
[1] Amazon.com, Sunnyvale, CA 94089 USA
Source
INTERSPEECH 2020
Keywords
general embeddings; multi-modal; emotion recognition
DOI
10.21437/Interspeech.2020-1827
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
General embeddings like word2vec, GloVe, and ELMo have shown considerable success in natural language tasks. Such embeddings are typically extracted from models built on general tasks such as skip-gram modeling and natural language generation. In this paper, we extend the work from natural language understanding to multi-modal architectures that use audio, visual, and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained with multi-task training. We use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that, on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state-of-the-art results.
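The multi-task setup the abstract describes (a shared encoder whose output feeds separate person-identification and ASR heads, trained on a weighted sum of per-task losses) can be illustrated with a toy sketch. This is not the authors' implementation: the linear "encoder", the task heads, the dimensions, and the loss weights `w_pid` and `w_asr` are all illustrative assumptions standing in for the transformer encoder and real task objectives.

```python
# Toy sketch of multi-task embedding training: one shared encoder, two
# task heads, and a weighted combination of per-task losses.
# All names, dimensions, and weights are illustrative assumptions.
import random

random.seed(0)

DIM = 4  # embedding dimension (assumed)

def encoder(features, weights):
    """Shared 'encoder': a single linear projection standing in for
    the paper's transformer encoder."""
    return [sum(f * w for f, w in zip(features, row)) for row in weights]

def task_loss(embedding, head, target):
    """Squared error between a linear task head's output and a target,
    standing in for the real person-ID / ASR objectives."""
    pred = sum(e * h for e, h in zip(embedding, head))
    return (pred - target) ** 2

# Random parameters for the shared encoder and the two task heads.
enc_w = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
pid_head = [random.uniform(-1, 1) for _ in range(DIM)]
asr_head = [random.uniform(-1, 1) for _ in range(DIM)]

# One multi-modal input frame (audio/visual/text features, flattened).
x = [0.5, -0.2, 0.1, 0.9]
z = encoder(x, enc_w)  # the reusable embedding for downstream tasks

# Multi-task objective: weighted sum of the per-task losses.
w_pid, w_asr = 0.5, 0.5  # assumed equal weighting
total_loss = (w_pid * task_loss(z, pid_head, 1.0)
              + w_asr * task_loss(z, asr_head, 0.0))
```

After training on the joint objective, only the encoder output `z` would be kept and fine-tuned for the downstream emotion-recognition task.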
Pages: 384-388 (5 pages)
Related Papers
50 items in total
  • [41] Cai, Yujue; Sui, Xiubao; Gu, Guohua. Multi-modal multi-task feature fusion for RGBT tracking. INFORMATION FUSION, 2023, 97.
  • [42] Cui, Xinyu; Li, Yang. Fake News Detection in Social Media based on Multi-Modal Multi-Task Learning. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (07): 912-918.
  • [43] Dong, Yipeng; Luo, Wei; Wang, Xiangyang; Zhang, Lei; Xu, Lin; Zhou, Zehao; Wang, Lulu. Multi-Task Federated Split Learning Across Multi-Modal Data with Privacy Preservation. SENSORS, 2025, 25 (01).
  • [44] Xin, Yi; Du, Junlong; Wang, Qiang; Yan, Ke; Ding, Shouhong. MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024: 16076-16084.
  • [45] Zhang, Ziye; Yin, Wendong; Wang, Shijin; Zheng, Xiaorou; Dong, Shoubin. MBFusion: Multi-modal balanced fusion and multi-task learning for cancer diagnosis and prognosis. COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 181.
  • [46] Cui, C.; Liang, X.; Wu, S.; Li, Z. Align vision-language semantics by multi-task learning for multi-modal summarization. NEURAL COMPUTING AND APPLICATIONS, 2024, 36 (25): 15653-15666.
  • [47] Jin, Yue; Zheng, Tianqing; Gao, Chao; Xu, Guoqiang. MTMSN: Multi-Task and Multi-Modal Sequence Network for Facial Action Unit and Expression Recognition. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021: 3590-3595.
  • [48] Wang, Qian; Wang, Mou; Yang, Yan; Zhang, Xiaolei. Multi-modal emotion recognition using EEG and speech signals. COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 149.
  • [49] Gao, Qingqing; Cao, Biwei; Guan, Xin; Gu, Tianyun; Bao, Xing; Wu, Junyan; Liu, Bo; Cao, Jiuxin. Emotion recognition in conversations with emotion shift detection based on multi-task learning. KNOWLEDGE-BASED SYSTEMS, 2022, 248.
  • [50] Pan, Zexu; Luo, Zhaojie; Yang, Jichen; Li, Haizhou. Multi-modal Attention for Speech Emotion Recognition. INTERSPEECH 2020, 2020: 364-368.