Multi-modal embeddings using multi-task learning for emotion recognition

Cited by: 8
Authors
Khare, Aparna [1 ]
Parthasarathy, Srinivas [1 ]
Sundaram, Shiva [1 ]
Affiliations
[1] Amazon.com, Sunnyvale, CA 94089 USA
Source
INTERSPEECH 2020
Keywords
general embeddings; multi-modal; emotion recognition;
DOI
10.21437/Interspeech.2020-1827
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
General embeddings such as word2vec, GloVe and ELMo have shown considerable success in natural language tasks. These embeddings are typically extracted from models built on general tasks such as skip-gram models and natural language generation. In this paper, we extend this work from natural language understanding to multi-modal architectures that use audio, visual and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained with multi-task training. We use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that, on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state-of-the-art results.
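The abstract describes the approach only at a high level. As a rough illustration (not the authors' implementation), the sketch below shows one way such an architecture could be wired in PyTorch: a shared transformer encoder trained with two auxiliary heads (person identification and ASR), whose pooled embeddings are then reused by a downstream emotion classifier. All class names, dimensions, pooling choices and head designs are assumptions for illustration only.

```python
# Minimal sketch, assuming fused audio/visual/text frame features as input.
# Not the authors' code: module names, dimensions, and heads are hypothetical.
import torch
import torch.nn as nn

class MultiTaskEmbeddingModel(nn.Module):
    """Transformer encoder trained with two auxiliary tasks to produce general embeddings."""
    def __init__(self, feat_dim=512, d_model=256, n_speakers=1000, vocab_size=5000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)            # fused multi-modal features -> model dim
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.speaker_head = nn.Linear(d_model, n_speakers)  # person-identification task
        self.asr_head = nn.Linear(d_model, vocab_size)       # ASR task (e.g. trained with a CTC loss)

    def forward(self, x):
        h = self.encoder(self.proj(x))                       # h: (batch, time, d_model) -- the embeddings
        spk_logits = self.speaker_head(h.mean(dim=1))        # utterance-level speaker prediction
        asr_logits = self.asr_head(h)                        # frame-level token predictions
        return h, spk_logits, asr_logits

class EmotionClassifier(nn.Module):
    """Downstream head fine-tuned on top of the pretrained encoder embeddings."""
    def __init__(self, pretrained: MultiTaskEmbeddingModel, n_emotions=6):
        super().__init__()
        self.backbone = pretrained
        self.head = nn.Linear(pretrained.proj.out_features, n_emotions)

    def forward(self, x):
        h, _, _ = self.backbone(x)
        return self.head(h.mean(dim=1))                      # pooled embedding -> emotion logits

# Usage: pretrain MultiTaskEmbeddingModel on the auxiliary tasks, then
# fine-tune EmotionClassifier on emotion labels (e.g. CMU-MOSEI).
x = torch.randn(2, 100, 512)                                 # dummy batch of fused features
model = EmotionClassifier(MultiTaskEmbeddingModel())
print(model(x).shape)                                         # torch.Size([2, 6])
```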
Pages: 384-388
Page count: 5
Related papers (50 in total)
  • [21] Meta Multi-task Learning for Speech Emotion Recognition
    Cai, Ruichu
    Guo, Kaibin
    Xu, Boyan
    Yang, Xiaoyan
    Zhang, Zhenjie
    INTERSPEECH 2020, 2020, : 3336 - 3340
  • [22] Emotion Recognition With Sequential Multi-task Learning Technique
    Phan Tran Dac Thinh
    Hoang Manh Hung
    Yang, Hyung-Jeong
    Kim, Soo-Hyung
    Lee, Guee-Sang
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3586 - 3589
  • [23] Speech Emotion Recognition based on Multi-Task Learning
    Zhao, Huijuan
    Han, Zhijie
    Wang, Ruchuan
    2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019, : 186 - 188
  • [24] Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning
    Parry, Jack
    DeMattos, Eric
    Klementiev, Anita
    Ind, Axel
    Morse-Kopp, Daniela
    Clarke, Georgia
    Palaz, Dimitri
    INTERSPEECH 2022, 2022, : 1158 - 1162
  • [25] MultiMAE: Multi-modal Multi-task Masked Autoencoders
    Bachmann, Roman
    Mizrahi, David
    Atanov, Andrei
    Zamir, Amir
    COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 348 - 367
  • [26] Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning
    Jain, Abhinav
    Upreti, Minali
    Jyothi, Preethi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2454 - 2458
  • [27] Multi-Modal Meta Multi-Task Learning For Social Media Rumor Detection
    Poornima, R.
    Nagavarapu, Sateesh
    Navya, Soleti
    Katkoori, Arun Kumar
    Mohsen, Karrar Shareef
    Saikumar, K.
    2024 2ND WORLD CONFERENCE ON COMMUNICATION & COMPUTING, WCONF 2024, 2024,
  • [28] FORWARD DIFFUSION GUIDED RECONSTRUCTION AS A MULTI-MODAL MULTI-TASK LEARNING SCHEME
    Sarker, Najibul Haque
    Rahman, M. Sohel
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3180 - 3184
  • [29] Multi-Modal Meta Multi-Task Learning for Social Media Rumor Detection
    Zhang, Huaiwen
    Qian, Shengsheng
    Fang, Quan
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1449 - 1459
  • [30] Multi-task multi-modal learning for joint diagnosis and prognosis of human cancers
    Shao, Wei
    Wang, Tongxin
    Sun, Liang
    Dong, Tianhan
    Han, Zhi
    Huang, Zhi
    Zhang, Jie
    Zhang, Daoqiang
    Huang, Kun
    MEDICAL IMAGE ANALYSIS, 2020, 65 (65)