Multi-modal embeddings using multi-task learning for emotion recognition

Cited by: 8
Authors
Khare, Aparna [1 ]
Parthasarathy, Srinivas [1 ]
Sundaram, Shiva [1 ]
Affiliations
[1] Amazon.com, Sunnyvale, CA 94089, USA
Source
INTERSPEECH 2020
Keywords
general embeddings; multi-modal; emotion recognition;
DOI
10.21437/Interspeech.2020-1827
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline Codes
100104; 100213;
Abstract
General embeddings such as word2vec, GloVe, and ELMo have shown considerable success in natural language tasks. These embeddings are typically extracted from models built for general tasks, such as skip-gram models and natural language generation. In this paper, we extend this line of work from natural language understanding to multi-modal architectures that use audio, visual, and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained with multi-task learning; we use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that, on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state-of-the-art results.
Pages
384-388 (5 pages)
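To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch of multi-task training with a shared encoder: two toy classification heads stand in for the paper's person-identification and ASR tasks, and the shared encoder's output serves as a general embedding for a downstream task such as emotion recognition. The single-layer tanh "encoder", all layer sizes, and the class names are illustrative assumptions, not the paper's transformer architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SharedEncoderMTL:
    """Toy shared encoder with two task-specific heads, trained jointly."""

    def __init__(self, d_in, d_emb, n_task1, n_task2, lr=0.1):
        self.W_enc = rng.normal(0, 0.1, (d_in, d_emb))   # shared encoder
        self.W1 = rng.normal(0, 0.1, (d_emb, n_task1))   # "person ID" head
        self.W2 = rng.normal(0, 0.1, (d_emb, n_task2))   # "ASR" head
        self.lr = lr

    def embed(self, x):
        # The shared embedding reused by the downstream task.
        return np.tanh(x @ self.W_enc)

    def step(self, x, y1, y2):
        """One gradient step on the summed cross-entropy of both heads."""
        h = self.embed(x)
        p1, p2 = softmax(h @ self.W1), softmax(h @ self.W2)
        n = len(x)
        loss = -(np.log(p1[np.arange(n), y1]).mean()
                 + np.log(p2[np.arange(n), y2]).mean())
        # Gradients of cross-entropy w.r.t. logits: (p - onehot) / n.
        g1 = p1.copy(); g1[np.arange(n), y1] -= 1; g1 /= n
        g2 = p2.copy(); g2[np.arange(n), y2] -= 1; g2 /= n
        # Both task losses backpropagate into the shared encoder.
        gh = g1 @ self.W1.T + g2 @ self.W2.T
        gx = gh * (1 - h ** 2)                 # tanh derivative
        self.W1 -= self.lr * (h.T @ g1)
        self.W2 -= self.lr * (h.T @ g2)
        self.W_enc -= self.lr * (x.T @ gx)
        return loss

# Toy data: 64 samples of a 16-dim fused multi-modal feature vector,
# with labels for each of the two auxiliary tasks.
X = rng.normal(size=(64, 16))
y1 = rng.integers(0, 4, 64)    # stand-in person-ID labels
y2 = rng.integers(0, 6, 64)    # stand-in ASR-token labels

model = SharedEncoderMTL(d_in=16, d_emb=8, n_task1=4, n_task2=6)
losses = [model.step(X, y1, y2) for _ in range(200)]
print("joint loss decreased:", losses[-1] < losses[0])
print("embedding shape:", model.embed(X).shape)
```

After joint training, `model.embed(X)` would be frozen or fine-tuned as the input representation for the downstream emotion-recognition classifier; in the paper this role is played by the transformer encoder's outputs.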
Related Papers
50 items in total
  • [31] Yuan, Zhaoquan; Peng, Xiao; Wu, Xiao; Xu, Changsheng. Hierarchical Multi-Task Learning for Diagram Question Answering with Multi-Modal Transformer. Proceedings of the 29th ACM International Conference on Multimedia (MM 2021), 2021: 1313-1321.
  • [32] Du, Lei; Liu, Fang; Liu, Kefei; Yao, Xiaohui; Risacher, Shannon L.; Han, Junwei; Guo, Lei; Saykin, Andrew J.; Shen, Li. A Dirty Multi-task Learning Method for Multi-modal Brain Imaging Genetics. Medical Image Computing and Computer Assisted Intervention (MICCAI 2019), Pt IV, 2019, 11767: 447-455.
  • [33] De Silva, L. C.; Miyasato, T.; Nakatsu, R. Facial Emotion Recognition Using Multi-modal Information. Proceedings of the 1997 International Conference on Information, Communications and Signal Processing (ICICS 1997), 1997: 397-401.
  • [34] Zhang, Yi; Chen, Mingyuan; Shen, Jundong; Wang, Chongjun. Tailor Versatile Multi-Modal Learning for Multi-Label Emotion Recognition. Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022), 2022: 9100-9108.
  • [35] Abu Shaqra, F.; Duwairi, R.; Al-Ayyoub, M. A Multi-modal Deep Learning System for Arabic Emotion Recognition. International Journal of Speech Technology, 2023, 26(1): 123-139.
  • [36] Kuga, Ryohei; Kanezaki, Asako; Samejima, Masaki; Sugano, Yusuke; Matsushita, Yasuyuki. Multi-task Learning Using Multi-modal Encoder-Decoder Networks with Shared Skip Connections. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW 2017), 2017: 403-411.
  • [37] Hsu, Jia-Hao; Wu, Chung-Hsien; Wei, Yu-Hung. Speech Emotion Recognition Using Decomposed Speech via Multi-task Learning. Interspeech 2023, 2023: 4553-4557.
  • [38] Ghosh, Sreyan; Tyagi, Utkarsh; Ramaneswaran, S.; Srivastava, Harshvardhan; Manocha, Dinesh. MMER: Multimodal Multi-task Learning for Speech Emotion Recognition. Interspeech 2023, 2023: 1209-1213.
  • [39] Raj, Subham; Mondal, Prabir; Chakder, Daipayan; Saha, Sriparna; Onoe, Naoyuki. A Multi-modal Multi-task Based Approach for Movie Recommendation. 2023 International Joint Conference on Neural Networks (IJCNN), 2023.
  • [40] Ide, Tsuyoshi; Phan, Dzung T.; Kalagnanam, Jayant. Multi-task Multi-modal Models for Collective Anomaly Detection. 2017 17th IEEE International Conference on Data Mining (ICDM), 2017: 177-186.