Multi-modal embeddings using multi-task learning for emotion recognition

Cited by: 8
Authors
Khare, Aparna [1 ]
Parthasarathy, Srinivas [1 ]
Sundaram, Shiva [1 ]
Affiliations
[1] Amazon.com, Sunnyvale, CA 94089 USA
Source
INTERSPEECH 2020
Keywords
general embeddings; multi-modal; emotion recognition;
DOI
10.21437/Interspeech.2020-1827
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification codes
100104; 100213;
Abstract
General embeddings such as word2vec, GloVe and ELMo have shown considerable success in natural language tasks. These embeddings are typically extracted from models built on general tasks such as skip-gram models and natural language generation. In this paper, we extend this work from natural language understanding to multi-modal architectures that use audio, visual and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained with multi-task training. We use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that, on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state-of-the-art results.
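The abstract describes the approach only at a high level. As a rough illustration (not the authors' implementation), the sketch below shows one way such an architecture could be wired in PyTorch: a shared transformer encoder trained with two auxiliary heads (person identification and ASR), whose pooled embeddings are then reused by a downstream emotion classifier. All class names, dimensions, pooling choices and head designs are assumptions for illustration only.

```python
# Minimal sketch, assuming fused audio/visual/text frame features as input.
# Not the authors' code: module names, dimensions, and heads are hypothetical.
import torch
import torch.nn as nn

class MultiTaskEmbeddingModel(nn.Module):
    """Transformer encoder trained with two auxiliary tasks to produce general embeddings."""
    def __init__(self, feat_dim=512, d_model=256, n_speakers=1000, vocab_size=5000):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)            # fused multi-modal features -> model dim
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.speaker_head = nn.Linear(d_model, n_speakers)  # person-identification task
        self.asr_head = nn.Linear(d_model, vocab_size)       # ASR task (e.g. trained with a CTC loss)

    def forward(self, x):
        h = self.encoder(self.proj(x))                       # h: (batch, time, d_model) -- the embeddings
        spk_logits = self.speaker_head(h.mean(dim=1))        # utterance-level speaker prediction
        asr_logits = self.asr_head(h)                        # frame-level token predictions
        return h, spk_logits, asr_logits

class EmotionClassifier(nn.Module):
    """Downstream head fine-tuned on top of the pretrained encoder embeddings."""
    def __init__(self, pretrained: MultiTaskEmbeddingModel, n_emotions=6):
        super().__init__()
        self.backbone = pretrained
        self.head = nn.Linear(pretrained.proj.out_features, n_emotions)

    def forward(self, x):
        h, _, _ = self.backbone(x)
        return self.head(h.mean(dim=1))                      # pooled embedding -> emotion logits

# Usage: pretrain MultiTaskEmbeddingModel on the auxiliary tasks, then
# fine-tune EmotionClassifier on emotion labels (e.g. CMU-MOSEI).
x = torch.randn(2, 100, 512)                                 # dummy batch of fused features
model = EmotionClassifier(MultiTaskEmbeddingModel())
print(model(x).shape)                                         # torch.Size([2, 6])
```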
Pages: 384-388
Page count: 5
Related papers (50 in total)
  • [21] Meta Multi-task Learning for Speech Emotion Recognition
    Cai, Ruichu
    Guo, Kaibin
    Xu, Boyan
    Yang, Xiaoyan
    Zhang, Zhenjie
    INTERSPEECH 2020, 2020, : 3336 - 3340
  • [22] Emotion Recognition With Sequential Multi-task Learning Technique
    Phan Tran Dac Thinh
    Hoang Manh Hung
    Yang, Hyung-Jeong
    Kim, Soo-Hyung
    Lee, Guee-Sang
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3586 - 3589
  • [23] Speech Emotion Recognition based on Multi-Task Learning
    Zhao, Huijuan
    Han, Zhijie
    Wang, Ruchuan
    2019 IEEE 5TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD (BIGDATASECURITY) / IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING (HPSC) / IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY (IDS), 2019, : 186 - 188
  • [24] Speech Emotion Recognition in the Wild using Multi-task and Adversarial Learning
    Parry, Jack
    DeMattos, Eric
    Klementiev, Anita
    Ind, Axel
    Morse-Kopp, Daniela
    Clarke, Georgia
    Palaz, Dimitri
    INTERSPEECH 2022, 2022, : 1158 - 1162
  • [25] MultiMAE: Multi-modal Multi-task Masked Autoencoders
    Bachmann, Roman
    Mizrahi, David
    Atanov, Andrei
    Zamir, Amir
    COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 348 - 367
  • [26] Improved Accented Speech Recognition Using Accent Embeddings and Multi-task Learning
    Jain, Abhinav
    Upreti, Minali
    Jyothi, Preethi
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2454 - 2458
  • [27] Multi-Modal Meta Multi-Task Learning For Social Media Rumor Detection
    Poornima, R.
    Nagavarapu, Sateesh
    Navya, Soleti
    Katkoori, Arun Kumar
    Mohsen, Karrar Shareef
    Saikumar, K.
    2024 2ND WORLD CONFERENCE ON COMMUNICATION & COMPUTING, WCONF 2024, 2024,
  • [28] FORWARD DIFFUSION GUIDED RECONSTRUCTION AS A MULTI-MODAL MULTI-TASK LEARNING SCHEME
    Sarker, Najibul Haque
    Rahman, M. Sohel
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 3180 - 3184
  • [29] Multi-Modal Meta Multi-Task Learning for Social Media Rumor Detection
    Zhang, Huaiwen
    Qian, Shengsheng
    Fang, Quan
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24 : 1449 - 1459
  • [30] Multi-task multi-modal learning for joint diagnosis and prognosis of human cancers
    Shao, Wei
    Wang, Tongxin
    Sun, Liang
    Dong, Tianhan
    Han, Zhi
    Huang, Zhi
    Zhang, Jie
    Zhang, Daoqiang
    Huang, Kun
    MEDICAL IMAGE ANALYSIS, 2020, 65 (65)