Multi-modal embeddings using multi-task learning for emotion recognition

Cited by: 8
Authors
Khare, Aparna [1 ]
Parthasarathy, Srinivas [1 ]
Sundaram, Shiva [1 ]
Affiliations
[1] Amazon.com, Sunnyvale, CA 94089 USA
Source
INTERSPEECH 2020
Keywords
general embeddings; multi-modal; emotion recognition
DOI
10.21437/Interspeech.2020-1827
CLC Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Codes
100104; 100213
Abstract
General embeddings like word2vec, GloVe, and ELMo have shown considerable success in natural language tasks. Such embeddings are typically extracted from models built on general tasks such as skip-gram modeling and natural language generation. In this paper, we extend the work from natural language understanding to multi-modal architectures that use audio, visual, and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained with multi-task training. We use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that, on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state-of-the-art results.
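The multi-task setup the abstract describes (a shared encoder whose output feeds separate person-identification and ASR heads, trained on a weighted sum of per-task losses) can be illustrated with a toy sketch. This is not the authors' implementation: the linear "encoder", the task heads, the dimensions, and the loss weights `w_pid` and `w_asr` are all illustrative assumptions standing in for the transformer encoder and real task objectives.

```python
# Toy sketch of multi-task embedding training: one shared encoder, two
# task heads, and a weighted combination of per-task losses.
# All names, dimensions, and weights are illustrative assumptions.
import random

random.seed(0)

DIM = 4  # embedding dimension (assumed)

def encoder(features, weights):
    """Shared 'encoder': a single linear projection standing in for
    the paper's transformer encoder."""
    return [sum(f * w for f, w in zip(features, row)) for row in weights]

def task_loss(embedding, head, target):
    """Squared error between a linear task head's output and a target,
    standing in for the real person-ID / ASR objectives."""
    pred = sum(e * h for e, h in zip(embedding, head))
    return (pred - target) ** 2

# Random parameters for the shared encoder and the two task heads.
enc_w = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
pid_head = [random.uniform(-1, 1) for _ in range(DIM)]
asr_head = [random.uniform(-1, 1) for _ in range(DIM)]

# One multi-modal input frame (audio/visual/text features, flattened).
x = [0.5, -0.2, 0.1, 0.9]
z = encoder(x, enc_w)  # the reusable embedding for downstream tasks

# Multi-task objective: weighted sum of the per-task losses.
w_pid, w_asr = 0.5, 0.5  # assumed equal weighting
total_loss = (w_pid * task_loss(z, pid_head, 1.0)
              + w_asr * task_loss(z, asr_head, 0.0))
```

After training on the joint objective, only the encoder output `z` would be kept and fine-tuned for the downstream emotion-recognition task.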
Pages: 384-388 (5 pages)
Related Papers
50 items in total
  • [41] Cai, Yujue; Sui, Xiubao; Gu, Guohua. Multi-modal multi-task feature fusion for RGBT tracking. INFORMATION FUSION, 2023, 97.
  • [42] Cui, Xinyu; Li, Yang. Fake News Detection in Social Media based on Multi-Modal Multi-Task Learning. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (07): 912-918.
  • [43] Dong, Yipeng; Luo, Wei; Wang, Xiangyang; Zhang, Lei; Xu, Lin; Zhou, Zehao; Wang, Lulu. Multi-Task Federated Split Learning Across Multi-Modal Data with Privacy Preservation. SENSORS, 2025, 25 (01).
  • [44] Xin, Yi; Du, Junlong; Wang, Qiang; Yan, Ke; Ding, Shouhong. MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024: 16076-16084.
  • [45] Zhang, Ziye; Yin, Wendong; Wang, Shijin; Zheng, Xiaorou; Dong, Shoubin. MBFusion: Multi-modal balanced fusion and multi-task learning for cancer diagnosis and prognosis. COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 181.
  • [46] Cui, C.; Liang, X.; Wu, S.; Li, Z. Align vision-language semantics by multi-task learning for multi-modal summarization. NEURAL COMPUTING AND APPLICATIONS, 2024, 36 (25): 15653-15666.
  • [47] Jin, Yue; Zheng, Tianqing; Gao, Chao; Xu, Guoqiang. MTMSN: Multi-Task and Multi-Modal Sequence Network for Facial Action Unit and Expression Recognition. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021: 3590-3595.
  • [48] Wang, Qian; Wang, Mou; Yang, Yan; Zhang, Xiaolei. Multi-modal emotion recognition using EEG and speech signals. COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 149.
  • [49] Gao, Qingqing; Cao, Biwei; Guan, Xin; Gu, Tianyun; Bao, Xing; Wu, Junyan; Liu, Bo; Cao, Jiuxin. Emotion recognition in conversations with emotion shift detection based on multi-task learning. KNOWLEDGE-BASED SYSTEMS, 2022, 248.
  • [50] Pan, Zexu; Luo, Zhaojie; Yang, Jichen; Li, Haizhou. Multi-modal Attention for Speech Emotion Recognition. INTERSPEECH 2020, 2020: 364-368.