Hierarchical Transformer Network for Utterance-Level Emotion Recognition

Cited by: 12
Authors
Li, Qingbiao [1]
Wu, Chunhua [1]
Wang, Zhe [1]
Zheng, Kangfeng [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100876, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2020 / Vol. 10 / Issue 13
Keywords
emotion recognition; text classification; dialog; transformer; pretrained model
DOI
10.3390/app10134447
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
While there have been significant advances in detecting emotions in text, many problems remain to be solved in utterance-level emotion recognition (ULER). In this paper, we address four challenges of ULER in dialog systems. (1) The same utterance can convey different emotions in different contexts. (2) Long-range contextual information is hard to capture effectively. (3) Unlike in traditional text classification, most datasets for this task contain insufficient conversations or speech. (4) Speaker information is necessary to better model the emotional interaction between speakers. To address problems (1) and (2), we propose a hierarchical transformer framework (apart from descriptions of other studies, "transformer" in this paper usually refers to the encoder part of the transformer) with a lower-level transformer to model the word-level input and an upper-level transformer to capture the context of utterance-level embeddings. For problem (3), we use bidirectional encoder representations from transformers (BERT), a pretrained language model, as the lower-level transformer, which is equivalent to introducing external data into the model and alleviates the data shortage to some extent. For problem (4), we add speaker embeddings to the model for the first time, which enables our model to capture the interaction between speakers. Experiments on three dialog emotion datasets, Friends, EmotionPush, and EmoryNLP, demonstrate that our proposed hierarchical transformer network models obtain competitive results compared with state-of-the-art methods in terms of the macro-averaged F1-score (macro-F1).
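The abstract reports results as macro-F1, which is the unweighted mean of the per-class F1 scores, so rare emotion classes count as much as frequent ones. The following is a minimal sketch of that standard definition in plain Python. The function name `macro_f1`, the emotion labels, and the example predictions are hypothetical illustrations and are not taken from the paper, whose exact evaluation protocol (for example, which emotion classes are scored) is not given in this record.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (standard macro-F1).

    Assumption: a class with no true or predicted instances scores 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one label is required")

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic
        # mean of precision and recall.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical utterance-level labels, for illustration only.
gold = ["joy", "joy", "anger", "neutral", "neutral", "neutral"]
pred = ["joy", "neutral", "anger", "neutral", "neutral", "joy"]
print(round(macro_f1(gold, pred), 4))
```

In this toy example the per-class F1 scores are 0.5 for joy, 1.0 for anger, and 2/3 for neutral, so the macro average is 13/18. Because each class is weighted equally, a model that ignores a rare emotion is penalized more heavily than it would be under accuracy.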
Pages: 13
Related papers
50 in total
  • [21] Utterance-level Normalization for Relative Articulation Rate Analysis
    Saarni, Tuomo
    Hakokari, Jussi
    Isoaho, Jouni
    Salakoski, Tapio
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 538 - +
  • [22] Utterance-level adverbs: How to define and subclassify them
    Molinier, Christian
    LANGUE FRANCAISE, 2009, (161): : 9 - +
  • [23] H-VECTORS: Improving the robustness in utterance-level speaker embeddings using a hierarchical attention model
    Shi, Yanpei
    Huang, Qiang
    Hain, Thomas
    NEURAL NETWORKS, 2021, 142 (142) : 329 - 339
  • [24] EMOTION CLASSIFICATION VIA UTTERANCE-LEVEL DYNAMICS: A PATTERN-BASED APPROACH TO CHARACTERIZING AFFECTIVE EXPRESSIONS
    Kim, Yelin
    Provost, Emily Mower
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 3677 - 3681
  • [25] LIGHTLY-SUPERVISED UTTERANCE-LEVEL EMOTION IDENTIFICATION USING LATENT TOPIC MODELING OF MULTIMODAL WORDS
    Yang, Zhaojun
    Narayanan, Shrikanth
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2767 - 2771
  • [26] Utterance and Syllable Level Prosodic Features for Automatic Emotion Recognition
    Ben Alex, Starlet
    Babu, Ben P.
    Mary, Leena
    2018 IEEE RECENT ADVANCES IN INTELLIGENT COMPUTATIONAL SYSTEMS (RAICS), 2018, : 31 - 35
  • [27] Dynamic classifier combination in hybrid speech recognition systems using utterance-level confidence values
    Kirchhoff, K
    Bilmes, JA
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 693 - 696
  • [28] Utterance-Level Word Class Characteristics in Normal Elderly Women
    Kim, Jiyoung
    Seo, Sangkyu
    Cho, Sung-Rae
    Kim, HyangHee
    COMMUNICATION SCIENCES AND DISORDERS-CSD, 2014, 19 (03): : 265 - 273
  • [29] Training Utterance-level Embedding Networks for Speaker Identification and Verification
    Park, Heewoong
    Cho, Sukhyun
    Park, Kyubyong
    Kim, Namju
    Park, Jonghun
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3563 - 3567
  • [30] CTNet: Conversational Transformer Network for Emotion Recognition
    Lian, Zheng
    Liu, Bin
    Tao, Jianhua
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 985 - 1000