Language Model Adaptation for Emotional Speech Recognition using Tweet data

Cited by: 0
Authors:
Saeki, Kazuya [1]
Kato, Masaharu [1]
Kosaka, Tetsuo [1]
Institution:
[1] Yamagata Univ, Grad Sch Sci & Engn, Yonezawa, Yamagata, Japan
Funding:
Japan Society for the Promotion of Science
DOI: not available
Chinese Library Classification: TP [automation technology, computer technology]
Subject classification code: 0812
Abstract
Emotional speech recognition is generally considered more difficult than non-emotional speech recognition, because the acoustic features of emotional speech differ from those of non-emotional speech and vary greatly with the type and intensity of the emotion. In addition, it is difficult to recognize the colloquial expressions found in emotional utterances using a language model trained on a corpus such as lecture speech. We have been studying emotional speech recognition on the Japanese Twitter-based emotional speech (JTES) corpus, which consists of tweets with an emotional label assigned to each sentence. In this study, we aim to improve emotional speech recognition performance on the JTES through language model adaptation, which requires a text corpus containing emotional and colloquial expressions. However, no such large-scale Japanese corpus exists. To solve this problem, we propose language model adaptation using tweet data, which is expected to contain many emotional and colloquial expressions. The sentences used for adaptation were extracted from the collected tweet data according to a set of filtering rules, yielding a large corpus of 25.86M words. In the recognition experiments, the baseline word error rate was 36.11%, whereas language model adaptation reduced it to 25.68%, and the combined use of acoustic model adaptation and language model adaptation reduced it to 17.77%. These results establish the effectiveness of the proposed method.
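The abstract does not specify the filtering rules used to select adaptation sentences from raw tweets, so the following is only an illustrative sketch of the kind of rule-based extraction described: strip tweet-specific tokens (URLs, @mentions, hashtags), discard retweets, and keep only sentences of plausible length. All thresholds and patterns here are assumptions, not the paper's actual rules.

```python
import re

# Assumed cleanup patterns; the paper's actual rules are not given here.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\S+")

def clean_tweet(text):
    """Strip URLs, @mentions, and hashtags, then normalize whitespace."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())

def keep_for_adaptation(text, min_chars=5, max_chars=140):
    """Keep a tweet only if it looks like a usable full sentence:
    not a retweet, and within an assumed length range after cleaning."""
    if text.startswith("RT"):
        return False
    cleaned = clean_tweet(text)
    return min_chars <= len(cleaned) <= max_chars

tweets = [
    "RT @user: check this out",
    "I am so happy today!!! https://t.co/abc",
    "ok",
]
adaptation_corpus = [clean_tweet(t) for t in tweets if keep_for_adaptation(t)]
# Only the second tweet survives: the retweet and the too-short tweet are dropped.
```

In practice the selected sentences would then be used to train or interpolate an adaptation language model with the baseline one; that step is omitted here.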
Pages: 371-375 (5 pages)
Related papers (50 total)
  • [1] Simultaneous Adaptation of Acoustic and Language Models for Emotional Speech Recognition Using Tweet Data
    Kosaka, Tetsuo
    Saeki, Kazuya
    Aizawa, Yoshitaka
    Kato, Masaharu
    Nose, Takashi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (03) : 363 - 373
  • [2] Unsupervised Language Model Adaptation by Data Selection for Speech Recognition
    Khassanov, Yerbolat
    Chong, Tze Yuang
    Bigot, Benjamin
    Chng, Eng Siong
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2017, PT I, 2017, 10191 : 508 - 517
  • [3] Language model adaptation in speech recognition using document maps
    Lagus, K
    Kurimo, M
    [J]. NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS, 2002, : 627 - 636
  • [4] Acoustic Model Adaptation for Emotional Speech Recognition Using Twitter-Based Emotional Speech Corpus
    Kosaka, Tetsuo
    Aizawa, Yoshitaka
    Kato, Masaharu
    Nose, Takashi
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1747 - 1751
  • [5] Factored Language Model Adaptation Using Dirichlet Class Language Model for Speech Recognition
    Hatami, Ali
    Akbari, Ahmad
    Nasersharif, Babak
    [J]. 2013 5TH CONFERENCE ON INFORMATION AND KNOWLEDGE TECHNOLOGY (IKT), 2013, : 438 - 442
  • [6] Boosting of speech recognition performance by language model adaptation
    Korkmazsky, Filipp
    Jojic, Oliver
    Shevade, Bageshree
    [J]. 2007 IEEE AEROSPACE CONFERENCE, VOLS 1-9, 2007, : 1592 - 1601
  • [7] STATISTICAL LANGUAGE MODEL ADAPTATION FOR ESTONIAN SPEECH RECOGNITION
    Alumaee, Tanel
    [J]. EESTI RAKENDUSLINGVISTIKA UHINGU AASTARAAMAT, 2008, 4 : 5 - 16
  • [8] Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition
    Yamazaki, Hiroki
    Iwano, Koji
    Shinoda, Koichi
    Furui, Sadaoki
    Yokota, Haruo
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 89 - 92
  • [9] Just-in-time latent semantic adaptation on language model for Chinese speech recognition using web data
    Gao, Qin
    Lin, Xiaojun
    Wu, Xihong
    [J]. 2006 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, 2006, : 50 - +
  • [10] Chameleon: A Language Model Adaptation Toolkit for Automatic Speech Recognition of Conversational Speech
    Song, Yuanfeng
    Jiang, Di
    Zhao, Weiwei
    Xu, Qian
    Wong, Raymond Chi-Wing
    Yang, Qiang
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2019, : 37 - 42