Multitask Learning with Local Attention for Tibetan Speech Recognition

Cited by: 0
Authors
Wang, Hui [1]
Gao, Fei [1]
Zhao, Yue [1]
Yang, Li [1]
Yue, Jianjian [1]
Ma, Huilin [1]
Affiliations
[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China
Keywords
Speech recognition
DOI
10.1155/2020/8894566
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
In this paper, we propose incorporating local attention into WaveNet-CTC to improve the performance of Tibetan speech recognition in multitask learning. As the number of tasks increases, for example when Tibetan speech content recognition, dialect identification, and speaker recognition are performed simultaneously, the speech recognition accuracy of a single WaveNet-CTC model decreases. Inspired by the attention mechanism, we introduce local attention to automatically tune the weights of feature frames within a window and to attend differently to context information for multitask learning. The experimental results show that, compared with the baseline model, our method improves speech recognition accuracy for all Tibetan dialects in three-task learning. Furthermore, our method significantly improves the accuracy for the low-resource dialect by 5.11% over the dialect-specific model.
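As a rough illustration of weighting feature frames within a window, the sketch below computes, for each time step, softmax weights over the neighboring frames and returns their weighted sum. It is not the authors' implementation: the abstract does not state the scoring function, so the scaled dot product between the center frame and each frame in the window, the `half_width` parameter, and the function names are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attention(frames, half_width):
    """Re-weight each frame using only the frames inside a local window.

    frames: list of equal-length feature vectors, one per time step.
    half_width: hypothetical window parameter; the window for step t
        covers steps t - half_width .. t + half_width, clipped at the ends.
    Returns (contexts, weights), one entry per time step.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    contexts, all_weights = [], []
    for t, query in enumerate(frames):
        lo = max(0, t - half_width)
        hi = min(len(frames), t + half_width + 1)
        window = frames[lo:hi]
        # Assumed scoring function: scaled dot product with the center frame.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in window]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of the frames in the window.
        context = [sum(w * frame[d] for w, frame in zip(weights, window))
                   for d in range(dim)]
        contexts.append(context)
        all_weights.append(weights)
    return contexts, all_weights


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
contexts, weights = local_attention(frames, half_width=1)
```

Because the weights at each step are normalized over the window only, frames outside the window contribute nothing to that step's context vector, which is what distinguishes local attention from attention over the whole utterance.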
Pages: 10
Related papers
50 in total
  • [1] Neural Simile Recognition with Cyclic Multitask Learning and Local Attention
    Zeng, Jiali
    Song, Linfeng
    Su, Jinsong
    Xie, Jun
    Song, Wei
    Luo, Jiebo
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9515 - 9522
  • [2] End-to-End-Based Tibetan Multitask Speech Recognition
    Zhao, Yue
    Yue, Jianjian
    Xu, Xiaona
    Wu, Licheng
    Li, Xiali
    [J]. IEEE ACCESS, 2019, 7 : 162519 - 162529
  • [3] An open speech resource for Tibetan multi-dialect and multitask recognition
    Zhao, Yue
    Xu, Xiaona
    Yue, Jianjian
    Song, Wei
    Li, Xiali
    Wu, Licheng
    Ji, Qiang
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2020, 22 (2-3) : 297 - 304
  • [4] Multitask Learning with CTC and Segmental CRF for Speech Recognition
    Lu, Liang
    Kong, Lingpeng
    Dyer, Chris
    Smith, Noah A.
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 954 - 958
  • [5] MULTITASK LEARNING AND SYSTEM COMBINATION FOR AUTOMATIC SPEECH RECOGNITION
    Siohan, Olivier
    Rybach, David
    [J]. 2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015, : 589 - 595
  • [6] Improved Depression Recognition Using Attention and Multitask Learning of Gender Recognition
    Liu, Yang
    Lu, Xiaoyong
    Hi, Daimin S.
    Yuan, Jingyi
    Pan, Tao
    An, Haizhen
    [J]. 2021 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2021, : 57 - 61
  • [7] Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning
    Li, Yuanchao
    Zhao, Tianyu
    Kawahara, Tatsuya
    [J]. INTERSPEECH 2019, 2019, : 2803 - 2807
  • [8] A Multitask Learning Approach Based on Cascaded Attention Network and Self-Adaption Loss for Speech Emotion Recognition
    Liu, Yang
    Xia, Yuqi
    Sun, Haoqin
    Meng, Xiaolei
    Bai, Jianxiong
    Guan, Wenbo
    Zhao, Zhen
    Li, Yongwei
    [J]. IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2023, E106A (06) : 876 - 885
  • [9] End-to-End Audiovisual Speech Recognition System With Multitask Learning
    Tao, Fei
    Busso, Carlos
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 1 - 11
  • [10] Upgraded Attention-Based Local Feature Learning Block for Speech Emotion Recognition
    Zhao, Huan
    Gao, Yingxue
    Xiao, Yufeng
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, PT II, 2021, 12713 : 118 - 130