Multitask Learning with Local Attention for Tibetan Speech Recognition

Times cited: 0
Authors
Wang, Hui [1]
Gao, Fei [1]
Zhao, Yue [1]
Yang, Li [1]
Yue, Jianjian [1]
Ma, Huilin [1]
Affiliation
[1] Minzu University of China, School of Information Engineering, Beijing 100081, People's Republic of China
Keywords
Speech recognition
DOI
10.1155/2020/8894566
Chinese Library Classification number
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
In this paper, we propose incorporating local attention into WaveNet-CTC to improve the performance of Tibetan speech recognition in multitask learning. As the number of tasks increases, for example when Tibetan speech content recognition, dialect identification, and speaker recognition are performed simultaneously, the speech recognition accuracy of a single WaveNet-CTC model decreases. Inspired by the attention mechanism, we introduce local attention to automatically tune the weights of the feature frames within a window and to assign different amounts of attention to context information in multitask learning. The experimental results show that, compared with the baseline model, our method improves speech recognition accuracy for all Tibetan dialects in three-task learning. Furthermore, our method significantly improves the accuracy for the low-resource dialect by 5.11% compared with the dialect-specific model.
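The abstract describes local attention only at a high level: weights are tuned for the feature frames inside a window, so that different context frames receive different amounts of attention. The following is a minimal Python sketch of that general idea, not the authors' implementation. It assumes dot-product scores between the center frame and each neighbor, a symmetric window of `half_window` frames on each side (clipped at the sequence edges), and no learned parameters; the paper's actual scoring function, window size, and placement inside WaveNet-CTC are not stated here, and the names `window_weights` and `local_attention` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_weights(frames: List[Vector], t: int, half_window: int) -> List[float]:
    """Attention weights of frame t over the frames in its local window.

    Assumption (hypothetical): the score is a plain dot product between the
    center frame and each window frame; the paper's scoring may differ.
    """
    if half_window < 0:
        raise ValueError("half_window must be non-negative")
    lo = max(0, t - half_window)
    hi = min(len(frames), t + half_window + 1)
    return softmax([dot(frames[t], f) for f in frames[lo:hi]])


def local_attention(frames: List[Vector], half_window: int) -> List[Vector]:
    """Replace each frame by the attention-weighted sum of its window."""
    outputs: List[Vector] = []
    for t in range(len(frames)):
        lo = max(0, t - half_window)
        window = frames[lo:min(len(frames), t + half_window + 1)]
        weights = window_weights(frames, t, half_window)
        dim = len(frames[t])
        # Weighted sum over the window, computed per feature dimension.
        context = [sum(w * f[d] for w, f in zip(weights, window)) for d in range(dim)]
        outputs.append(context)
    return outputs


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(local_attention(frames, half_window=1))
```

With `half_window=0` each frame attends only to itself, so the output equals the input; widening the window lets each output frame mix in its neighbors in proportion to their (softmax-normalized) similarity to the center frame.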
Pages: 10