Multitask Learning with Local Attention for Tibetan Speech Recognition

Cited by: 0

Authors:

Wang, Hui [1]

Gao, Fei [1]

Zhao, Yue [1]

Yang, Li [1]

Yue, Jianjian [1]

Ma, Huilin [1]

Affiliations:

[1] Minzu Univ China, Sch Informat Engn, Beijing 100081, Peoples R China

Source:

COMPLEXITY | 2020 / Volume 2020

Keywords:

Speech recognition;

DOI:

10.1155/2020/8894566

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Discipline classification code:

0701 ; 070101 ;

Abstract:

In this paper, we propose incorporating local attention into WaveNet-CTC to improve the performance of Tibetan speech recognition in multitask learning. As the number of tasks increases, for example when Tibetan speech content recognition, dialect identification, and speaker recognition are performed simultaneously, the speech recognition accuracy of a single WaveNet-CTC model decreases. Inspired by the attention mechanism, we introduce local attention to automatically tune the weights of feature frames within a window and to attend differently to context information for multitask learning. The experimental results show that, compared with the baseline model, our method improves speech recognition accuracy for all Tibetan dialects in three-task learning. Furthermore, our method significantly improves the accuracy for the low-resource dialect by 5.11% against the dialect-specific model.
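
The idea of weighting feature frames within a window can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how attention scores are computed, so the scaled dot-product scoring, the function name `local_attention`, and the `window` half-width parameter are all assumptions made for illustration only.

```python
import math


def local_attention(frames, window=2):
    """Reweight each frame by softmax attention over its neighbours.

    frames: list of equal-length feature vectors, one per time step.
    window: hypothetical half-width; frame t attends to frames
            t - window .. t + window (clipped at the sequence edges).
    """
    dim = len(frames[0])
    out = []
    for t, query in enumerate(frames):
        lo, hi = max(0, t - window), min(len(frames), t + window + 1)
        context = frames[lo:hi]
        # Assumed scoring function: scaled dot product between the
        # current frame and every frame in its window.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in context
        ]
        # Numerically stable softmax over the window.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output frame is the attention-weighted sum of the window.
        out.append(
            [sum(w * key[d] for w, key in zip(weights, context)) for d in range(dim)]
        )
    return out
```

With `window=0` each frame attends only to itself and is returned unchanged; larger windows blend in more of the surrounding context.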


Pages: 10
