A Streaming End-to-End Speech Recognition Approach Based on WeNet for Tibetan Amdo Dialect

Cited by: 0
Authors
Wang, Chao [1 ]
Wen, Yao [1 ]
Lhamo, Phurba [1 ]
Tashi, Nyima [1 ]
Affiliations
[1] Tibet Univ, Sch Informat Sci & Technol, Tibet, Peoples R China
Keywords
Tibetan speech recognition; Amdo dialect; end-to-end; WeNet
DOI
10.1145/3578741.3578801
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Speech recognition is a technique for transcribing acoustic features into text sequences. However, traditional speech recognition models cannot achieve effective performance on the Tibetan Amdo dialect, which requires a large amount of linguistic knowledge. To address this issue, we propose an end-to-end streaming speech recognition model that not only transcribes the Tibetan Amdo dialect but also solves its streaming recognition problem. In the model, we choose Tibetan syllables as the modeling unit and MFCCs as the acoustic features. Extensive experiments on our self-built thousand-hour dataset show strong results. The Character Error Rate (CER) of streaming speech recognition on our dataset is 10.73%, a relative improvement of 17.65% over the baseline model; the CER of speech transcription is 10.23%, a relative improvement of 16.01% over the baseline model.
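Since the abstract reports its results as CER with relative improvements over a baseline, a small self-contained sketch may clarify how these figures relate. This is not the authors' evaluation code: `edit_distance`, `cer`, and `relative_improvement` are illustrative names, and the baseline CERs used below are back-calculated from the quoted numbers, not stated in the abstract.

```python
# Minimal sketch (assumed, not the authors' code): CER as Levenshtein edit
# distance over reference tokens (characters or Tibetan syllables), plus the
# relative-improvement formula behind the abstract's 17.65% / 16.01% figures.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (strings or lists)."""
    n = len(hyp)
    prev = list(range(n + 1))  # distances for the empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            curr[j] = min(prev[j] + 1,                               # deletion
                          curr[j - 1] + 1,                           # insertion
                          prev[j - 1] + (ref[i - 1] != hyp[j - 1]))  # substitution
        prev = curr
    return prev[n]

def cer(refs, hyps):
    """Corpus-level CER: total edit distance / total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)

def relative_improvement(baseline_cer, new_cer):
    """Relative improvement = (baseline - new) / baseline."""
    return (baseline_cer - new_cer) / baseline_cer

if __name__ == "__main__":
    # Back-calculated baselines (our inference, not stated in the abstract):
    # 10.73 / (1 - 0.1765) ~= 13.03 and 10.23 / (1 - 0.1601) ~= 12.18.
    print(relative_improvement(13.03, 10.73))  # ~0.1765 (streaming)
    print(relative_improvement(12.18, 10.23))  # ~0.1601 (transcription)
```

Under this reading, a streaming CER dropping from roughly 13.03% to 10.73% matches the quoted 17.65% relative improvement, and a drop from roughly 12.18% to 10.23% matches the 16.01% figure.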
Pages: 317-322
Number of pages: 6
Related Papers
50 records in total
  • [1] End-to-End Amdo-Tibetan Speech Recognition Based on Knowledge Transfer
    Zhu, Xiaojun; Huang, Heming
    [J]. IEEE ACCESS, 2020, 8: 170991-171000
  • [2] WeNet: Production Oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit
    Yao, Zhuoyuan; Wu, Di; Wang, Xiong; Zhang, Binbin; Yu, Fan; Yang, Chao; Peng, Zhendong; Chen, Xiaoyu; Xie, Lei; Lei, Xin
    [J]. INTERSPEECH 2021, 2021: 4054-4058
  • [3] End-to-End Tibetan Ando Dialect Speech Recognition Based on Hybrid CTC/Attention Architecture
    Sun, Jingwen; Zhou, Gang; Yang, Hongwu; Wang, Man
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 628-632
  • [4] End-to-End Speech Synthesis for Tibetan Lhasa Dialect
    Luo, Lisai; Li, Guanyu; Gong, Chunwei; Ding, Hailan
    [J]. 2018 INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS AND CONTROL ENGINEERING (ISPECE 2018), 2019, 1187
  • [5] WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
    Zhang, Binbin; Wu, Di; Peng, Zhendong; Song, Xingchen; Yao, Zhuoyuan; Lv, Hang; Xie, Lei; Yang, Chao; Pan, Fuping; Niu, Jianwei
    [J]. INTERSPEECH 2022, 2022: 1661-1665
  • [6] Review of End-to-End Streaming Speech Recognition
    Wang, Aohui; Zhang, Long; Song, Wenyu; Meng, Jie
    [J]. Computer Engineering and Applications, 2: 22-33
  • [7] Streaming End-to-End Speech Recognition for Mobile Devices
    He, Yanzhang; Sainath, Tara N.; Prabhavalkar, Rohit; McGraw, Ian; Alvarez, Raziel; Zhao, Ding; Rybach, David; Kannan, Anjuli; Wu, Yonghui; Pang, Ruoming; Liang, Qiao; Bhatia, Deepti; Shangguan, Yuan; Li, Bo; Pundak, Golan; Sim, Khe Chai; Bagby, Tom; Chang, Shuo-yiin; Rao, Kanishka; Gruenstein, Alexander
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019: 6381-6385
  • [8] Lightweight End-to-End Architecture for Streaming Speech Recognition
    Yang, Shuying; Li, Xin
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2023, 36 (03): 268-279
  • [9] Dialect-Aware Modeling for End-to-End Japanese Dialect Speech Recognition
    Imaizumi, Ryo; Masumura, Ryo; Shiota, Sayaka; Kiya, Hitoshi
    [J]. 2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020: 297-301
  • [10] Tibetan Speech Recognition Based on WeNet
    Zhe, Runyu; Li, Guanyu; Ma, Like
    [J]. PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON COMPUTER AND MULTIMEDIA TECHNOLOGY, ICCMT 2024, 2024: 554-557