Grammar-Supervised End-to-End Speech Recognition with Part-of-Speech Tagging and Dependency Parsing

Cited by: 1
Authors
Wan, Genshun [1, 2]
Mao, Tingzhi [2]
Zhang, Jingxuan [2]
Chen, Hang [1]
Gao, Jianqing [2]
Ye, Zhongfu [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei 230088, Peoples R China
[2] iFLYTEK Co Ltd, iFLYTEK Res, Hefei 230088, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 07
Keywords
speech recognition; grammar knowledge; multiple evaluation methodology of grammar; grammatical deviation distance
DOI
10.3390/app13074243
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
In most automatic speech recognition systems, many unacceptable hypothesis errors still make the recognition results absurd and difficult to understand. In this paper, we introduce grammar information to reduce the grammatical deviation distance and improve the readability of the hypotheses. Word embeddings are reinforced with grammar embeddings to strengthen the representation of grammar. An auxiliary text-to-grammar task is added to improve the recognition results as assessed by downstream task evaluation. Furthermore, a multiple evaluation methodology of grammar is used to explore an extensible paradigm for using grammar knowledge. Experiments on the small open-source Mandarin speech corpus AISHELL-1 and the large private Mandarin speech corpus TRANS-M show that our method performs well without any additional data. Our method achieves relative character error rate reductions of 3.2% and 5.0% and relative grammatical deviation distance reductions of 4.7% and 5.9% on the AISHELL-1 and TRANS-M tasks, respectively. Moreover, the grammar-based mean opinion scores of our method are about 4.29 and 3.20, significantly higher than the baseline scores of 4.11 and 3.02.
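The abstract reports results as relative character error rate (CER) reductions. As a minimal sketch of how such a figure is conventionally computed, the Python below derives CER from a character-level edit distance and then takes the relative reduction between a baseline and an improved system. The function names and the example strings are hypothetical and chosen only for illustration; this is the standard definition of CER, not code from the paper, and it does not cover the paper's grammatical deviation distance, which the abstract does not define.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]  # deleting i reference characters yields the empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error metric with respect to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical toy strings, not data from the paper.
    print(cer("abcd", "abxd"))
    print(relative_reduction(10.0, 9.5))
```

Under this definition, a relative reduction is measured against the baseline error rate, so the same relative figure can correspond to very different absolute changes on different corpora.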
Pages: 14
Related papers
50 in total
  • [41] End-to-End Speech Recognition For Arabic Dialects
    Nasr, Seham
    Duwairi, Rehab
    Quwaider, Muhannad
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48 (08) : 10617 - 10633
  • [42] End-to-End Speech Recognition in Agglutinative Languages
    Mamyrbayev, Orken
    Alimhan, Keylan
    Zhumazhanov, Bagashar
    Turdalykyzy, Tolganay
    Gusmanova, Farida
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS (ACIIDS 2020), PT II, 2020, 12034 : 391 - 401
  • [43] End-to-end Korean Digits Speech Recognition
    Roh, Jong-hyuk
    Cho, Kwantae
    Kim, Youngsam
    Cho, Sangrae
    2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE, 2019, : 1137 - 1139
  • [44] End-to-end audio-visual speech recognition for overlapping speech
    Rose, Richard
    Siohan, Olivier
    Tripathi, Anshuman
    Braga, Otavio
    INTERSPEECH 2021, 2021, : 3016 - 3020
  • [45] Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
    Amodei, Dario
    Ananthanarayanan, Sundaram
    Anubhai, Rishita
    Bai, Jingliang
    Battenberg, Eric
    Case, Carl
    Casper, Jared
    Catanzaro, Bryan
    Cheng, Qiang
    Chen, Guoliang
    Chen, Jie
    Chen, Jingdong
    Chen, Zhijie
    Chrzanowski, Mike
    Coates, Adam
    Diamos, Greg
    Ding, Ke
    Du, Niandong
    Elsen, Erich
    Engel, Jesse
    Fang, Weiwei
    Fan, Linxi
    Fougner, Christopher
    Gao, Liang
    Gong, Caixia
    Hannun, Awni
    Han, Tony
    Johannes, Lappi Vaino
    Jiang, Bing
    Ju, Cai
    Jun, Billy
    LeGresley, Patrick
    Lin, Libby
    Liu, Junjie
    Liu, Yang
    Li, Weigao
    Li, Xiangang
    Ma, Dongpeng
    Narang, Sharan
    Ng, Andrew
    Ozair, Sherjil
    Peng, Yiping
    Prenger, Ryan
    Qian, Sheng
    Quan, Zongfeng
    Raiman, Jonathan
    Rao, Vinay
    Satheesh, Sanjeev
    Seetapun, David
    Sengupta, Shubho
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [46] EXTENDING PARROTRON: AN END-TO-END, SPEECH CONVERSION AND SPEECH RECOGNITION MODEL FOR ATYPICAL SPEECH
    Doshi, Rohan
    Chen, Youzheng
    Jiang, Liyang
    Zhang, Xia
    Biadsy, Fadi
    Ramabhadran, Bhuvana
    Chu, Fang
    Rosenberg, Andrew
    Moreno, Pedro J.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6988 - 6992
  • [47] Cross-Language Dependency Parsing Using Part-of-Speech Patterns
    Bednar, Peter
    TEXT, SPEECH, AND DIALOGUE, 2016, 9924 : 117 - 124
  • [48] SEMI-SUPERVISED END-TO-END SPEECH RECOGNITION VIA LOCAL PRIOR MATCHING
    Hsu, Wei-Ning
    Lee, Ann
    Synnaeve, Gabriel
    Hannun, Awni
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 125 - 132
  • [49] Framewise Supervised Training towards End-to-End Speech Recognition Models: First Results
    Li, Mohan
    Cao, Yuanjiang
    Zhou, Weicong
    Liu, Min
    INTERSPEECH 2019, 2019, : 1641 - 1645
  • [50] END-TO-END TRAINING OF A LARGE VOCABULARY END-TO-END SPEECH RECOGNITION SYSTEM
    Kim, Chanwoo
    Kim, Sungsoo
    Kim, Kwangyoun
    Kumar, Mehul
    Kim, Jiyeon
    Lee, Kyungmin
    Han, Changwoo
    Garg, Abhinav
    Kim, Eunhyang
    Shin, Minkyoo
    Singh, Shatrughan
    Heck, Larry
    Gowda, Dhananjaya
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 562 - 569