Grammar-Supervised End-to-End Speech Recognition with Part-of-Speech Tagging and Dependency Parsing

Cited by: 1
Authors
Wan, Genshun [1, 2]
Mao, Tingzhi [2]
Zhang, Jingxuan [2]
Chen, Hang [1]
Gao, Jianqing [2]
Ye, Zhongfu [1]
Affiliations
[1] Univ Sci & Technol China, Natl Engn Res Ctr Speech & Language Informat Proc, Hefei 230088, Peoples R China
[2] iFLYTEK Co Ltd, iFLYTEK Res, Hefei 230088, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 07
Keywords
speech recognition; grammar knowledge; multiple evaluation methodology of grammar; grammatical deviation distance
DOI
10.3390/app13074243
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
In most automatic speech recognition systems, unacceptable hypothesis errors still make the recognition results absurd and difficult to understand. In this paper, we introduce grammar information to reduce the grammatical deviation distance and improve the readability of the hypotheses. Word embeddings are reinforced with grammar embeddings to strengthen the grammar representation. An auxiliary text-to-grammar task is added to improve the recognition results as measured by downstream task evaluation. Furthermore, a multiple evaluation methodology of grammar is used to explore an extensible paradigm for using grammar knowledge. Experiments on the small open-source Mandarin speech corpus AISHELL-1 and the large private Mandarin speech corpus TRANS-M show that our method performs well without additional data. It achieves relative character error rate reductions of 3.2% and 5.0% and relative grammatical deviation distance reductions of 4.7% and 5.9% on the AISHELL-1 and TRANS-M tasks, respectively. Moreover, the grammar-based mean opinion scores of our method are about 4.29 and 3.20, significantly higher than the baseline scores of 4.11 and 3.02.
Pages: 14