Pinyin as a Feature of Neural Machine Translation for Chinese Speech Recognition Error Correction

Cited by: 3
Authors
Duan, Dagao [1]
Liang, Shaohu [1]
Han, Zhongming [1]
Yang, Weijie [1]
Affiliation
[1] Beijing Technology and Business University, Beijing, People's Republic of China
Keywords
Automatic speech recognition; Neural machine translation; Attention mechanism; Pinyin encoding; Chinese error correction
DOI
10.1007/978-3-030-32381-3_52
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text correction after automatic speech recognition (ASR) is an important way to improve speech recognition systems. We treat speech recognition error correction as a translation task: translating from "bad" Chinese into "good" Chinese. We propose a speech recognition error correction algorithm based on a neural machine translation (NMT) model. The algorithm uses Chinese Pinyin encoding as a feature and a multilayer convolutional encoder-decoder neural network with attention. On a WeChat speech transcription dataset that we collected, our model substantially outperforms all prior neural approaches as well as strong statistical machine translation (SMT)-based systems. Our analysis shows the superiority of convolutional neural networks in capturing local context via attention, which improves coverage of speech transcription errors. By boosting multiple models and using data augmentation and a 3-gram language model, our proposed algorithm reduces the error rate on the test set by 26.2% on average. Our results show that a multilayer convolutional encoder-decoder with Pinyin features achieves state-of-the-art performance in text correction after speech recognition.
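Why Pinyin helps can be illustrated with a small sketch. The abstract does not say exactly how the Pinyin feature is encoded, so this assumes a factored source input in which each character is paired with its toneless Pinyin syllable. The lexicon `TOY_PINYIN`, the `<unk>` token, and the functions `to_pinyin` and `factored_source` are hypothetical names for illustration only; a real system would use a full character-to-Pinyin dictionary (for example, one built with the pypinyin package).

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon mapping characters to toneless Pinyin.
# A real system would need a complete character-to-Pinyin dictionary.
TOY_PINYIN: Dict[str, str] = {
    "我": "wo", "明": "ming", "天": "tian",
    "在": "zai", "再": "zai", "来": "lai",
}

UNK_PINYIN = "<unk>"  # assumed placeholder for characters missing from the lexicon


def to_pinyin(sentence: str) -> List[str]:
    """Return the toneless Pinyin syllable for each character in the sentence."""
    return [TOY_PINYIN.get(ch, UNK_PINYIN) for ch in sentence]


def factored_source(sentence: str) -> List[Tuple[str, str]]:
    """Pair each character with its Pinyin: one (char, pinyin) factor per position.

    An encoder could embed both factors and sum or concatenate the embeddings.
    """
    return list(zip(sentence, to_pinyin(sentence)))


if __name__ == "__main__":
    asr_output = "我明天在来"  # hypothetical ASR error: 在 recognized instead of 再
    reference = "我明天再来"  # intended sentence
    print(factored_source(asr_output))
    # The erroneous and correct sentences share the same Pinyin sequence.
    print(to_pinyin(asr_output) == to_pinyin(reference))
```

The design intuition, under this assumption, is that many ASR substitutions are homophones: the wrong character and the intended one differ on the character side but agree on the Pinyin side, so the Pinyin factor gives the encoder a pronunciation signal that points toward the correct target. The paper's actual encoding scheme may differ.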
Pages: 651-663
Number of pages: 13