An Introduction to the Chinese Speech Recognition Front-End of the NICT/ATR Multi-Lingual Speech Translation System

Cited by: 2
Authors
Jinsong Zhang
Takatoshi Jitsuhiro
Hirofumi Yamamoto
Xinhui Hu
Satoshi Nakamura
Affiliations
[1] Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, 2-2-2 Keihanna Science City, Kyoto 619-0288, Japan; ATR Spoken Language Translation Research Laboratories, 2-2-2 Keihanna Science City, Kyoto
[2] ATR Knowledge Science Laboratories, 2-2-2 Keihanna Science City, Kyoto 619-0288, Japan
Keywords
Chinese speech recognition; mutual information; phoneme set design; hidden Markov network; minimum description length; successive state splitting; multi-class composite N-grams;
DOI
Not available
CLC number
TP391.42
Subject classification codes
0811; 081101; 081104; 1405
Abstract
This paper introduces several important features of the Chinese large vocabulary continuous speech recognition system in the NICT/ATR multi-lingual speech-to-speech translation system. The features include: (1) a flexible way to derive an information-rich phoneme set based on mutual information between a text corpus and its phoneme set; (2) a hidden Markov network acoustic model together with a successive state splitting algorithm that generates its model topology under a minimum description length criterion; and (3) advanced language modeling using multi-class composite N-grams. These features allow a recognition performance of 90% character accuracy on tourism-related dialogue with real-time response speed.
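As a rough illustration of feature (1), the sketch below scores a candidate phoneme inventory by the mutual information between words and the phoneme symbols used to transcribe them, so that an inventory which distinguishes, for example, tone-dependent finals can be compared against a tone-less one. This is a minimal sketch on assumed toy data, not the derivation procedure of the paper; the helper names (score_phoneme_set, mutual_information) and the example lexicon are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Estimate I(W; P) in bits from weighted (word, phoneme) co-occurrence pairs.

    `pairs` is an iterable of ((word, phoneme), count) items.
    """
    joint = Counter()
    for (w, p), c in pairs:
        joint[(w, p)] += c
    total = sum(joint.values())
    word_marg = Counter()
    phone_marg = Counter()
    for (w, p), c in joint.items():
        word_marg[w] += c
        phone_marg[p] += c
    mi = 0.0
    for (w, p), c in joint.items():
        p_joint = c / total
        p_indep = (word_marg[w] / total) * (phone_marg[p] / total)
        mi += p_joint * log2(p_joint / p_indep)
    return mi


def score_phoneme_set(lexicon, word_freq):
    """Score one candidate phoneme inventory.

    lexicon: dict mapping word -> list of phoneme symbols under that inventory
    word_freq: dict mapping word -> corpus frequency
    """
    pairs = []
    for word, phones in lexicon.items():
        freq = word_freq.get(word, 0)
        if freq == 0:
            continue
        for ph in phones:
            pairs.append(((word, ph), freq))
    return mutual_information(pairs)


if __name__ == "__main__":
    # Hypothetical toy data: two syllables that differ only in tone,
    # transcribed once with a tone-less inventory and once with
    # tone-dependent finals.
    word_freq = {"ma1": 10, "ma3": 6}
    toneless = {"ma1": ["m", "a"], "ma3": ["m", "a"]}
    tonal = {"ma1": ["m", "a_1"], "ma3": ["m", "a_3"]}
    print("tone-less inventory MI:", score_phoneme_set(toneless, word_freq))
    print("tonal inventory MI:    ", score_phoneme_set(tonal, word_freq))
```

On this toy data the tone-less inventory yields zero mutual information (the two words share identical transcriptions), while the tone-dependent inventory yields a positive value; a criterion of this kind favors inventories whose symbols carry more information about the words being recognized.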
Pages: 545 - 552
Page count: 8
Related Articles
50 records in total
  • [1] An Introduction to the Chinese Speech Recognition Front-End of the NICT/ATR Multi-Lingual Speech Translation System
    Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology, 2-2-2 Keihanna Science City, Kyoto, 619-0288, Japan
    [J]. Tsinghua Science and Technology, 2008, 13 (04) : 545 - 552
  • [2] NICT/ATR Chinese-Japanese-English Speech-to-Speech Translation System
    Tohru Shimizu
    Yutaka Ashikari
    Eiichiro Sumita
    Jinsong Zhang
    Satoshi Nakamura
    [J]. Tsinghua Science and Technology, 2008, (04) : 540 - 544
  • [3] NICT/ATR Chinese-Japanese-English Speech-to-Speech Translation System
    Shimizu, Tohru
    Ashikari, Yutaka
    Sumita, Eiichiro
    Zhang, Jinsong
    Nakamura, Satoshi
    [J]. Tsinghua Science and Technology, 2008, 13 (04) : 540 - 544
  • [4] Development of the "VoiceTra" Multi-Lingual Speech Translation System
    Matsuda, Shigeki
    Hayashi, Teruaki
    Ashikari, Yutaka
    Shiga, Yoshinori
    Kashioka, Hidenori
    Yasuda, Keiji
    Okuma, Hideo
    Uchiyama, Masao
    Sumita, Eiichiro
    Kawai, Hisashi
    Nakamura, Satoshi
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D (04) : 621 - 632
  • [5] A Front-End Speech Enhancement System for Robust Automotive Speech Recognition
    Wang, Haikun
    Ye, Zhongfu
    Chen, Jingdong
    [J]. 2018 11TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2018, : 1 - 5
  • [6] Transfer Learning from Multi-Lingual Speech Translation Benefits Low-Resource Speech Recognition
    Vanderreydt, Geoffroy
    Remy, Francois
    Demuynck, Kris
    [J]. INTERSPEECH 2022, 2022, : 3053 - 3057
  • [7] A Reassigned Front-End for Speech Recognition
    Tryfou, Georgina
    Omologo, Maurizio
    [J]. 2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017, : 553 - 557
  • [8] A multi-lingual speech recognition system using a neural network approach
    Chen, OTC
    Chen, CY
    Chang, HT
    Hsu, FR
    Yang, HL
    Lee, YG
    [J]. ICNN - 1996 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS. 1-4, 1996, : 1576 - 1581
  • [9] SERAB: A MULTI-LINGUAL BENCHMARK FOR SPEECH EMOTION RECOGNITION
    Scheidwasser-Clow, Neil
    Kegler, Mikolaj
    Beckmann, Pierre
    Cernak, Milos
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7697 - 7701
  • [10] An automatic machine translation system for multi-lingual speech to Indian sign language
    Dhanjal, Amandeep Singh
    Singh, Williamjeet
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (03) : 4283 - 4321