Real-time translation of English speech through speech feature extraction

Author
Lei, Xiaoyan [1]
Affiliation
[1] Henan Mech & Elect Vocat Coll, 1 Taishan Rd, Zhengzhou 451191, Henan, Peoples R China
Keywords
Speech feature; English speech; Real-time translation; Transformer
DOI
10.1007/s10015-024-00951-w
Chinese Library Classification number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Real-time English speech translation is useful in many settings, including business and travel. This research aims to improve the effectiveness of real-time English speech translation. First, filter bank (FBank) features were extracted from English speech. Then an enhanced Transformer model was introduced, with a causal convolution module added at the front end of the encoder to capture English speech features together with their location information. The optimized model was tested on the MuST-C dataset, translating English speech into several target languages. Translation quality varied by target language: the highest bilingual evaluation understudy (BLEU) score was 20.84 for Spanish text, and the lowest was 10.56 for Russian text. The average BLEU score was 18.51, with an average delay of 1202.33 ms. Compared with the conventional Transformer model, the improved model achieved higher BLEU scores and lower delay, and it performed best with a convolutional kernel size of 3 × 3. The results demonstrate the dependability of the improved Transformer model for real-time English speech translation and highlight its practical usefulness.
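The abstract does not give the details of the causal convolution module, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes the FBank features are a time × frequency matrix, that the 3 × 3 kernel is applied with zero padding on the past side of the time axis only, and that the frequency axis is padded symmetrically. The function name `causal_conv2d` and these padding choices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def causal_conv2d(frames: Matrix, kernel: Matrix) -> Matrix:
    """Apply a kernel to (time x frequency) features so that output frame t
    depends only on input frames at or before t.

    Like most deep learning layers, this computes a cross-correlation
    (the kernel is not flipped). Padding scheme is an assumption.
    """
    k_t, k_f = len(kernel), len(kernel[0])
    n_t, n_f = len(frames), len(frames[0])
    pad_f = k_f // 2  # symmetric zero padding along frequency (assumption)
    out = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            acc = 0.0
            for i in range(k_t):
                # Kernel row i looks back (k_t - 1 - i) frames, never forward.
                src_t = t - (k_t - 1 - i)
                if src_t < 0:
                    continue  # implicit zero padding before the first frame
                for j in range(k_f):
                    src_f = f + j - pad_f
                    if 0 <= src_f < n_f:
                        acc += kernel[i][j] * frames[src_t][src_f]
            out[t][f] = acc
    return out
```

Because no output frame reads from later input frames, a layer of this shape can run on streaming audio without waiting for the end of the utterance, which is the property a real-time front end needs.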
Pages: 410 - 415
Number of pages: 6
Related papers
50 in total
  • [1] Real-time speech-to-speech translation for PDAs
    Prasad, R.
    Krstovski, K.
    Choi, F.
    Saleem, S.
    Natarajan, P.
    Decerbo, M.
    Stallard, D.
    2007 IEEE INTERNATIONAL CONFERENCE ON PORTABLE INFORMATION DEVICES, 2007, : 95 - 99
  • [2] Real-Time Statistical Speech Translation
    Wolk, Krzysztof
    Marasek, Krzysztof
    NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 1, 2014, 275 : 107 - 113
  • [3] Real-time pre-processing for improved feature extraction of noisy speech
    Raj, P. P.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (03) : 715 - 728
  • [4] Real-time pre-processing for improved feature extraction of noisy speech
    P. P. Raj
    International Journal of Speech Technology, 2021, 24 : 715 - 728
  • [5] Real-time pitch extraction of voiced speech
    George, DE
    Salari, E
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 1997, 20 (04) : 379 - 387
  • [6] Real-time pitch extraction of voiced speech
    Dept of Physics and Astronomy, University of Toledo, Toledo, OH 43606-3390, United States
    Unknown
    J Network Comput Appl, 4 (379-387):
  • [7] Neural Incremental Speech Recognition Toward Real-Time Machine Speech Translation
    Novitasari, Sashi
    Sakti, Sakriani
    Nakamura, Satoshi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2021, E104D (12) : 2195 - 2208
  • [8] Real-time Speaker Adapted Speech to Speech Translation System in Mobile Environment
    Guan, Yong
    Zheng, Lin
    Tian, Jilei
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 577 - +
  • [9] CMU wearable computers for real-time speech translation
    Smailagic, A
    Siewiorek, D
    Reilly, D
    IEEE PERSONAL COMMUNICATIONS, 2001, 8 (02) : 6 - 12
  • [10] A Comparison of Low-Complexity Real-Time Feature Extraction for Neuromorphic Speech Recognition
    Acharya, Jyotibdha
    Patil, Aakash
    Li, Xiaoya
    Chen, Yi
    Liu, Shih-Chii
    Basu, Arindam
    FRONTIERS IN NEUROSCIENCE, 2018, 12