Real-time translation of English speech through speech feature extraction

Cited by: 0
Authors
Lei, Xiaoyan [1]
Affiliation
[1] Henan Mech & Elect Vocat Coll, 1 Taishan Rd, Zhengzhou 451191, Henan, Peoples R China
Keywords
Speech feature; English speech; Real-time translation; Transformer
DOI
10.1007/s10015-024-00951-w
Chinese Library Classification number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Real-time English speech translation is useful in many situations, including business and travel. The goal of this research is to improve the quality of real-time English speech translation. First, filter bank (FBank) features were extracted from English speech. Then an enhanced Transformer model was introduced, incorporating a causal convolution module in the front end of the encoder to capture English speech features with location information. The performance of the optimized model in translating English speech into different target languages was tested on the MuST-C dataset. With the improved Transformer, translation quality varied across target languages. The highest bilingual evaluation understudy (BLEU) score was observed for Spanish text at 20.84, while Russian text obtained the lowest score of 10.56. The average BLEU score was 18.51, with an average lag of 1202.33 ms. Compared to the conventional Transformer model, the improved model achieved higher BLEU scores and lower time delay, and it performed best with a convolutional kernel size of 3 × 3. The results demonstrate the dependability of the improved Transformer model in real-time English speech translation, highlighting its practical usefulness.
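The causal convolution mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `causal_conv_time`, the array shapes, and the choice of a kernel that spans 3 frames and all filter channels are assumptions made for illustration (the abstract reports only that a 3 × 3 kernel worked best). The sketch shows the defining property of a causal front end, namely that the output at frame t reads only frames up to t, which is what allows translation to begin before the utterance ends.

```python
import numpy as np


def causal_conv_time(frames: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Causal convolution along the time axis (illustrative, hypothetical helper).

    frames: (T, F) array of per-frame features such as FBank energies.
    kernel: (K, F) weights; kernel[-1] multiplies the current frame and
            kernel[0] multiplies the oldest frame in the window.
    Returns a (T,) array where output[t] depends only on frames[t-K+1 .. t].
    """
    k = kernel.shape[0]
    # Left-pad with K-1 zero frames so that no future frame is ever read.
    pad = np.zeros((k - 1, frames.shape[1]))
    padded = np.concatenate([pad, frames], axis=0)
    out = np.empty(frames.shape[0])
    for t in range(frames.shape[0]):
        window = padded[t:t + k]  # original frames t-K+1 .. t
        out[t] = np.sum(window * kernel)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((10, 80))  # 10 frames, 80 filters (assumed sizes)
    weights = rng.standard_normal((3, 80))
    before = causal_conv_time(feats, weights)
    feats[6:] += 1.0  # perturb only the later frames
    after = causal_conv_time(feats, weights)
    # Outputs for frames 0..5 are unaffected by changes to frames 6 onward.
    print(np.allclose(before[:6], after[:6]))
```

A non-causal convolution would pad on both sides and read frames after t, so each output would have to wait for future audio; left-only padding avoids that wait.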
Pages: 410-415
Number of pages: 6