A Faster Approach For Direct Speech to Speech Translation

Cited by: 1
Authors
Shankarappa, Rashmi T. [1]
Tiwari, Sourabh [1]
Affiliations
[1] Samsung R&D Inst, Voice Intelligence Team, Bengaluru, India
Keywords
Speech Signal Processing; Machine Learning; Translation System; Encoder-Decoder
DOI
10.1109/WINTECHCON55229.2022.9832314
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As the world moves towards globalization, the demand for automatic language translators is increasing rapidly. Traditional translation systems consist of multiple steps: speech recognition, text-to-text machine translation, and speech generation. These systems suffer from latency caused by the multiple steps and from error propagation from the earlier steps to the later ones. Another challenge is that many spoken languages have no text representation, so traditional systems that rely on speech-to-text and text-to-text translation do not work for them. In this paper, we present a recurrent neural network (RNN) based translation system that directly generates the waveform of the target-language audio. We use a sparse coding technique for the extraction and inversion of audio features. An attention-based multi-layered sequence-to-sequence model is trained using a novel technique on a dataset of Spanish-to-English audio, and no intermediate text representation is used during training or inference. We compare the proposed approaches in terms of latency, bilingual evaluation understudy (BLEU) score, and Perceptual Evaluation of Speech Quality (PESQ) score. The resulting system provides very fast translation with good translation accuracy and audio quality.
Pages: 6