A Faster Approach For Direct Speech to Speech Translation

Cited by: 1
Authors
Shankarappa, Rashmi T. [1]
Tiwari, Sourabh [1]
Affiliations
[1] Samsung R&D Inst, Voice Intelligence Team, Bengaluru, India
Keywords
Speech Signal Processing; Machine Learning; Translation System; Encoder-Decoder;
DOI
10.1109/WINTECHCON55229.2022.9832314
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As the world moves toward globalization, the demand for automatic language translators is increasing rapidly. Traditional translation systems consist of multiple steps: speech recognition, text-to-text machine translation, and speech generation. These systems suffer from latency introduced by the multiple steps and from error propagation from earlier steps to later ones. Another challenge is that many spoken languages have no text representation, so traditional systems that rely on speech-to-text and text-to-text translation do not work for them. In this paper, we present a recurrent neural network (RNN) based translation system that directly generates the waveform of the target-language audio. We use a sparse coding technique for the extraction and inversion of audio features. An attention-based multi-layered sequence-to-sequence model is trained using a novel technique on a dataset of Spanish-to-English audio, and no intermediate text representation is used during training or inference. We compare the proposed approaches using latency, bilingual evaluation understudy (BLEU) score, and Perceptual Evaluation of Speech Quality (PESQ) score analysis. The resulting system provides very fast translation with good translation accuracy and audio quality.
Pages: 6