Direct Vs Cascaded Speech-to-Speech Translation Using Transformer

Cited by: 0
Authors
Arya, Lalaram [1]
Chowdhury, Amartya Roy [1]
Prasanna, S. R. Mahadeva [1]
Affiliation
[1] Indian Institute of Technology Dharwad, Dharwad 580011, India
Keywords
Direct speech-to-speech translation (DS2ST); Transformer network; Speech-to-speech translation (S2ST); Data augmentation
DOI
10.1007/978-3-031-48312-7_21
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Direct speech-to-speech translation (DS2ST) is the process of translating speech from one language to another without using a written form of the language. Most prior work on DS2ST has relied on an auxiliary network and on knowledge from the written form of the language, directly or indirectly, to improve performance. This work proposes a transformer-based sequence-to-sequence model that performs the DS2ST task without an auxiliary network. A comparative study with a cascaded system is also presented. The experiments are performed on the Prabhupadavani dataset in two languages (Hindi and English). The results show that the proposed DS2ST model achieves a BLEU score of 16.46 without using any auxiliary information. Augmenting the data with speed perturbation improves the DS2ST BLEU score to 18.58.
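Speed perturbation, the augmentation named in the abstract, is commonly done by resampling each waveform so that it plays slightly faster or slower. The following is only a minimal sketch of that general idea, not the authors' pipeline: the function name `speed_perturb`, the factors 0.9, 1.0 and 1.1, the 16 kHz sample rate, and the use of linear interpolation are all assumptions chosen for illustration, and the record does not say which factors or tooling the paper used.

```python
import numpy as np


def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Resample a mono waveform so that it plays `factor` times faster.

    Hypothetical helper: uses plain linear interpolation, which is enough
    to illustrate the idea but is not a high-quality resampler.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(waveform) / factor))
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(waveform)), waveform)


# Assumed toy input: a one-second 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

# Assumed factors; each one yields an extra copy of the utterance.
augmented = {f: speed_perturb(tone, f) for f in (0.9, 1.0, 1.1)}
for f, wav in augmented.items():
    print(f, len(wav))
```

A factor below 1 lengthens the utterance and a factor above 1 shortens it, so the augmented copies differ in duration from the original. In a speech-to-speech setting, how the source and target utterances are paired after perturbation is a design decision this sketch leaves open.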
Pages: 258-270
Number of pages: 13