Direct Vs Cascaded Speech-to-Speech Translation Using Transformer

Cited by: 0
Authors
Arya, Lalaram [1 ]
Chowdhury, Amartya Roy [1 ]
Prasanna, S. R. Mahadeva [1 ]
Affiliations
[1] Indian Inst Technol Dharwad, Dharwad 580011, India
Source
Keywords
Direct speech-to-speech translation (DS2ST); Transformer network; Speech-to-speech translation (S2ST); Data augmentation;
DOI
10.1007/978-3-031-48312-7_21
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Direct speech-to-speech translation (DS2ST) translates speech in one language into speech in another without using a written form of the language. Most prior DS2ST work relies, directly or indirectly, on an auxiliary network and on knowledge from the written form of the language to improve performance. This work proposes a transformer-based sequence-to-sequence model that performs the DS2ST task without an auxiliary network, and compares it with a cascaded system. The experiments use the Prabhupadavani dataset in two languages (Hindi and English). The proposed DS2ST model achieves a BLEU score of 16.46 without using any auxiliary information. Augmenting the data with speed perturbation improves the DS2ST BLEU score to 18.58.
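As an illustration of the speed-perturbation augmentation mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It resamples a waveform by linear interpolation so that it plays faster or slower. The function name `speed_perturb`, the factors 0.9, 1.0 and 1.1, the 16 kHz sample rate, and the synthetic test tone are all assumptions chosen for the example; the record does not state which factors or tools the paper used.

```python
import numpy as np

def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Return a copy of a mono waveform that plays `factor` times faster.

    Hypothetical helper for illustration: resampling by linear
    interpolation changes both duration and pitch, as simple speed
    perturbation does.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(round(len(waveform) / factor))
    # Positions in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    return np.interp(src_positions, np.arange(len(waveform)), waveform)

# Assumed setup: one second of a 220 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t)

# Assumed perturbation factors; each yields one extra training copy.
augmented = {f: speed_perturb(tone, f) for f in (0.9, 1.0, 1.1)}
for f, wav in augmented.items():
    print(f, len(wav) / sample_rate)
```

A factor below 1 lengthens the utterance and a factor above 1 shortens it, so each perturbed copy adds a training example with a different speaking rate. In practice a band-limited resampler from an audio toolkit would usually replace the linear interpolation used here.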
Pages: 258-270
Page count: 13
Related papers
50 in total
  • [41] Speech-to-speech translation software on PDAs for travel conversation
    Isotani, Ryosuke
    Yamabana, Kiyoshi
    Ando, Shinichi
    Hanazawa, Ken
    Ishikawa, Shin-Ya
    Iso, Ken-Ichi
    [J]. NEC Research and Development, 2003, 44 (SPEC.): : 197 - 202
  • [42] CVSS Corpus and Massively Multilingual Speech-to-Speech Translation
    Jia, Ye
    Ramanovich, Michelle Tadmor
    Wang, Quan
    Zen, Heiga
    [J]. LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6691 - 6703
  • [43] Rhonda: the architecture of a multilingual speech-to-speech translation pipeline
    Louw, Johannes A.
    Moodley, Avashlin
    [J]. 2018 INTERNATIONAL CONFERENCE ON INTELLIGENT AND INNOVATIVE COMPUTING APPLICATIONS (ICONIC), 2018, : 194 - 200
  • [44] Speech-to-speech translation software on PDAs for travel conversation
    Isotani, R
    Yamabana, K
    Ando, S
    Hanazawa, K
    Ishikawa, S
    Iso, K
    [J]. NEC RESEARCH & DEVELOPMENT, 2003, 44 (02): : 197 - 202
  • [45] TECNOPARLA - Speech technologies for Catalan and its application to Speech-to-speech Translation
    Schulz, Henrik
    Costa-Jussa, Marta R.
    Fonollosa, Jose A. R.
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (41): : 319 - 320
  • [46] NAME AWARE SPEECH-TO-SPEECH TRANSLATION FOR ENGLISH/IRAQI
    Prasad, Rohit
    Moran, Christine
    Choi, Fred
    Meermeier, Ralf
    Saleem, Shirin
    Kao, Chia-lin
    Stallard, Dave
    Natarajan, Prem
    [J]. 2008 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY: SLT 2008, PROCEEDINGS, 2008, : 249 - 252
  • [47] Real-time speech-to-speech translation for PDAs
    Prasad, R.
    Krstovski, K.
    Choi, F.
    Saleem, S.
    Natarajan, P.
    Decerbo, M.
    Stallard, D.
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON PORTABLE INFORMATION DEVICES, 2007, : 95 - 99
  • [48] Input segmentation of spontaneous speech in JANUS: A speech-to-speech translation system
    Lavie, A
    Gates, D
    Coccaro, N
    Levin, L
    [J]. DIALOGUE PROCESSING IN SPOKEN LANGUAGE SYSTEMS, 1997, 1236 : 86 - 99
  • [49] Enriching machine-mediated speech-to-speech translation using contextual information
    Sridhar, Vivek Kumar Rangarajan
    Bangalore, Srinivas
    Narayanan, Shrikanth
    [J]. COMPUTER SPEECH AND LANGUAGE, 2013, 27 (02): : 492 - 508
  • [50] Approach toward speech-to-speech translation system by using a collection of sentences and utterances
    Sumita, E
    Nakaiwa, H
    Kikui, G
    Yamamoto, S
    [J]. ASRU'03: 2003 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING ASRU '03, 2003, : 652 - 657