From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation

Cited by: 0
Authors
Liu, Danni [1]
Wang, Changhan [2]
Gong, Hongyu [2]
Ma, Xutai [2, 3]
Tang, Yun [2]
Pino, Juan [2]
Affiliations
[1] Maastricht Univ, Maastricht, Netherlands
[2] Meta AI, Menlo Pk, CA USA
[3] Johns Hopkins Univ, Baltimore, MD 21218 USA
Source
Keywords
speech translation; text-to-speech; low-latency
DOI
10.21437/Interspeech.2022-10568
Chinese Library Classification
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Speech-to-speech translation (S2ST) converts input speech to speech in another language. A challenge of delivering S2ST in real time is the accumulated delay between the translation and speech synthesis modules. While incremental text-to-speech (iTTS) models have recently shown large quality improvements, they typically require additional future text inputs to reach optimal performance. In this work, we minimize the initial waiting time of iTTS by adapting the upstream speech translator to generate high-quality pseudo lookahead for the speech synthesizer. After mitigating the initial delay, we demonstrate that the duration of synthesized speech also plays a crucial role in latency. We formalize this as a latency metric and then present a simple yet effective duration-scaling approach for latency reduction. Our approaches consistently reduce latency by 0.2-0.5 seconds without sacrificing speech translation quality.
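As an illustration of why the duration of synthesized speech can affect latency, the following is a minimal sketch of a toy playback queue. It is not the latency metric or the duration-scaling method defined in the paper: the function name `playback_delays`, the `scale` parameter, and the example timings are all hypothetical. The sketch assumes each translated segment becomes ready at a known time and can only start playing once the previous segment has finished.

```python
from typing import List


def playback_delays(ready_times: List[float],
                    durations: List[float],
                    scale: float = 1.0) -> List[float]:
    """Toy model (assumption, not the paper's metric).

    ready_times[i] is when segment i is available for playback,
    durations[i] is its synthesized length in seconds, and `scale`
    multiplies every duration (scale < 1.0 means faster speech).
    Returns, per segment, how long it waits behind earlier audio.
    """
    delays = []
    playhead = 0.0  # time at which the previous segment stops playing
    for ready, duration in zip(ready_times, durations):
        start = max(ready, playhead)      # cannot overlap earlier audio
        delays.append(start - ready)      # waiting time for this segment
        playhead = start + duration * scale
    return delays


# Hypothetical timings: a segment is ready every second,
# but each one takes 1.5 s to play, so the backlog grows.
ready = [1.0, 2.0, 3.0, 4.0]
durations = [1.5, 1.5, 1.5, 1.5]

baseline = playback_delays(ready, durations)
scaled = playback_delays(ready, durations, scale=0.8)

print("baseline delays:", baseline)
print("scaled delays:  ", [round(d, 3) for d in scaled])
```

In this toy setting, segments that take longer to play than the interval at which new text arrives build up a growing backlog, and shortening each segment slows that growth. How far durations can be scaled without hurting intelligibility is outside the scope of the sketch.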
Pages: 1771-1775
Page count: 5