Preserving Word-Level Emphasis in Speech-to-Speech Translation

Cited by: 18
Authors
Quoc Truong Do [1]
Toda, Tomoki [2]
Neubig, Graham [1]
Sakti, Sakriani [1]
Nakamura, Satoshi [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara 6300192, Japan
[2] Nagoya Univ, Ctr Informat Technol, Nagoya, Aichi 4648601, Japan
Keywords
Emphasis estimation; word-level emphasis; intent; emphasis translation; speech-to-speech translation
DOI
10.1109/TASLP.2016.2643280
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech-to-speech translation (S2ST) is a technology that translates speech across languages and can thereby remove barriers in cross-lingual communication. Conventional S2ST systems translate the linguistic meaning of speech but ignore paralinguistic information, such as emotion or emphasis, that conveys other features of the speech. In this paper, we propose a method to translate paralinguistic information, specifically focusing on emphasis. The method consists of a series of components that can accurately translate emphasis using all acoustic features of speech. First, linear-regression hidden semi-Markov models (LR-HSMMs) are used to estimate a real-numbered emphasis value for every word in an utterance, resulting in a sequence of values for the utterance. Next, the emphasis translation module translates the estimated emphasis sequence into a target-language emphasis sequence using a conditional random field model that considers emphasis levels, words, and part-of-speech tags as features. Finally, the speech synthesis module synthesizes emphasized speech with LR-HSMMs, taking into account the translated emphasis sequence and the transcription. The results indicate that our translation model can translate emphasis information, correctly emphasizing words in the target language with a 91.6% F-measure in objective evaluation. A listening test with human subjects further showed that they could identify the emphasized words with an 87.8% F-measure, and that the naturalness of the audio was preserved.
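The F-measure figures quoted in the abstract score how well the emphasized words are recovered. The following is a minimal sketch of such a word-level score, not the authors' evaluation code. It assumes that real-valued emphasis levels are turned into binary emphasized/not-emphasized labels with a threshold of 0.5; the threshold and the function names `binarize` and `emphasis_f_measure` are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def binarize(levels: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Map real-valued emphasis levels to emphasized (True) / not emphasized (False).

    The 0.5 threshold is an assumption made for this sketch.
    """
    return [level >= threshold for level in levels]


def emphasis_f_measure(reference: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Word-level F-measure for the 'emphasized' class.

    Both sequences hold one label per word of the same utterance.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    # Count true positives, false positives, and false negatives per word.
    tp = sum(1 for r, p in zip(reference, predicted) if r and p)
    fp = sum(1 for r, p in zip(reference, predicted) if not r and p)
    fn = sum(1 for r, p in zip(reference, predicted) if r and not p)

    if tp == 0:
        # No correctly emphasized word: precision or recall is zero or undefined.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy four-word utterance; the values are illustrative, not from the paper.
    reference = [False, True, False, True]
    predicted = binarize([0.1, 0.9, 0.7, 0.2])
    print(emphasis_f_measure(reference, predicted))
```

In the toy example, one emphasized word is found, one is missed, and one unemphasized word is wrongly marked, so precision and recall are both 0.5.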
Pages: 544-556
Number of pages: 13