Preserving Word-Level Emphasis in Speech-to-Speech Translation

Cited by: 18
Authors
Quoc Truong Do [1]
Toda, Tomoki [2]
Neubig, Graham [1]
Sakti, Sakriani [1]
Nakamura, Satoshi [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara 6300192, Japan
[2] Nagoya Univ, Ctr Informat Technol, Nagoya, Aichi 4648601, Japan
Keywords
Emphasis estimation; word-level emphasis; intent; emphasis translation; speech-to-speech translation
DOI
10.1109/TASLP.2016.2643280
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech-to-speech translation (S2ST) is a technology that translates speech across languages, which can remove barriers in cross-lingual communication. In conventional S2ST systems, the linguistic meaning of speech was translated, but paralinguistic information conveying other features of the speech, such as emotion or emphasis, was ignored. In this paper, we propose a method to translate paralinguistic information, specifically focusing on emphasis. The method consists of a series of components that can accurately translate emphasis using all acoustic features of speech. First, linear-regression hidden semi-Markov models (LR-HSMMs) are used to estimate a real-numbered emphasis value for every word in an utterance, resulting in a sequence of values for the utterance. After that, the emphasis translation module translates the estimated emphasis sequence into a target-language emphasis sequence using a conditional random field model that considers the features of emphasis levels, words, and part-of-speech tags. Finally, the speech synthesis module synthesizes emphasized speech with LR-HSMMs, taking into account the translated emphasis sequence and transcription. The results indicate that our translation model can translate emphasis information, correctly emphasizing words in the target language with 91.6% F-measure by objective evaluation. A listening test with human subjects further showed that they could identify the emphasized words with 87.8% F-measure, and that the naturalness of the audio was preserved.
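The abstract reports word-level results as F-measure. As a rough illustration only, the sketch below shows one way such a score could be computed from a sequence of real-valued emphasis levels and binary reference labels. The function name `emphasis_f_measure`, the 0.5 threshold, and the toy values are assumptions made for this example; the record does not state how the paper binarizes emphasis values or which evaluation code it uses.

```python
from typing import Sequence


def emphasis_f_measure(
    predicted: Sequence[float],
    reference: Sequence[int],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> float:
    """Word-level F-measure for binary emphasis decisions.

    predicted: one real-valued emphasis level per word.
    reference: one label per word, 1 = emphasized, 0 = not emphasized.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have one entry per word")

    # Turn real-valued emphasis levels into emphasized / not-emphasized.
    decisions = [1 if p >= threshold else 0 for p in predicted]

    pairs = list(zip(decisions, reference))
    true_pos = sum(1 for d, r in pairs if d == 1 and r == 1)
    false_pos = sum(1 for d, r in pairs if d == 1 and r == 0)
    false_neg = sum(1 for d, r in pairs if d == 0 and r == 1)

    # With no true positives, precision or recall is zero (or undefined).
    if true_pos == 0:
        return 0.0

    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up values: four words, the second and fourth
# are emphasized in the reference.
score = emphasis_f_measure([0.1, 0.9, 0.7, 0.2], [0, 1, 0, 1])
```

In this toy case one emphasized word is found, one is missed, and one unemphasized word is wrongly marked, so precision and recall are both one half.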
Pages: 544-556
Page count: 13