Preserving Word-level Emphasis in Speech-to-speech Translation using Linear Regression HSMMs

Authors
Quoc Truong Do [1]
Takamichi, Shinnosuke [1]
Sakti, Sakriani [1]
Neubig, Graham [1]
Toda, Tomoki [1]
Nakamura, Satoshi [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Ikoma, Nara, Japan
Keywords
speech translation; paralinguistic translation; emphasis estimation; emphasis translation
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In speech, emphasis is an important type of paralinguistic information that helps convey the focus of an utterance, new information, and emotion. If emphasis can be incorporated into a speech-to-speech (S2S) translation system, it becomes possible to convey this information across the language barrier. However, previous related work either focuses only on the translation of particular prosodic features, such as F0, or handles emphasis but only for extremely small vocabularies, such as the 10 digits. In this paper, we describe a new S2S method that translates emphasis across languages and considers multiple features of emphasis, such as power, F0, and duration, over larger vocabularies. We do so by introducing two new components: word-level emphasis estimation using linear regression hidden semi-Markov models, and emphasis translation, which maps the word-level emphasis to the target language with conditional random fields. The text-to-speech synthesis system is also modified so that it can synthesize emphasized speech. The results show that our system translates emphasis correctly, with an F-measure of 91.6% in an objective test and 87.8% in a subjective test.
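The abstract reports results as an F-measure but does not say how it is computed. As a rough illustration only, the sketch below assumes that each word carries a binary label (1 = emphasized, 0 = not emphasized) and that the balanced F1 score is meant. The function name `f_measure` and the example labels are hypothetical and are not taken from the paper.

```python
def f_measure(reference, predicted):
    """Return (precision, recall, F1) for binary word-level emphasis labels.

    Assumption: 1 marks an emphasized word, 0 a non-emphasized word.
    """
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)       # emphasized and detected
    fp = sum(1 for r, p in pairs if not r and p)   # detected but not emphasized
    fn = sum(1 for r, p in pairs if r and not p)   # emphasized but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels for a six-word utterance (not data from the paper).
reference = [0, 1, 0, 0, 1, 0]
predicted = [0, 1, 0, 1, 1, 0]
precision, recall, f1 = f_measure(reference, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

In this toy case one non-emphasized word is wrongly marked as emphasized, so recall stays at 1 while precision drops to 2/3.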
Pages: 3665-3669
Number of pages: 5