Emotional Speech Recognition and Synthesis in Multiple Languages toward Affective Speech-to-Speech Translation System

Cited by: 3
Authors
Akagi, Masato [1 ]
Han, Xiao [1 ]
Elbarougy, Reda [1 ]
Hamada, Yasuhiro [1 ]
Li, Junfeng [2 ]
Affiliations
[1] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi, Ishikawa, Japan
[2] Chinese Acad Sci, Inst Acoust, Beijing, Peoples R China
Keywords
Speech-to-speech translation (S2ST) system; paralinguistic and non-linguistic information; emotion recognition/synthesis; multiple languages; QUALITY;
DOI
10.1109/IIH-MSP.2014.148
CLC classification code
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech-to-speech translation (S2ST) is the process by which a spoken utterance in one language is used to produce a spoken output in another language. The conventional approach to S2ST has focused on processing linguistic information only, directly translating the spoken utterance from the source language to the target language without taking into account para-linguistic and non-linguistic information such as the emotional states at play in the source language. In this work, we explore how to deal with para- and non-linguistic information among multiple languages, with a particular focus on speakers' emotional states, in S2ST scenarios called "affective S2ST." In our efforts to construct an effective system, we discuss (1) how to describe emotions in speech and how to model the perception/production of emotions and (2) the commonality and differences among multiple languages in the proposed model. We then use these discussions as context for (3) an examination of our "affective S2ST" system in operation.
Pages: 574-577
Page count: 4
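The abstract describes a pipeline in which the speaker's emotional state is estimated from the source speech, carried alongside the linguistic translation, and re-imposed at synthesis. The following is a minimal, purely illustrative sketch of that data flow; all function and type names here are hypothetical placeholders, not the authors' implementation (a real system would estimate emotion from acoustic features such as F0, power, and duration, and drive an emotional synthesizer in the target language).

```python
# Hypothetical affective-S2ST data flow (illustrative only): emotion
# recognized in the source utterance survives the linguistic translation
# step and is attached to the output for emotional synthesis.
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    language: str
    emotion: str  # e.g. "neutral", "joy", "anger"


def recognize_emotion(speech_text: str) -> str:
    """Placeholder emotion recognizer; stands in for mapping acoustic
    features (F0, power, duration) to a perceived emotional state."""
    return "joy" if "!" in speech_text else "neutral"


def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder linguistic MT step using a toy lookup table."""
    toy_dict = {("hello!", "en", "ja"): "こんにちは!"}
    return toy_dict.get((text.lower(), src, tgt), text)


def affective_s2st(speech_text: str, src: str, tgt: str) -> Utterance:
    emotion = recognize_emotion(speech_text)       # para-linguistic analysis
    translated = translate(speech_text, src, tgt)  # linguistic translation
    return Utterance(translated, tgt, emotion)     # emotion kept for synthesis


out = affective_s2st("Hello!", "en", "ja")
print(out.emotion, out.text)  # emotion is preserved across translation
```

The point of the sketch is the architectural separation the paper argues for: the linguistic channel (translation) and the para-/non-linguistic channel (emotion) are processed in parallel rather than discarding the latter.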
Related papers
50 records in total
  • [1] Akagi, M., Han, X., Elbarougy, R., Hamada, Y., Li, J. Toward Affective Speech-to-Speech Translation: Strategy for Emotional Speech Recognition and Synthesis in Multiple Languages. 2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014.
  • [2] Lavie, A., Waibel, A., Levin, L., Finke, M., Gates, D., Gavalda, M., Zeppenfeld, T., Zhan, P. M. JANUS-III: Speech-to-Speech Translation in Multiple Languages. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vols I-V, 1997: 99-102.
  • [3] Hashimoto, K., Yamagishi, J., Byrne, W., King, S., Tokuda, K. An Analysis of Machine Translation and Speech Synthesis in Speech-to-Speech Translation System. 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2011: 5108-5111.
  • [4] Hashimoto, K., Yamagishi, J., Byrne, W., King, S., Tokuda, K. Impacts of Machine Translation and Speech Synthesis on Speech-to-Speech Translation. Speech Communication, 2012, 54(7): 857-866.
  • [5] Lavie, A., Levin, L., Frederking, R., Pianesi, F. The NESPOLE! Speech-to-Speech Translation System. Machine Translation: From Research to Real Users, 2002, 2499: 240-243.
  • [6] Tjandra, A., Sakti, S., Nakamura, S. Speech-to-Speech Translation Between Untranscribed Unknown Languages. 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), 2019: 593-600.
  • [7] Casacuberta, F., Vidal, E., Sanchis, A., Vilar, J. M. Pattern Recognition Approaches for Speech-to-Speech Translation. Cybernetics and Systems, 2004, 35(1): 3-17.
  • [8] Watts, O., Zhou, B. Unsupervised Features from Text for Speech Synthesis in a Speech-to-Speech Translation System. 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), Vols 1-5, 2011: 2164-2167.
  • [9] Matsuda, S., Hu, X., Shiga, Y., Kashioka, H., Hori, C., Yasuda, K., Okuma, H., Uchiyama, M., Sumita, E., Kawai, H., Nakamura, S. Multilingual Speech-to-Speech Translation System: VoiceTra. 2013 IEEE 14th International Conference on Mobile Data Management (MDM 2013), Vol 2, 2013: 229-233.
  • [10] Nakamura, S., Markov, K., Nakaiwa, H., Kikui, G., Kawai, H., Jitsuhiro, T., Zhang, J. S., Yamamoto, H., Sumita, E., Yamamoto, S. The ATR Multilingual Speech-to-Speech Translation System. IEEE Transactions on Audio, Speech, and Language Processing, 2006, 14(2): 365-376.