SPEECH-TO-SPEECH TRANSLATION BETWEEN UNTRANSCRIBED UNKNOWN LANGUAGES

被引：0

作者：

Tjandra, Andros ^{[1
]}

Sakti, Sakriani ^{[1
,2
]}

Nakamura, Satoshi ^{[1
,2
]}

机构：

[1] Nara Inst Sci & Technol, Ikoma, Japan

[2] RIKEN, Ctr Adv Intelligence Project AIP, Tokyo, Japan

来源：

2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019) | 2019年

关键词：

speech translation; sequence-to-sequence; zero-resource modeling; unit discovery; autoencoder; CHAIN;

D O I：

10.1109/asru46091.2019.9003853

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

In this paper, we explore a method for training speech-to-speech translation tasks without any transcription or linguistic supervision. Our proposed method consists of two steps: First, we train and generate discrete representation with unsupervised term discovery with a discrete quantized autoencoder. Second, we train a sequence-to-sequence model that directly maps the source language speech to the target languages discrete representation. Our proposed method can directly generate target speech without any auxiliary or pre-training steps with a source or target transcription. To the best of our knowledge, this is the first work that performed pure speech-to-speech translation between untranscribed unknown languages.

引用

页码：593 / 600

页数：8

共 50 条

[1] JANUS-III: Speech-to-speech translation in multiple languages
Lavie, A
Waibel, A
Levin, L
Finke, M
Gates, D
Gavalda, M
Zeppenfeld, T
Zhan, PM
[J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 99 - 102
[2] Impacts of machine translation and speech synthesis on speech-to-speech translation
Hashimoto, Kei
Yamagishi, Junichi
Byrne, William
King, Simon
Tokuda, Keiichi
[J]. SPEECH COMMUNICATION, 2012, 54 (07) : 857 - 866
[3] Toward Affective Speech-to-Speech Translation: Strategy for Emotional Speech Recognition and Synthesis in Multiple Languages
Akagi, Masato
Han, Xiao
Elbarougy, Reda
Hamada, Yasuhiro
Li, Junfeng
[J]. 2014 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2014,
[4] Emotional Speech Recognition and Synthesis in Multiple Languages toward Affective Speech-to-Speech Translation System
Akagi, Masato
Han, Xiao
Elbarougy, Reda
Hamada, Yasuhiro
Li, Junfeng
[J]. 2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014, : 574 - 577
[5] The NESPOLE! speech-to-speech translation system
Lavie, A
Levin, L
Frederking, R
Pianesi, F
[J]. MACHINE TRANSLATION: FROM RESEARCH TO REAL USERS, 2002, 2499 : 240 - 243
[6] Hierarchical Classification for Speech-to-Speech Translation
Ettelaie, Emil
Georgiou, Panayiotis G.
Narayanan, Shrikanth S.
[J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2534 - 2537
[7] Towards Machine Speech-to-speech Translation
Satoshi, Nakamura
Sudoh, Katsuhito
Sakti, Sakriani
[J]. TRADUMATICA-TRADUCCIO I TECNOLOGIES DE LA INFORMACIO I LA COMUNICACIO, 2019, (17): : 81 - 87
[8] Prosody generation for speech-to-speech translation
Aguero, Pablo Daniel
Adell, Jordi
Bonafonte, Antonio
[J]. 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 557 - 560
[9] Contextual reasoning in speech-to-speech translation
Koch, S
Küssner, U
Stede, M
Tidhar, D
[J]. NATURAL LANGUAGE PROCESSING-NLP 2000, PROCEEDINGS, 2000, 1835 : 283 - 292
[10] Language Identification for Speech-to-Speech Translation
Lim, Daniel Chung Yong
Lane, Ian
[J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 204 - 207

← 1 2 3 4 5 →