Unsupervised phonetic and word level discovery for speech to speech translation for unwritten languages

Cited by: 2

Authors:

Hillis, Steven [1]

Kumar, Anushree Prasanna [1]

Black, Alan W. [1]

Affiliations:

[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA

Source:

INTERSPEECH 2019 | 2019

Keywords:

speech-to-speech; machine translation; segmentation; unit discovery; low-resource; unwritten languages; Wilderness;

DOI:

10.21437/Interspeech.2019-3026

Chinese Library Classification (CLC) codes:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification codes:

100104 ; 100213 ;

Abstract:

We experiment with unsupervised methods for deriving and clustering symbolic representations of speech, working towards speech-to-speech translation for languages without regular (or any) written representations. We consider five low-resource African languages, and we produce three different segmental representations of text data for comparisons against four different segmental representations derived solely from acoustic data for each language. The text and speech data for each language come from the CMU Wilderness dataset introduced in [1], where speakers read a version of the New Testament in their language. Our goal is to evaluate the translation performance not only of acoustically derived units but also of discovered sequences or "words" made from these units, with the intuition that such representations will encode more meaning than phones alone. We train statistical machine translation models for each representation and evaluate their outputs on the basis of BLEU-1 scores to determine their efficacy. Our experiments produce encouraging results: as we cluster our atomic phonetic representations into more word-like units, the amount of information retained generally approaches that of the actual words themselves.
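
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of a sentence-level BLEU-1 score against a single reference: clipped unigram precision multiplied by a brevity penalty. It is an illustration only and not the authors' evaluation setup, which likely relied on a standard corpus-level scoring tool. The function name `bleu1` and the example token sequences are hypothetical.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Sentence-level BLEU-1 with one reference (illustrative sketch).

    candidate, reference: lists of tokens (words, or discovered units).
    """
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Clip each candidate token's count by its count in the reference,
    # so repeating a correct token cannot inflate the score.
    matched = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = matched / len(candidate)
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1.0 - len(reference) / len(candidate))
    return bp * precision


# Hypothetical example: every candidate token matches, but the output
# is shorter than the reference, so the brevity penalty applies.
score = bleu1(["the", "cat", "sat"], ["the", "cat", "sat", "down"])
```

Because BLEU-1 counts only unigram matches, it rewards getting the right units into the output without regard to their order, which suits a comparison across representations whose tokens differ in granularity.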


Pages: 1138-1142

Page count: 5

