Unsupervised phonetic and word level discovery for speech to speech translation for unwritten languages

Cited by: 2
Authors
Hillis, Steven [1]
Kumar, Anushree Prasanna [1]
Black, Alan W. [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Source
Keywords
speech-to-speech; machine translation; segmentation; unit discovery; low-resource; unwritten languages; Wilderness;
DOI
10.21437/Interspeech.2019-3026
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
We experiment with unsupervised methods for deriving and clustering symbolic representations of speech, working towards speech-to-speech translation for languages without regular (or any) written representations. We consider five low-resource African languages, and we produce three different segmental representations of text data for comparisons against four different segmental representations derived solely from acoustic data for each language. The text and speech data for each language comes from the CMU Wilderness dataset introduced in [1], where speakers read a version of the New Testament in their language. Our goal is to evaluate the translation performance not only of acoustically derived units but also of discovered sequences or "words" made from these units, with the intuition that such representations will encode more meaning than phones alone. We train statistical machine translation models for each representation and evaluate their outputs on the basis of BLEU-1 scores to determine their efficacy. Our experiments produce encouraging results: as we cluster our atomic phonetic representations into more word-like units, the amount of information retained generally approaches that of the actual words themselves.
Pages: 1138 - 1142
Page count: 5
Related papers
50 in total
  • [1] UWSpeech: Speech to Speech Translation for Unwritten Languages
    Zhang, Chen
    Tan, Xu
    Ren, Yi
    Qin, Tao
    Zhang, Kejun
    Liu, Tie-Yan
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 14319 - 14327
  • [2] AUTOMATIC DISCOVERY OF A PHONETIC INVENTORY FOR UNWRITTEN LANGUAGES FOR STATISTICAL SPEECH SYNTHESIS
    Muthukumar, Prasanna Kumar
    Black, Alan W.
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [3] Speech Technology for Unwritten Languages
    Scharenborg, Odette
    Besacier, Laurent
    Black, Alan
    Hasegawa-Johnson, Mark
    Metze, Florian
    Neubig, Graham
    Stueker, Sebastian
    Godard, Pierre
    Mueller, Markus
    Ondel, Lucas
    Palaskar, Shruti
    Arthur, Philip
    Ciannella, Francesco
    Du, Mingxing
    Larsen, Elin
    Merkx, Danny
    Riad, Rachid
    Wang, Liming
    Dupoux, Emmanuel
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 964 - 975
  • [4] Adaptation of Unsupervised Term Discovery for Speech to Sign Languages
    Polat, Korhan
    Saraclar, Murat
    [J]. 2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [5] Preserving Word-Level Emphasis in Speech-to-Speech Translation
    Quoc Truong Do
    Toda, Tomoki
    Neubig, Graham
    Sakti, Sakriani
    Nakamura, Satoshi
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (03) : 544 - 556
  • [6] Unsupervised word acquisition from speech using pattern discovery
    Park, Alex
    Glass, James R.
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 409 - 412
  • [7] Deriving phonetic transcriptions and discovering word segmentations for speech-to-speech translation in low-resource settings
    Wilkinson, Andrew
    Zhao, Tiancheng
    Black, Alan W.
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3086 - 3090
  • [8] Unsupervised pattern discovery in speech
    Park, Alex S.
    Glass, James R.
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2008, 16 (01) : 186 - 197
  • [9] UNSUPERVISED WORD-LEVEL PROSODY TAGGING FOR CONTROLLABLE SPEECH SYNTHESIS
    Guo, Yiwei
    Du, Chenpeng
    Yu, Kai
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7597 - 7601
  • [10] Simple and Effective Unsupervised Speech Translation
    Wang, Changhan
    Inaguma, Hirofumi
    Chen, Peng-Jen
    Kulikov, Ilia
    Tang, Yun
    Hsu, Wei-Ning
    Auli, Michael
    Pino, Juan
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 10771 - 10784