Massively Multilingual Pronunciation Mining with WikiPron

Authors
Lee, Jackson L.
Ashby, Lucas F. E. [1]
Garza, M. Elizabeth [1]
Lee-Sikka, Yeonju [1]
Miller, Sean [1]
Wong, Alan [1]
McCarthy, Arya D. [2]
Gorman, Kyle [1]
Affiliations
[1] CUNY, Grad Ctr, New York, NY 10021 USA
[2] Johns Hopkins Univ, Ctr Language & Speech Proc, Baltimore, MD 21218 USA
Keywords
speech; pronunciation; grapheme-to-phoneme; g2p; MODELS;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
We introduce WikiPron, an open-source command-line tool for extracting pronunciation data from Wiktionary, a collaborative multilingual online dictionary. We first describe the design and use of WikiPron. We then discuss the challenges faced in scaling this tool to create an automatically generated database of 1.7 million pronunciations from 165 languages. Finally, we validate the pronunciation database by using it to train and evaluate a collection of generic grapheme-to-phoneme models. The software, pronunciation data, and models are all made available under permissive open-source licenses.
Pages: 4223-4228
Page count: 6