ASR Corpus Design for Resource-Scarce Languages

Times cited: 0
Authors
Barnard, Etienne [1]
Davel, Marelie [1]
van Heerden, Charl [1]
Affiliation
[1] CSIR, Meraka Inst, Human Language Technol Res Grp, Pretoria, South Africa
Keywords
speech recognition; corpus design
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the number of speakers and the amount of data required to develop usable speaker-independent speech-recognition systems in resource-scarce languages. Our experiments employ the Lwazi corpus, which contains speech in the eleven official languages of South Africa. We find that a surprisingly small number of speakers (fewer than 50) and around 10 to 20 hours of speech per language suffice for acceptable phone-based recognition.
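To make the abstract's figures concrete, the short Python sketch below converts a per-language corpus size into average speech per speaker. It assumes the audio is split evenly across speakers, which the abstract does not state, and the helper name `minutes_per_speaker` is hypothetical. Because the reported speaker count is below 50, dividing by 50 gives a lower bound on the average time per speaker.

```python
def minutes_per_speaker(total_hours: float, num_speakers: int) -> float:
    """Average minutes of speech per speaker.

    Assumption (not from the paper): the corpus is divided evenly among speakers.
    """
    if num_speakers <= 0:
        raise ValueError("num_speakers must be positive")
    return total_hours * 60.0 / num_speakers


# Figures quoted in the abstract: 10 to 20 hours per language, fewer than 50 speakers.
for hours in (10, 20):
    print(f"{hours} h over 50 speakers: {minutes_per_speaker(hours, 50):.0f} min each")
```

With 10 to 20 hours and 50 speakers this gives 12 to 24 minutes each; a smaller speaker pool raises the per-speaker figure.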
Pages: 2823-2826
Number of pages: 4
Related papers (10 of 50 shown)
  • [1] Transliteration for resource-scarce languages
    Chinnakotla M.K.
    Damani O.P.
    Satoskar A.
    [J]. ACM Transactions on Asian Language Information Processing, 2010, 9 (04):
  • [2] Metaphor Annotation in Sesotho Text Corpus Towards the Representation of Resource-Scarce Languages in NLP
    Mahloane, Malefu Justina
    Trausan-Matu, Stefan
    [J]. 2015 20TH INTERNATIONAL CONFERENCE ON CONTROL SYSTEMS AND COMPUTER SCIENCE, 2015, : 405 - 410
  • [3] Efficient harvesting of Internet audio for resource-scarce ASR
    Davel, Marelie H.
    van Heerden, Charl
    Kleynhans, Neil
    Barnard, Etienne
    [J]. 12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3160 - +
  • [4] NLP Web Services for Resource-Scarce Languages
    Puttkammer, M. J.
    Eiselen, E. R.
    Hocking, J.
    Koen, F. J.
    [J]. 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2018): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2018, : 43 - 49
  • [5] Automatic diacritic restoration for resource-scarce languages
    De Pauw, Guy
    Wagacha, Peter W.
    de Schryver, Gilles-Maurice
    [J]. TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2007, 4629 : 170 - +
  • [6] Developing Core Technologies for Resource-Scarce Nguni Languages
    du Toit, Jakobus S.
    Puttkammer, Martin J.
    [J]. INFORMATION, 2021, 12 (12)
  • [7] Small-Vocabulary Speech Recognition for Resource-Scarce Languages
    Qiao, Fang
    Sherwani, Jahanzeb
    Rosenfeld, Roni
    [J]. PROCEEDINGS OF THE FIRST ACM SYMPOSIUM ON COMPUTING FOR DEVELOPMENT (ACM DEV 2010), 2010,
  • [8] Viability of Neural Networks for Core Technologies for Resource-Scarce Languages
    Loubser, Melinda
    Puttkammer, Martin J.
    [J]. INFORMATION, 2020, 11 (01)
  • [9] Query Expansion in Resource-Scarce Languages: A Multilingual Framework Utilizing Document Structure
    Atreya, Arjun, V
    Kankaria, Ashish
    Bhattacharyya, Pushpak
    Ramakrishnan, Ganesh
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2016, 16 (02)
  • [10] Tower of Babel: A Crowdsourcing Game Building Sentiment Lexicons for Resource-scarce Languages
    Hong, Yoonsung
    Kwak, Haewoon
    Baek, Youngmin
    Moon, Sue
    [J]. PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'13 COMPANION), 2013, : 549 - 556