Developing Speech Resources from Parliamentary Data for South African English

Authors
de Wet, Febe [1]
Badenhorst, Jaco [1]
Modipa, Thipe [1]
Affiliations
[1] CSIR Meraka, Human Language Technol Res Grp, Pretoria, South Africa
Keywords
Under-resourced languages; speech data; South African English; automatic alignment
DOI
10.1016/j.procs.2016.04.028
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The official languages of South Africa can still be classified as under-resourced with respect to the speech resources that are required for technology development. Harvesting speech data from existing sources is one means to create additional resources. The aim of the study reported on in this paper was to improve the harvesting and transcription accuracy of a corpus derived from parliamentary data. This aim was achieved by improving on the text normalisation process and pronunciation modelling as well as by iteratively training more accurate in-domain acoustic models. In this manner, more data could be harvested with higher confidence than using baseline pronunciation dictionaries and out-of-domain speech data. (C) 2016 Published by Elsevier B.V.
Pages: 45-52
Page count: 8