ASR FOR LOW-RESOURCED LANGUAGES: BUILDING A PHONETICALLY BALANCED ROMANIAN SPEECH CORPUS

Cited by: 0
Authors
Stanescu, Miruna [1]
Cucu, Horia [1]
Buzo, Andi [1]
Burileanu, Corneliu [1]
Affiliations
[1] Univ Politehn Bucuresti, Bucharest, Romania
Keywords
ASR; corpora acquisition; corpora processing; diacritics restoration
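DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809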
Abstract
The construction of automatic speech recognition (ASR) systems is fundamentally dependent on the speech corpus used to train the acoustic models. The speech corpus should be phonetically balanced to ensure that the acoustic models are properly trained. This paper presents the design and development of the first phonetically balanced Romanian speech corpus. It describes all the language processing steps taken to obtain a proper set of phrases, discusses some important aspects of Romanian phonetics, and emphasizes the phrase selection mechanism.
Pages: 2060-2064
Page count: 5