Using a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese

Authors
Mendonca, Gustavo [1]
Aluisio, Sandra [1]
Affiliations
[1] Univ Sao Paulo, Inst Ciencias Matemat & Comp, Sao Paulo, Brazil
Keywords
pronunciation dictionary; grapheme to phoneme conversion; text to speech
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the method employed to build a machine-readable pronunciation dictionary for Brazilian Portuguese. The dictionary uses a hybrid approach to convert graphemes into phonemes, based on both manual transcription rules and machine learning algorithms. Its word list was compiled from the Portuguese Wikipedia dump: Wikipedia articles were converted to plain text and tokenized, and word types were extracted. A language identification tool was developed to detect loanwords in the data. Syllable boundaries and stress were identified for each word. The transcription task was carried out in a two-step process: i) words are submitted to a set of transcription rules, which transcribe the predictable graphemes (mostly consonants); ii) a machine learning classifier predicts the transcription of the remaining graphemes (mostly vowels). The method was evaluated through 5-fold cross-validation; results show an F1-score of 0.98. The dictionary and all the resources used to build it were made publicly available.
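The two-step transcription process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the rules nor the classifier, so the rule table `CONSONANT_RULES`, the context-voting classifier `ContextVoteClassifier`, the toy training pairs, and the simplified one-grapheme-to-one-phoneme alignment are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical rule table: graphemes treated here as fully predictable.
# The real rule set is not given in the abstract.
CONSONANT_RULES = {"b": "b", "d": "d", "f": "f", "m": "m", "p": "p", "t": "t", "v": "v"}

PENDING = None  # marks graphemes that the rules leave to the classifier


def apply_rules(word):
    """Step i: transcribe predictable graphemes, leave the rest pending."""
    return [CONSONANT_RULES.get(g, PENDING) for g in word]


def context(word, i):
    """Feature for the classifier: (previous, current, next) grapheme."""
    prev = word[i - 1] if i > 0 else "#"
    nxt = word[i + 1] if i + 1 < len(word) else "#"
    return (prev, word[i], nxt)


class ContextVoteClassifier:
    """Toy stand-in for the machine learning step: majority vote per context."""

    def __init__(self):
        self.by_context = defaultdict(Counter)
        self.by_grapheme = defaultdict(Counter)

    def fit(self, pairs):
        for word, phones in pairs:
            for i, (g, p) in enumerate(zip(word, phones)):
                if g not in CONSONANT_RULES:  # learn only what rules skip
                    self.by_context[context(word, i)][p] += 1
                    self.by_grapheme[g][p] += 1
        return self

    def predict(self, word, i):
        ctx = context(word, i)
        if ctx in self.by_context:
            return self.by_context[ctx].most_common(1)[0][0]
        if word[i] in self.by_grapheme:  # back off to the grapheme alone
            return self.by_grapheme[word[i]].most_common(1)[0][0]
        return word[i]  # unseen grapheme: copy it through unchanged


def transcribe(word, clf):
    """Step ii: fill the pending positions with classifier predictions."""
    phones = apply_rules(word)
    return [p if p is not PENDING else clf.predict(word, i)
            for i, p in enumerate(phones)]


# Invented toy data with simplified, broad transcriptions.
TOY_TRAINING = [
    ("dedo", ["d", "e", "d", "u"]),
    ("mato", ["m", "a", "t", "u"]),
    ("fada", ["f", "a", "d", "ɐ"]),
]

clf = ContextVoteClassifier().fit(TOY_TRAINING)
print(transcribe("pato", clf))  # ['p', 'a', 't', 'u']
```

In this sketch the rules handle every consonant of "pato", and the classifier only decides the two vowels, which mirrors the division of labor the abstract describes. A real system would also need syllable and stress features, which the sketch omits.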
Pages: 1278-1282
Number of pages: 5