Case Study on Data Collection of Kreol Morisien, a Low-Resourced Creole Language

Authors
Bastien, David Joshen [1]
Chumroo, Vijay Prakash [1]
Bastien, Johan Patrice [1]
Affiliations
[1] Hydrus Labs Ltd, Roche Brunes, Rose Hill, Mauritius
Keywords
Natural Language Processing; Machine Learning; Speech-to-text; Information Extractor; Mauritian Creole; Data Collection
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This case study lays the foundations for the development of Kreol Morisien NLP (KreMoN), a series of Natural Language Processing tools for processing Mauritian Creole. While most work done so far focuses on detailing Machine Learning algorithms, this work focuses on the first step needed for any low-resourced language: the collection of data. We present a process currently being used to collect audio and textual data for a low-resourced language like Mauritian Creole. This data will be used to develop a speech-to-text system as well as an Information Extractor for Mauritian Creole. As part of the case study, we detail some of the work done using existing textual data in non-standardized Mauritian Creole, for which an NLP pre-processing pipeline adapted for low-resourced languages has been developed.
Pages: 10