Voicer: A Crowd Sourcing Tool for Speech Data Collection

Times cited: 0
Authors
Buddhika, Darshana [1]
Liyadipita, Ranula [1]
Nadeeshan, Sudeepa [1]
Witharana, Hasini [1]
Jayasena, Sanath [1]
Thayasivam, Uthayasanker [1]
Affiliation
[1] University of Moratuwa, Department of Computer Science and Engineering, Moratuwa, Sri Lanka
Keywords
data corpus; data collection tool; low-resourced languages
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Speech corpora do not exist for most low-resource languages, and building one for such a language is challenging and demands considerable time and effort. This paper reviews related data collection strategies and highlights several issues common to existing approaches. It makes three contributions. First, it introduces "Voicer", an open-source tool, accessible from both handheld devices and computers, that can be used to run a domain-specific speech data collection in a short time, irrespective of the language. Second, it demonstrates the tool by using it to build a Sinhala speech corpus of 10 hours of speech data covering 39 different sentences in the banking domain. Finally, it presents a framework for evaluating a speech data corpus, together with the lessons learned during data collection, as a contribution to future research.
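As a rough illustration of the kind of bookkeeping a corpus like this involves: 10 hours of speech over 39 prompt sentences averages about 36000 / 39 ≈ 923 seconds (roughly 15.4 minutes) of recordings per sentence. The sketch below aggregates crowd-sourced recording durations per sentence and flags sentences that received no recordings. It is a minimal sketch under stated assumptions, not the paper's evaluation framework: the `Recording` record, its fields, and the `summarize_corpus` helper are hypothetical names introduced here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Recording:
    # Hypothetical record: one crowd-sourced utterance of a prompt sentence.
    sentence_id: int
    speaker_id: str
    duration_s: float


def summarize_corpus(recordings, expected_sentences):
    """Hypothetical helper: total hours, speaker count, per-sentence seconds,
    and the prompt sentences that have no recordings yet."""
    per_sentence = defaultdict(float)
    speakers = set()
    for rec in recordings:
        # Accumulate recorded seconds for each prompt sentence.
        per_sentence[rec.sentence_id] += rec.duration_s
        speakers.add(rec.speaker_id)
    total_hours = sum(per_sentence.values()) / 3600.0
    # Sentences in the prompt set that no contributor has recorded.
    missing = sorted(set(expected_sentences) - set(per_sentence))
    return {
        "total_hours": total_hours,
        "speakers": len(speakers),
        "per_sentence_seconds": dict(per_sentence),
        "missing_sentences": missing,
    }


recs = [Recording(1, "a", 3.0), Recording(1, "b", 2.5), Recording(2, "a", 4.0)]
summary = summarize_corpus(recs, expected_sentences=range(1, 4))
print(summary["missing_sentences"])  # [3]
```

Reporting per-sentence totals alongside the overall duration makes uneven coverage visible: a corpus can reach its target hours while a few prompts remain under-recorded.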
Pages: 174-181
Page count: 8