Voicer: A Crowd Sourcing Tool for Speech Data Collection

Cited by: 0
Authors
Buddhika, Darshana [1]
Liyadipita, Ranula [1]
Nadeeshan, Sudeepa [1]
Witharana, Hasini [1]
Jayasena, Sanath [1]
Thayasivam, Uthayasanker [1]
Affiliation
[1] Univ Moratuwa, Dept Comp Sci & Engn, Moratuwa, Sri Lanka
Keywords
data corpus; data collection tool; low resourced languages;
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Speech corpora do not exist for most low-resource languages, and creating one for such a language is challenging and demands significant time and effort. This paper provides an overview of related data collection strategies, highlighting a few issues prevalent in the existing approaches. The paper has three objectives. First, it introduces an open-source tool called "Voicer", accessible from both handheld devices and computers, that can be used to collect speech data for a specific domain in a short span of time, irrespective of the language. Second, it demonstrates the tool by using it to build a Sinhala speech corpus consisting of 10 hours of speech data for 39 different sentences in the banking domain. Finally, it provides a framework for evaluating a speech data corpus, along with the lessons learned during data collection, with a view to contributing to future research.
Pages: 174-181
Page count: 8