Samromur: Crowd-sourcing large amounts of data

Cited by: 0
Authors
Hedstrom, Staffan [1]
Mollberg, David Erik [2]
Thorhallsdottir, Ragnheidur [1]
Gudnason, Jon [1]
Affiliations
[1] Reykjavik Univ, Menntavegi 1, IS-102 Reykjavik, Iceland
[2] Tiro, IS-163 Reykjavik, Iceland
Keywords
Speech corpora; Icelandic; Crowd Sourcing;
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
This contribution describes the collection of a large and diverse corpus for speech recognition and similar tools using crowd-sourced donations. We have built a collection platform inspired by Mozilla Common Voice and specialized it to our needs. We discuss the importance of engaging the community and motivating it to contribute, in our case through competitions. Given the incentive and a platform that makes it easy to record large numbers of utterances, we have observed four cases of speakers freely donating over 10 thousand utterances each. We have also seen that women are keener than men to participate in these events across all age groups. Manually verifying a large corpus is a monumental task, so we attempt to verify parts of the data automatically using tools such as Marosijo and the Montreal Forced Aligner. The method proved helpful, especially for detecting invalid utterances, and it halved the work needed from crowd-sourced verification.
Pages: 2311-2316
Number of pages: 6