Samromur: Crowd-sourcing large amounts of data

Authors
Hedstrom, Staffan [1]
Mollberg, David Erik [2]
Thorhallsdottir, Ragnheidur [1]
Gudnason, Jon [1]
Affiliations
[1] Reykjavik Univ, Menntavegi 1, IS-102 Reykjavik, Iceland
[2] Tiro, IS-163 Reykjavik, Iceland
Keywords
Speech corpora; Icelandic; Crowd-sourcing
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This contribution describes the collection of a large and diverse corpus for speech recognition and similar tools using crowd-sourced donations. We built a collection platform inspired by Mozilla Common Voice and specialized it to our needs. We discuss the importance of engaging the community and motivating it to contribute, in our case through competitions. Given this incentive and a platform that makes it easy to record large numbers of utterances, we observed four speakers who each freely donated over 10 thousand utterances. We also saw that women are keener to participate in these events across all age groups. Manually verifying a large corpus is a monumental task, so we attempt to verify parts of the data automatically using tools such as Marosijo and the Montreal Forced Aligner. The method proved helpful, especially for detecting invalid utterances and for halving the work needed from crowd-sourced verification.
Pages: 2311-2316
Number of pages: 6