Collection and evaluation of lexical complexity data for Russian language using crowdsourcing

Cited by: 2
Authors
Abramov, Aleksei V. [1]
Ivanov, Vladimir V. [1]
Affiliations
[1] Kazan Fed Univ, Kazan, Russia
Source
RUSSIAN JOURNAL OF LINGUISTICS | 2022 / Vol. 26 / No. 2
Funding
Russian Science Foundation;
Keywords
Lexical complexity; Russian language; annotation; corpora; Bible;
DOI
10.22363/2687-0088-30118
Chinese Library Classification
H [Language, Writing];
Discipline classification code
05;
Abstract
Estimating word complexity with binary or continuous scores is a challenging task that has been studied for several domains and natural languages. This task is commonly referred to as Complex Word Identification (CWI) or Lexical Complexity Prediction (LCP). Correct evaluation of word complexity can be an important step in many Lexical Simplification pipelines. Earlier works have usually presented methodologies of lexical complexity estimation with several restrictions: hand-crafted features correlated with word complexity, feature engineering to describe target words with features such as the number of hypernyms, the count of consonants, and the Named Entity tag, and evaluations with carefully selected target audiences. More recent works have investigated the use of transformer-based models, which also allow features to be extracted from the surrounding context. However, the majority of papers have been devoted to pipelines for the English language, and few have translated them to other languages such as German, French, and Spanish. In this paper we present a dataset of lexical complexity in context, based on the Russian Synodal Bible and collected using a crowdsourcing platform. We describe a methodology for collecting the data using a 5-point Likert scale for annotation, present descriptive statistics, and compare the results with analogous work for the English language. We evaluate a linear regression model as a baseline for predicting word complexity from handcrafted features and from fastText and ELMo embeddings of target words. The result is a corpus consisting of 931 distinct words that are used in 3,364 different contexts.
Pages: 409-425
Page count: 17