Lexical Substitution Dataset for German

Times cited: 0
Authors
Cholakov, Kostadin [1]
Biemann, Chris [2]
Eckle-Kohler, Judith [3,4]
Gurevych, Iryna [3,4]
Affiliations
[1] Humboldt Univ, Berlin, Germany
[2] FG Language Technol, Berlin, Germany
[3] Tech Univ Darmstadt, Dept Comp Sci, Ubiquitous Knowledge Proc Lab UKP TUDA, Darmstadt, Germany
[4] German Inst Educ Res & Educ Informat, Ubiquitous Knowledge Proc Lab UKP DIPF, Berlin, Germany
Source
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
This article describes a lexical substitution dataset for German. The whole dataset contains 2,040 sentences from the German Wikipedia, with one target word in each sentence. There are 51 target nouns, 51 adjectives, and 51 verbs randomly selected from 3 frequency groups based on the lemma frequency list of the German WaCKy corpus. 200 sentences have been annotated by 4 professional annotators and the remaining sentences by 1 professional annotator and 5 additional annotators who have been recruited via crowdsourcing. The resulting dataset can be used to evaluate not only lexical substitution systems, but also different sense inventories and word sense disambiguation systems.
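The abstract says only that the targets were "randomly selected from 3 frequency groups" of the German WaCKy lemma frequency list. It does not say how the groups were delimited. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure. It assumes each group is a contiguous, (nearly) equal-sized slice of one part of speech's frequency-ranked lemma list, and that lemmas are drawn uniformly without replacement within each group. The function names `frequency_bands` and `sample_targets` and the toy frequencies are hypothetical. If the split were even (also an assumption), drawing 17 lemmas per group would give the 51 targets per part of speech (3 x 17) and 153 targets overall.

```python
import random
from typing import Dict, List, Optional


def frequency_bands(lemma_freqs: Dict[str, int], n_bands: int = 3) -> List[List[str]]:
    """Split lemmas into n_bands contiguous frequency groups (ASSUMED grouping scheme)."""
    # Rank lemmas from most to least frequent; ties are broken alphabetically
    # so the grouping is deterministic.
    ranked = sorted(lemma_freqs, key=lambda lemma: (-lemma_freqs[lemma], lemma))
    # Cut the ranked list into n_bands slices whose sizes differ by at most one.
    size, extra = divmod(len(ranked), n_bands)
    bands, start = [], 0
    for i in range(n_bands):
        end = start + size + (1 if i < extra else 0)
        bands.append(ranked[start:end])
        start = end
    return bands


def sample_targets(lemma_freqs: Dict[str, int], per_band: int,
                   n_bands: int = 3, seed: Optional[int] = None) -> List[str]:
    """Draw per_band lemmas uniformly at random (without replacement) from each band."""
    rng = random.Random(seed)
    targets: List[str] = []
    for band in frequency_bands(lemma_freqs, n_bands):
        # random.Random.sample raises ValueError if the band is smaller than per_band.
        targets.extend(rng.sample(band, per_band))
    return targets


# Hypothetical toy frequencies for one part of speech; NOT taken from the WaCKy list.
toy_nouns = {
    "Haus": 9000, "Zeit": 8500, "Jahr": 8000,
    "Bank": 500, "Schloss": 450, "Ball": 400,
    "Tau": 20, "Kiefer": 15, "Leiter": 10,
}
print(sample_targets(toy_nouns, per_band=2, seed=42))
```

Running the same function separately on noun, adjective, and verb frequency lists would keep each part of speech balanced across high-, mid-, and low-frequency lemmas. A fixed `seed` makes the draw reproducible.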
Pages: 1406-1411
Page count: 6