Text corpus with errors

Cited by: 0
Authors
Pala, K [1]
Rychly, P [1]
Smrz, P [1]
Affiliations
[1] Masaryk Univ, Fac Informat, Brno 60200, Czech Republic
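Source
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract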
This paper describes a Czech text corpus (Chyby) containing various kinds of errors: spelling, typographical, grammatical, stylistic, and lexical. We explain how Chyby was built and how the errors in it were discovered, marked, and annotated. We present the classification of the errors and give statistics on the error types. The tools for annotating the errors are also described. To the best of our knowledge, this is the first text corpus of this kind prepared for Czech.
Pages: 90-97
Page count: 8
Related papers
50 in total
  • [21] Emotion Corpus Construction on Microblog Text
    Huang, Lei
    Li, Shoushan
    Zhou, Guodong
    CHINESE LEXICAL SEMANTICS (CLSW 2015), 2015, 9332 : 204 - 212
  • [22] Patterns of text reuse in a scientific corpus
    Citron, Daniel T.
    Ginsparg, Paul
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2015, 112 (01) : 25 - 30
  • [23] Trust the text: language, corpus and discourse
    Liu, Jingzhong
    AUSTRALIAN REVIEW OF APPLIED LINGUISTICS, 2005, 28 (01) : 109 - 112
  • [24] Text Corpus as a Tool for Literary Studies
    Karlinska, Agnieszka
    Czwordon-Lis, Paulina
    Maryl, Maciej
    TEKSTY DRUGIE, 2023, (06) : 294 - 319
  • [25] Text and corpus analysis - Stubbs,M
    Wortham, S
    DISCOURSE & SOCIETY, 1997, 8 (03) : 429 - 430
  • [26] IceSum: An Icelandic Text Summarization Corpus
    Dadason, Jon Fridrik
    Loftsson, Hrafn
    Sigurdardottir, Salome Lilja
    Bjornsson, Dorsteinn
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 9 - 14
  • [27] Concurrent Processing of Text Corpus Queries
    Rabara, Radoslav
    Rychly, Pavel
    RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING (RASLAN 2015), 2015, : 49 - 58
  • [28] Modality in Text: a Proposal for Corpus Annotation
    Hendrickx, Iris
    Mendes, Amaalia
    Mencarelli, Silvia
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 1805 - 1812
  • [29] The Electronic Text Corpus of Sumerian Literature
    Ebeling, Jarle
    CORPORA, 2007, 2 (01) : 111 - 120
  • [30] 'The Khataks' Chronicle': Corpus and Function of the Text
    Andreyev, Sergei
    CENTRAL ASIAN SURVEY, 2022, 41 (03) : 612 - 614