Czech Legal Text Treebank 1.0

Authors
Kriz, Vincent [1]
Hladka, Barbora [1]
Uresova, Zdenka [1]
Affiliations
[1] Charles Univ Prague, Fac Math & Phys, Inst Formal & Appl Linguist, Prague, Czech Republic
Keywords
annotated corpus; legal domain; parsing
DOI
Not available
Chinese Library Classification (CLC) number
H [Language, Writing]
Discipline classification code
05
Abstract
We introduce a new member of the family of Prague dependency treebanks. The Czech Legal Text Treebank 1.0 is a morphologically and syntactically annotated corpus of 1,128 sentences. The treebank contains texts from the legal domain, namely documents from the Collection of Laws of the Czech Republic. Legal texts differ from texts in other domains in several language phenomena, influenced by a rather high frequency of very long sentences. Manual annotation of such sentences presents a new challenge. We describe a strategy and tools for this task. The resulting treebank can be explored in various ways: it can be downloaded from the LINDAT/CLARIN repository and viewed locally using the TrEd editor, or it can be accessed online using the KonText and TreeQuery tools.
Pages: 2387-2392
Number of pages: 6