Building the essential resources for Finnish: the Turku Dependency Treebank

Cited by: 0
Authors
Katri Haverinen
Jenna Nyblom
Timo Viljanen
Veronika Laippala
Samuel Kohonen
Anna Missilä
Stina Ojala
Tapio Salakoski
Filip Ginter
Affiliations
[1] University of Turku,Turku Centre for Computer Science
[2] University of Turku,Department of Information Technology
[3] University of Turku,Department of French Studies
[4] University of Turku,University of Turku Graduate School
Source
Language Resources and Evaluation, 2014, 48 (03)
Keywords
Treebank; Finnish; Parsing; Morphology;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
In this paper, we present the final version of a publicly available treebank of Finnish, the Turku Dependency Treebank. The treebank contains 204,399 tokens (15,126 sentences) from 10 different text sources and has been manually annotated in a Finnish-specific version of the well-known Stanford Dependency scheme. The morphological analyses of the treebank have been assigned using a novel machine learning method to disambiguate readings given by an existing tool. As the second main contribution, we present the first open source Finnish dependency parser, trained on the newly introduced treebank. The parser achieves a labeled attachment score of 81 %. The treebank data as well as the parsing pipeline are available under an open license at http://bionlp.utu.fi/.
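The labeled attachment score quoted in the abstract is the fraction of tokens that receive both the correct head and the correct dependency label. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `attachment_scores`, the `(head, label)` token representation, and the toy labels are assumptions made for illustration, and details such as whether punctuation tokens are scored are not stated in this record.

```python
from typing import List, Tuple

# Hypothetical representation: each token is (head_index, dependency_label),
# with head index 0 standing for the artificial root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions over all tokens.

    UAS counts tokens whose predicted head matches the gold head.
    LAS counts tokens whose predicted head and label both match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    labeled_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, labeled_hits / n


# Toy four-token sentence; the labels are illustrative placeholders.
gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (3, "nommod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nommod"), (2, "nommod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In the toy sentence, three of four heads are correct but only two tokens also carry the correct label, so the labeled score is lower than the unlabeled one. A reported score of 81% corresponds to this labeled fraction computed over a full test set.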
Pages: 493-531
Number of pages: 38
Related papers
50 in total
  • [1] Building the essential resources for Finnish: the Turku Dependency Treebank
    Haverinen, Katri
    Nyblom, Jenna
    Viljanen, Timo
    Laippala, Veronika
    Kohonen, Samuel
    Missilä, Anna
    Ojala, Stina
    Salakoski, Tapio
    Ginter, Filip
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2014, 48 (03) : 493 - 531
  • [2] BKTreebank: Building a Vietnamese Dependency Treebank
    Kiem-Hieu Nguyen
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 2164 - 2168
  • [3] Building a Treebank for Vietnamese Dependency Parsing
    Luong Nguyen Thi
    Linh Ha My
    Hung Nguyen Viet
    Huyen Nguyen Thi Minh
    Phuong Le Hong
    [J]. PROCEEDINGS OF 2013 IEEE RIVF INTERNATIONAL CONFERENCE ON COMPUTING AND COMMUNICATION TECHNOLOGIES: RESEARCH, INNOVATION, AND VISION FOR THE FUTURE (RIVF), 2013, : 147 - 151
  • [4] Building the Croatian Dependency Treebank: the initial stages
    Tadic, Marko
    [J]. SUVREMENA LINGVISTIKA, 2007, 63 (01) : 85 - 92
  • [5] Transforming a Constituency Treebank into a Dependency Treebank
    Gelbukh, Alexander
    Torres, Sulema
    Calvo, Hiram
    [J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2005, (35) : 145 - 152
  • [6] The Norwegian Dependency Treebank
    Solberg, Per Erik
    Skjaerholt, Arne
    Ovrelid, Lilja
    Hagen, Kristin
    Johannessen, Janne Bondi
    [J]. LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 789 - 795
  • [7] The Alpino Dependency Treebank
    van der Beek, L
    Bouma, G
    Malouf, R
    van Noord, G
    [J]. COMPUTATIONAL LINGUISTICS IN THE NETHERLANDS 2001, 2002, (45) : 8 - 22
  • [8] Hungarian Dependency Treebank
    Vincze, Veronika
    Szauter, Dora
    Almasi, Attila
    Mora, Gyoergy
    Alexin, Zoltan
    Csirik, Janos
    [J]. LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 1855 - 1862
  • [9] Building an Ellipsis-aware Chinese Dependency Treebank for Web Text
    Ren, Xuancheng
    Sun, Xu
    Wen, Ji
    Wei, Bingzhen
    Zhan, Weidong
    Zhang, Zhiyuan
    [J]. PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 1749 - 1754
  • [10] Resources for Turkish dependency parsing: introducing the BOUN Treebank and the BoAT annotation tool
    Turk, Utku
    Atmaca, Furkan
    Ozates, Saziye Betul
    Berk, Gozde
    Bedir, Seyyit Talha
    Koksal, Abdullatif
    Basaran, Balkiz Ozturk
    Gungor, Tunga
    Ozgur, Arzucan
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2022, 56 (01) : 259 - 307