Adjusting Indonesian Multiword Expression Annotation to the Penn Treebank Format

Authors
Arwidarasti, Jessica Naraiswari [1]
Alfina, Ika [1]
Krisnadhi, Adila Alfa [1]
Affiliations
[1] Univ Indonesia, Fac Comp Sci, Depok, Indonesia
Keywords
compound word; Indonesian; multiword expression; Penn Treebank; Stanford parser
DOI
10.1109/ialp51396.2020.9310479
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multiword expressions (MWEs) are difficult to annotate in a syntactic treebank, especially when determining their word classes. Previous work proposed annotation guidelines for Indonesian MWEs that align with the Penn Treebank (PTB) format. However, we believe the proposed annotation still needs improvement. This study therefore proposes a new annotation guideline for labeling Indonesian MWEs that conforms to the PTB format. We also revised the MWE annotation of an existing Indonesian constituency treebank of 1030 sentences to conform to the new guidelines. To evaluate the quality of the revised treebank, we built an Indonesian constituency parser model using the revised treebank and the Stanford parser. The experiments show that the resulting parser achieves an F1-score of 69.97%.
Pages: 75-80
Number of pages: 6