Building a Turkish UCCA dataset

Authors
Bolucu, Necva [1, 2]
Can, Burcu [3]
Affiliations
[1] Hacettepe Univ, Dept Comp Engn, Ankara, Turkiye
[2] CSIRO, Data61, Sydney, NSW, Australia
[3] Univ Stirling, Comp Sci, Stirling, Scotland
Keywords
Universal Conceptual Cognitive Annotation; UCCA; Semantic representation; METU-Sabanci Turkish Treebank; dataset
DOI
10.1017/nlp.2024.36
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Semantic representation is the task of conveying the meaning of a natural language utterance by converting it into a logical form that machines can process and understand. It is used by many applications in natural language processing (NLP), particularly in tasks related to natural language understanding (NLU). Because semantic parsing is so widely used in NLP, many semantic representation schemes with different forms have been proposed; Universal Conceptual Cognitive Annotation (UCCA) is one of them. UCCA is a cross-lingual semantic annotation framework that allows easy annotation without requiring substantial linguistic knowledge. UCCA-annotated datasets have so far been released for English, French, German, Russian, and Hebrew. In this paper, we present a UCCA-annotated Turkish dataset of 400 sentences drawn from the METU-Sabanci Turkish Treebank. We also provide the UCCA annotation specifications defined for Turkish so that the dataset can be extended further. We followed a semi-automatic annotation approach: an external semantic parser produces the initial annotation of the dataset, which two annotators then revise manually. We used the same semantic parser model to evaluate the dataset with zero-shot and few-shot learning, demonstrating that even a small sample from the target language in the training data has a notable impact on the performance of the parser (gains of 15.6% and 2.5% over zero-shot for labelled and unlabelled results, respectively).
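The abstract reports gains separately for labelled and unlabelled results. As a rough illustration of that distinction, the sketch below assumes the common UCCA convention of comparing units by their terminal yield (the set of token positions a unit covers): an unlabelled match only needs the same yield, while a labelled match also needs the same category. The edge representation, the function names (`f1`, `ucca_scores`) and the toy Turkish sentence are hypothetical. This is a simplified set-based sketch, not the official UCCA scorer and not the evaluation code used in the paper; for example, it does not separate primary from remote edges.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical edge representation: (yield, category). The yield is the set of
# terminal (token) indices covered by the unit; the category is a UCCA label
# such as "P" (Process), "A" (Participant) or "D" (Adverbial).
Edge = Tuple[FrozenSet[int], str]


def f1(gold: Set, pred: Set) -> float:
    """Harmonic mean of precision and recall between two sets of items."""
    matched = len(gold & pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


def ucca_scores(gold: Iterable[Edge], pred: Iterable[Edge]) -> Tuple[float, float]:
    """Return (labelled F1, unlabelled F1) for a single sentence."""
    gold_set, pred_set = set(gold), set(pred)
    # Labelled: yield AND category must both agree.
    labelled = f1(gold_set, pred_set)
    # Unlabelled: only the yields are compared; categories are dropped.
    unlabelled = f1({span for span, _ in gold_set}, {span for span, _ in pred_set})
    return labelled, unlabelled


if __name__ == "__main__":
    # Toy sentence "Ali kitabı okudu" ("Ali read the book"), tokens 0..2.
    gold = [(frozenset({0}), "A"), (frozenset({1}), "A"), (frozenset({2}), "P")]
    # The hypothetical parser finds every unit but mislabels "kitabı".
    pred = [(frozenset({0}), "A"), (frozenset({1}), "D"), (frozenset({2}), "P")]
    labelled, unlabelled = ucca_scores(gold, pred)
    print(round(labelled, 3), unlabelled)  # → 0.667 1.0
```

In this toy case the parser recovers the structure perfectly but makes one category error, so the unlabelled score is unaffected while the labelled score drops. This is one way to read why labelled and unlabelled results can respond differently to added target-language training data.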
Pages: 39