Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection

Authors
Sluyter-Gaethje, Henny [1]
Bourgonje, Peter [1]
Stede, Manfred [1]
Affiliations
[1] Univ Potsdam, Appl Computat Linguist, Potsdam, Germany
Keywords
machine translation; annotation projection; discourse parsing
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Shallow Discourse Parsing (SDP), the identification of coherence relations between text spans, relies on large amounts of training data, which so far exists only for English; in this respect, any other language is under-resourced. For those languages where machine translation (MT) from English is available with reasonable quality, MT in conjunction with annotation projection can be an option for producing an SDP resource. In our study, we translate the English Penn Discourse TreeBank (PDTB) into German and experiment with various methods of annotation projection to arrive at the German counterpart of the PDTB. We describe the key characteristics of the corpus as well as some typical sources of errors encountered during its creation. We then evaluate the GermanPDTB by training components for selected sub-tasks of discourse parsing on this silver data and comparing their performance to that of the same components trained on the original gold PDTB corpus.
Pages: 1044-1050
Page count: 7