Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection

Authors
Sluyter-Gaethje, Henny [1]
Bourgonje, Peter [1]
Stede, Manfred [1]
Affiliations
[1] Univ Potsdam, Appl Computat Linguist, Potsdam, Germany
Keywords
machine translation; annotation projection; discourse parsing
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Shallow Discourse Parsing (SDP), the identification of coherence relations between text spans, relies on large amounts of training data, which so far exist only for English; in this respect, every other language is under-resourced. For languages where machine translation (MT) from English is available with reasonable quality, MT combined with annotation projection can be an option for producing an SDP resource. In our study, we translate the English Penn Discourse TreeBank (PDTB) into German and experiment with various methods of annotation projection to arrive at a German counterpart of the PDTB. We describe the key characteristics of the corpus as well as some typical sources of errors encountered during its creation. We then evaluate the GermanPDTB by training components for selected sub-tasks of discourse parsing on this silver data and comparing their performance with that of the same components trained on the original, gold PDTB corpus.
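The abstract does not say which projection methods were tried, so the following is only a minimal sketch of the general idea behind annotation projection, assuming token-level word alignments between an English sentence and its German translation are already available. The function name `project_span`, the example sentence, the hand-written alignment, and the span labels are all hypothetical illustrations, not taken from the paper or from the PDTB.

```python
from typing import Dict, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_span(src_span: Span, alignment: Set[Tuple[int, int]]) -> Optional[Span]:
    """Project a source-side token span onto the target side.

    `alignment` holds (source_index, target_index) pairs. The projected span
    is the smallest contiguous target span covering every target token that
    is aligned to a source token inside `src_span`. Returns None when no
    token in the span is aligned.
    """
    start, end = src_span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    return min(targets), max(targets)


# Hypothetical example sentence pair (not from the corpus).
english = ["He", "stayed", "home", "because", "it", "rained"]
german = ["Er", "blieb", "zu", "Hause", ",", "weil", "es", "regnete"]

# Hand-written alignment, standing in for the output of a word aligner.
alignment = {(0, 0), (1, 1), (2, 2), (2, 3), (3, 5), (4, 6), (5, 7)}

# PDTB-style annotation on the English side: a connective and two arguments.
annotations: Dict[str, Span] = {
    "connective": (3, 3),
    "arg1": (0, 2),
    "arg2": (4, 5),
}

projected = {
    label: project_span(span, alignment) for label, span in annotations.items()
}

for label, span in projected.items():
    if span is None:
        print(label, "-> unaligned")
    else:
        print(label, "->", " ".join(german[span[0] : span[1] + 1]))
```

Taking the minimum and maximum aligned index keeps the projected span contiguous, at the cost of possibly absorbing unaligned or misaligned tokens in between. Alignment errors of this kind are one plausible source of noise in a silver corpus built this way.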
Pages: 1044-1050
Number of pages: 7