Shallow Discourse Parsing for Under-Resourced Languages: Combining Machine Translation and Annotation Projection

Authors
Sluyter-Gaethje, Henny [1]
Bourgonje, Peter [1]
Stede, Manfred [1]
Affiliations
[1] Univ Potsdam, Appl Computat Linguist, Potsdam, Germany
Keywords
machine translation; annotation projection; discourse parsing
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Shallow Discourse Parsing (SDP), the identification of coherence relations between text spans, relies on large amounts of training data, which so far exist only for English; in this respect, any other language is under-resourced. For languages where machine translation (MT) from English is available with reasonable quality, MT in conjunction with annotation projection can be an option for producing an SDP resource. In our study, we translate the English Penn Discourse TreeBank (PDTB) into German and experiment with various methods of annotation projection to arrive at a German counterpart of the PDTB. We describe the key characteristics of the corpus as well as some typical sources of errors encountered during its creation. We then evaluate the GermanPDTB by training components for selected sub-tasks of discourse parsing on this silver data and comparing their performance with that of the same components trained on the original, gold PDTB corpus.
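To make the idea of annotation projection concrete, the following is a minimal sketch and not the authors' method: it maps a token span annotated on the English side onto the translated sentence by following word-alignment links. The function name `project_span`, the toy sentence pair, and the hand-written alignment are all assumptions made for illustration; in practice the alignment would come from a word aligner run over the MT output.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]   # half-open token span [start, end)
Link = Tuple[int, int]   # (source token index, target token index)


def project_span(src_span: Span, alignment: List[Link]) -> Optional[Span]:
    """Project a source span onto the target side via alignment links.

    Returns the smallest contiguous target span covering every target
    token aligned to a source token inside src_span, or None if no
    token in the span is aligned.
    """
    start, end = src_span
    targets = [t for s, t in alignment if start <= s < end]
    if not targets:
        return None
    return (min(targets), max(targets) + 1)


# Toy example (hypothetical data, hand-aligned for illustration).
en = ["He", "stayed", "home", "because", "it", "rained"]
de = ["Er", "blieb", "zu", "Hause", ",", "weil", "es", "regnete"]
alignment = [(0, 0), (1, 1), (2, 2), (2, 3), (3, 5), (4, 6), (5, 7)]

# PDTB-style annotation on the English side: connective plus two arguments.
connective = project_span((3, 4), alignment)
arg1 = project_span((0, 3), alignment)
arg2 = project_span((4, 6), alignment)

print(de[connective[0]:connective[1]])  # projected connective tokens
print(de[arg1[0]:arg1[1]])              # projected Arg1 tokens
print(de[arg2[0]:arg2[1]])              # projected Arg2 tokens
```

Taking the minimum and maximum aligned index keeps projected spans contiguous, at the cost of sometimes absorbing unaligned tokens that fall in between. Unaligned source spans return `None`, which is one plausible source of the projection errors the abstract mentions.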
Pages: 1044-1050
Page count: 7