Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank

Authors
Zeyrek, Deniz [1]
Mendes, Amalia [2]
Kurfali, Murathan [1]
Affiliations
[1] Middle East Tech Univ, Informat Inst, Ankara, Turkey
[2] Univ Lisbon, Ctr Linguist, Lisbon, Portugal
Keywords
discourse; parallel; multilingual corpus
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We introduce TED-Multilingual Discourse Bank (TED-MDB), a corpus of TED talk transcripts in 6 languages (English, German, Polish, European Portuguese, Russian and Turkish), where the ultimate aim is to provide a clearly described level of discourse structure and semantics in multiple languages. The corpus is manually annotated following the goals and principles of PDTB, involving explicit and implicit discourse connectives, entity relations, alternative lexicalizations and no relations. In the corpus, we also aim to capture the characteristics of spoken language that exist in the transcripts and adapt the PDTB scheme according to our aims; for example, we introduce hypophora. We also identify other aspects of spoken discourse, such as the discourse marker use of connectives, to keep them distinct from their discourse connective use. TED-MDB is, to the best of our knowledge, one of the few multilingual discourse treebanks, and we hope it will serve as a source of parallel data for contrastive linguistic analysis as well as language technology applications. We describe the corpus and the annotation procedure, and provide preliminary corpus statistics.
Pages: 1913-1919
Page count: 7