Design and compilation of a specialized Spanish-German parallel corpus

Author
Escartin, Carla Parra [1]
Affiliation
[1] Univ Bergen, Bergen, Norway
Source
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012
Keywords
corpus compilation; specialized parallel corpora; Machine Translation
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
This paper discusses the design and compilation of the TRIS corpus, a specialized parallel corpus of Spanish and German texts that will be used for phraseological research aimed at improving statistical machine translation. The corpus is based on the European Technical Regulations Information System (TRIS) database and contains 995 original documents written in German or Spanish, together with their translations into Spanish and German, respectively. The corpus is still under development: its first version, with 97 aligned file pairs, was released in the first META-NORD upload of metadata and resources in November 2011. The second version, described in this paper, contains 205 file pairs that have been completely aligned at sentence level, amounting to approximately 1,563,000 words and 70,648 aligned sentence pairs.
Pages: 2199-2206
Number of pages: 8