Design and compilation of a specialized Spanish-German parallel corpus

Author
Escartin, Carla Parra [1]
Affiliation
[1] Univ Bergen, Bergen, Norway
Source
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2012
Keywords
corpus compilation; specialized parallel corpora; machine translation
DOI
Not available
Chinese Library Classification
H0 [Linguistics]
Subject classification codes
030303; 0501; 050102
Abstract
This paper discusses the design and compilation of the TRIS corpus, a specialized parallel corpus of Spanish and German texts. It will be used for phraseological research aimed at improving statistical machine translation. The corpus is based on the European database of Technical Regulations Information System (TRIS), containing 995 original documents written in German and Spanish and their translations into Spanish and German respectively. The corpus is still under development; the first version, with 97 aligned file pairs, was released in the first META-NORD upload of metadata and resources in November 2011. The second version, described in this paper, contains 205 file pairs that have been completely aligned at sentence level, accounting for approximately 1,563,000 words and 70,648 aligned sentence pairs.
Pages: 2199-2206
Number of pages: 8