SHARING HIGH-QUALITY LANGUAGE RESOURCES IN THE LEGAL DOMAIN TO DEVELOP NEURAL MACHINE TRANSLATION FOR UNDER-RESOURCED EUROPEAN LANGUAGES

Cited by: 4
Authors
Bago, Petra [1 ]
Castilho, Sheila [2 ]
Celeste, Edoardo [3 ,4 ]
Dunne, Jane [2 ]
Gaspari, Federico [2 ]
Gislason, Niels Runar [5 ]
Kasen, Andre
Klubicka, Filip [1 ]
Kristmannsson, Gauti [5 ]
McHugh, Helen [2 ]
Moran, Roisin [7 ]
Ni Loinsigh, Orla [2 ]
Olsen, Jon Arild [6 ]
Escartin, Carla Parra [7 ]
Ramesh, Akshai [7 ]
Resende, Natalia [2 ]
Sheridan, Paraic [7 ]
Way, Andy [2 ]
Affiliations
[1] Univ Zagreb, Fac Humanities & Social Sci, Zagreb, Croatia
[2] Dublin City Univ, ADAPT Ctr, Dublin, Ireland
[3] Dublin City Univ, Sch Law & Govt, Dublin, Ireland
[4] ADAPT Ctr, Limerick, Ireland
[5] Univ Iceland, Reykjavik, Iceland
[6] Natl Lib Norway, Oslo, Norway
[7] Icon Translat Machines Ltd, Dublin, Ireland
Funding
Science Foundation Ireland
Keywords
language resources; under-resourced languages; legal translation; neural machine translation; evaluation;
DOI
10.2436/rld.i78.2022.3741
Chinese Library Classification (CLC) number
D9 [Law]; DF [Law]
Discipline classification code
0301
Abstract
This article reports some of the main achievements of the European Union-funded PRINCIPLE project in collecting high-quality language resources (LRs) in the legal domain for four under-resourced European languages: Croatian, Irish, Norwegian, and Icelandic. After illustrating the significance of this work for developing translation technologies in the context of the European Union and the European Economic Area, the article outlines the main steps of data collection, curation, and sharing of the LRs gathered with the support of public and private data contributors. This is followed by a description of the development pipeline and key features of the state-of-the-art, bespoke neural machine translation (MT) engines for the legal domain that were built using this data. The MT systems were evaluated with a combination of automatic and human methods to validate the quality of the LRs collected in the project, and the high-quality LRs were subsequently shared with the wider community via the ELRC-SHARE repository. The main challenges encountered in this work are discussed, emphasising the importance and the key benefits of sharing high-quality digital LRs.
Pages: 9-34
Page count: 26