The Task of Post-Editing Machine Translation for the Low-Resource Language

Cited by: 4
Authors
Rakhimova, Diana [1 ,2 ]
Karibayeva, Aidana [1 ,2 ]
Turarbek, Assem [1 ]
Affiliations
[1] Al Farabi Kazakh Natl Univ, Dept Informat Syst, Alma Ata 050040, Kazakhstan
[2] Inst Informat & Comp Technol, Alma Ata 050010, Kazakhstan
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 02
Keywords
machine translation; post-editing machine translation; light post-editing; full post-editing; BRNN; transformer; English; Kazakh; Uzbek; Russian; HANDLING UNKNOWN WORDS; PRODUCT;
DOI
10.3390/app14020486
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
In recent years, machine translation has made significant advancements; however, its effectiveness can vary widely depending on the language pair. Languages with limited resources, such as Kazakh, Uzbek, Kalmyk, Tatar, and others, often encounter challenges in achieving high-quality machine translations. Kazakh is an agglutinative language with complex morphology, making it a low-resource language. This article addresses the task of post-editing machine translation for the Kazakh language. The research begins by discussing the history and evolution of machine translation and how it has developed to meet the unique needs of languages with limited resources. The research resulted in the development of a machine translation post-editing system. The system utilizes modern machine learning methods, starting with neural machine translation using the BRNN model in the initial post-editing stage. Subsequently, the transformer model is applied to further edit the text. Complex structural and grammatical forms are processed, and abbreviations are replaced. Practical experiments were conducted on various texts: news publications, legislative documents, IT sphere, etc. This article serves as a valuable resource for researchers and practitioners in the field of machine translation, shedding light on effective post-editing strategies to enhance translation quality, particularly in scenarios involving languages with limited resources such as Kazakh and Uzbek. The obtained results were tested and evaluated using specialized metrics: BLEU, TER, and WER.
Pages: 19