Corpus-Based Translation Automation of Adaptable Corpus Translation Module

Authors
Lutskiv, Andriy [1]
Lutsyshyn, Roman [1]
Affiliations
[1] Ternopil Ivan Puluj Natl Tech Univ, Ruska 56, UA-46001 Ternopol, Ukraine
Keywords
Natural language processing; neural machine translation; corpus-based translation; Transformer model; sequence to sequence model; deep learning; machine learning; adaptable corpus tool; Computer-Aided Translation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This thesis deals with the development of a translation module that automates corpus-based translation. The translation module is part of an adaptable corpus tool and is implemented as a separate microservice. It gives linguists and translators the ability to translate texts while conveying their style. The module is domain-specific, which allows it to convey text style better than public cloud translation services. In this research, religious and historical texts were analyzed. The neural machine translation method was justified and used, and a sequence-to-sequence Transformer model was chosen as the neural network model. All stages of text processing by the Transformer model, which is based on the Multi-Head Attention mechanism, were analyzed. Software libraries and toolkits for the sequence-to-sequence Transformer model were analyzed and chosen, and a custom Transformer model implementation was built on the chosen libraries. The developed model comprises text preprocessing and the neural network model implementation. A cost-efficient computer system comprising hardware and software components for neural network model training was configured. Neural network model hyper-parameters were chosen and justified heuristically through computational experiments. The loss function, learning rate, perplexity, and BLEU were analyzed and applied as key model training criteria. Training and test samples of the text data sets were prepared; they comprise language pairs of Ukrainian text fragments and their English equivalents. The configured neural network model was trained and tested. An approach to automatic assessment of the trained model, based on semantic closeness, was suggested and tested.
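For orientation, perplexity, one of the training criteria named in the abstract, is conventionally defined as the exponential of the mean per-token negative log-likelihood (the cross-entropy loss in natural-log units). The snippet below is a minimal sketch of that standard definition only. It is not the authors' implementation, and the function name and sample probabilities are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the mean negative log-likelihood per token.

    token_log_probs: natural-log probabilities that a model assigned
    to each reference token (hypothetical input for illustration).
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical example: the model gives every reference token probability 0.25,
# so it is as uncertain as a uniform choice among four tokens.
sample_probs = [0.25, 0.25, 0.25, 0.25]
ppl = perplexity([math.log(p) for p in sample_probs])
print(f"perplexity = {ppl:.2f}")
```

Lower perplexity means the model assigns higher probability to the reference tokens, which is why it is commonly tracked alongside the loss during training.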
Pages: 17