Simplification of English and Bengali Sentences for Improving Quality of Machine Translation

Cited by: 4
Authors
Mahata, Sainik Kumar [1]
Garain, Avishek [2]
Das, Dipankar [2]
Bandyopadhyay, Sivaji [2]
Affiliations
[1] Inst Engn & Management, Kolkata, India
[2] Jadavpur Univ, Kolkata, India
Keywords
WEB;
DOI
10.1007/s11063-022-10755-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Training translation systems on complex and compound sentences is generally considered computationally demanding, and such systems fail to process the large amount of syntactic information these sentences carry. This, in turn, degrades the overall quality of translations. Simple sentences, by contrast, are shorter by nature and carry less syntactic information. It is therefore reasonable to expect that training translation systems on simple sentences alone would yield better translation output. However, training a translation system requires a large, high-quality parallel corpus covering two natural languages. While parallel corpora are abundant for many language pairs, corpora for low-resource languages that consist only of simple sentences are rare. Developing such a parallel corpus is therefore the initial purpose of the present work. Building it requires separating complex and/or compound sentences from the rest of the corpus and then converting them into simple sentences. Since the work involves two languages, English and Bengali, different algorithms for accomplishing this are documented in this paper. Converting complex and compound sentences into simple ones fragments each sentence into two or more segments, which then need to be aligned to make them semantically similar. Hence, a basic alignment technique is also proposed to address this problem. After developing the parallel corpus, we needed to check its effectiveness in solving the translation quality issues discussed above. To this end, state-of-the-art translation approaches such as Statistical Machine Translation and Neural Machine Translation were trained on the developed corpus as well as on the raw parallel corpus consisting of sentences of mixed complexity. The performance of these translation models was compared using automated as well as manual evaluation metrics. The results are promising and show that translation systems do perform better when trained on simple-sentence language pairs.
Pages: 3115-3139
Page count: 25