Simplification of English and Bengali Sentences for Improving Quality of Machine Translation

Cited by: 4

Authors:

Mahata, Sainik Kumar [1]

Garain, Avishek [2]

Das, Dipankar [2]

Bandyopadhyay, Sivaji [2]

Affiliations:

[1] Inst Engn & Management, Kolkata, India

[2] Jadavpur Univ, Kolkata, India

Source:

NEURAL PROCESSING LETTERS | 2022 / Vol. 54 / Issue 04

Keywords:

WEB;

DOI:

10.1007/s11063-022-10755-3

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Training translation systems on complex and compound sentences is generally considered computationally demanding, and such systems fail to process the large amount of syntactic information these sentences carry. This, in turn, affects the overall quality of translations. Simple sentences, on the other hand, are shorter by nature and carry less syntactic information. Therefore, it would be safe to say that training translation systems on simple sentences only would result in better translation output. However, training a translation system requires a large, high-quality parallel corpus involving two natural languages. While parallel corpora for various language pairs are abundant, such corpora for low-resourced languages, consisting of only simple sentences, are rare. Developing such a parallel corpus is therefore the initial purpose of the present work. Building it requires separating complex and/or compound sentences from the overall corpus and then converting them into simple sentences. Since the work involves two languages, English and Bengali, different algorithms to accomplish this are documented in this paper. Converting complex and compound sentences into simple ones fragments each sentence into two or more segments, which then need to be aligned so that the paired segments are semantically similar. Hence, a basic alignment technique is also proposed to mitigate this problem. After developing the parallel corpus, we needed to check its effectiveness in addressing the translation quality issues discussed above. For this, state-of-the-art translation approaches, namely Statistical Machine Translation and Neural Machine Translation, were trained on the developed corpus as well as on the raw parallel corpus consisting of sentences of mixed complexity. The performance of these translation models was compared using automated as well as manual evaluation metrics. The results are promising and show that translation systems do perform better when trained on simple-sentence language pairs.


Pages: 3115-3139

Page count: 25
