Simplification of English and Bengali Sentences for Improving Quality of Machine Translation

Cited by: 4
Authors
Mahata, Sainik Kumar [1]
Garain, Avishek [2]
Das, Dipankar [2]
Bandyopadhyay, Sivaji [2]
Affiliations
[1] Institute of Engineering & Management, Kolkata, India
[2] Jadavpur University, Kolkata, India
Keywords
WEB;
DOI
10.1007/s11063-022-10755-3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Training translation systems on complex and compound sentences is generally considered computationally hard, and such systems fail to process the large amount of syntactic information these sentences carry. This, in turn, affects the overall quality of translations. Simple sentences, on the other hand, are shorter by nature and carry less syntactic information. It is therefore reasonable to expect that training translation systems on simple sentences only would result in better translation output. However, training a translation system requires a large, high-quality parallel corpus involving two natural languages. While parallel corpora for various language pairs are abundant, such corpora for low-resourced languages, consisting of only simple sentences, are rare. Developing such a parallel corpus is therefore the initial purpose of the present work. Building it requires separating complex and/or compound sentences from the overall corpus and then converting them into simple sentences. Since the work involves two languages, English and Bengali, different algorithms for accomplishing this are documented in this paper. Converting complex and compound sentences into simple ones fragments each sentence into two or more segments, which then need to be aligned so that the paired segments are semantically similar. Hence, a basic alignment technique is also proposed to address this problem. After developing the parallel corpus, we checked its effectiveness in solving the translation quality issues discussed earlier. To this end, state-of-the-art translation approaches, namely Statistical Machine Translation and Neural Machine Translation, were trained on the developed corpus as well as on the raw parallel corpus consisting of sentences of mixed complexity. The performance of these translation models was compared using automated as well as manual evaluation metrics. The results are promising and show that translation systems do perform better when trained on simple-sentence language pairs.
Pages: 3115-3139
Page count: 25
Journal: Neural Processing Letters, 2022, 54: 3115-3139