Simplification of English and Bengali Sentences for Improving Quality of Machine Translation

Cited by: 0
Authors
Sainik Kumar Mahata
Avishek Garain
Dipankar Das
Sivaji Bandyopadhyay
Affiliations
[1] Institute of Engineering and Management
[2] Jadavpur University
Source
Neural Processing Letters, 2022, Vol. 54, No. 4
DOI
Not available
Abstract
Training translation systems on complex and compound sentences is generally considered computationally difficult, and such systems often fail to process the large amount of syntactic information these sentences carry. This, in turn, degrades the overall quality of the translations. Simple sentences, on the other hand, are shorter by nature and carry less syntactic information. It is therefore reasonable to expect that training translation systems on simple sentences only would yield better translation output. However, training a translation system requires a large, high-quality parallel corpus for the two natural languages involved. While parallel corpora are abundant for many language pairs, corpora consisting only of simple sentences are rare for low-resource languages. The initial aim of the present work is therefore to develop such a parallel corpus. Doing so requires identifying the complex and/or compound sentences in the overall corpus and converting them into simple sentences. Since the work involves two languages, English and Bengali, this paper documents a separate algorithm for each language. Converting a complex or compound sentence into simple sentences fragments it into two or more segments, which then need to be aligned so that each English segment is paired with its semantically equivalent Bengali segment. Hence, a basic alignment technique is also proposed to address this problem. After developing the parallel corpus, we evaluated its effectiveness in addressing the translation quality issues described above. To this end, state-of-the-art translation approaches such as Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) were trained both on the developed corpus and on the raw parallel corpus containing sentences of mixed complexity. The performance of these translation models was compared using both automatic and manual evaluation metrics.
The results are promising and show that translation systems do perform better when trained on parallel pairs of simple sentences.
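As a rough illustration of the split-then-align idea described in the abstract, the following minimal Python sketch (assuming Python 3.9+) splits a compound sentence at coordinating conjunctions or semicolons, then pairs the English and Bengali fragments one-to-one, in order, only when both sides yield the same number of clauses, falling back to the unsplit sentence pair otherwise. The function names `split_at_conjunctions` and `align_fragments`, the conjunction lists, and the count-matching rule are hypothetical assumptions for illustration; they are not the authors' algorithms, and complex sentences with subordinate clauses are not handled.

```python
import re


def split_at_conjunctions(sentence: str, conjunctions: list[str],
                          terminator: str, require_comma: bool = True) -> list[str]:
    """Hypothetical heuristic: split a compound sentence into clauses at
    coordinating conjunctions or ';'. Subordinate clauses are not handled."""
    alternatives = "|".join(map(re.escape, conjunctions))
    comma = "," if require_comma else ",?"
    pattern = comma + r"\s+(?:" + alternatives + r")\s+|;\s+"
    # Drop the sentence-final full stop (English '.') or dari (Bengali '।').
    body = sentence.strip().rstrip(".।").strip()
    clauses = [c.strip() for c in re.split(pattern, body, flags=re.IGNORECASE) if c.strip()]
    # Re-capitalise (a no-op for Bengali script) and re-terminate each clause.
    return [c[0].upper() + c[1:] + terminator for c in clauses]


def align_fragments(src_sentence: str, tgt_sentence: str,
                    src_frags: list[str], tgt_frags: list[str]) -> list[tuple[str, str]]:
    """Hypothetical basic alignment: pair fragments one-to-one, in order, when
    both sides split into the same number of clauses; otherwise keep the
    original unsplit sentence pair."""
    if len(src_frags) > 1 and len(src_frags) == len(tgt_frags):
        return list(zip(src_frags, tgt_frags))
    return [(src_sentence, tgt_sentence)]


if __name__ == "__main__":
    en = "I went to the market, and she stayed at the office."
    bn = "আমি বাজারে গেলাম এবং সে অফিসে থাকল।"
    en_frags = split_at_conjunctions(en, ["and", "but", "or"], ".")
    bn_frags = split_at_conjunctions(bn, ["এবং", "কিন্তু", "অথবা"], "।", require_comma=False)
    for pair in align_fragments(en, bn, en_frags, bn_frags):
        print(pair)
```

Requiring a comma before the conjunction on the English side avoids splitting phrase-level coordination such as "bread and butter"; the Bengali call relaxes this because a comma before এবং is often absent in Bengali text. Falling back to the unsplit pair when the fragment counts differ keeps a possibly misaligned pair out of the corpus at the cost of discarding some simplifications.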
Pages: 3115-3139 (25 pages)