Simplification of English and Bengali Sentences for Improving Quality of Machine Translation

Authors
Sainik Kumar Mahata
Avishek Garain
Dipankar Das
Sivaji Bandyopadhyay
Affiliations
[1] Institute of Engineering and Management
[2] Jadavpur University
Source
Neural Processing Letters | 2022 / Vol. 54
Abstract
Training translation systems on complex and compound sentences is generally considered computationally demanding, and such systems often fail to process the large amount of syntactic information these sentences carry. This, in turn, degrades overall translation quality. Simple sentences, by contrast, are shorter and carry less syntactic information. It is therefore reasonable to expect that training translation systems on simple sentences alone would yield better translations. However, training a translation system requires a large, high-quality parallel corpus covering two natural languages. While parallel corpora are abundant for many language pairs, corpora for low-resource languages that consist only of simple sentences are rare. Developing such a parallel corpus is the initial purpose of the present work. Building it requires identifying the complex and compound sentences in the overall corpus and converting them into simple sentences. Since the work involves two languages, English and Bengali, this paper documents the different algorithms used for each language. Converting complex and compound sentences into simple ones fragments each sentence into two or more segments, which then need to be aligned so that the paired segments are semantically similar. A basic alignment technique is therefore also proposed. After developing the parallel corpus, we evaluated its effectiveness in addressing the translation-quality issues described above. To this end, state-of-the-art translation approaches, namely Statistical Machine Translation and Neural Machine Translation, were trained on both the developed corpus and the raw parallel corpus of sentences of mixed complexity. The performance of these translation models was compared using automated as well as manual evaluation metrics. The results are promising and show that translation systems do perform better when trained on simple-sentence language pairs.
Pages: 3115-3139
Page count: 24