Simplification of English and Bengali Sentences for Improving Quality of Machine Translation

Authors
Sainik Kumar Mahata
Avishek Garain
Dipankar Das
Sivaji Bandyopadhyay
Affiliations
[1] Institute of Engineering and Management
[2] Jadavpur University
Source
Neural Processing Letters | 2022 / Vol. 54
Abstract
Training translation systems on complex and compound sentences is generally considered computationally expensive, and such systems fail to process the large amount of syntactic information these sentences carry. This, in turn, affects the overall quality of translations. Simple sentences, on the other hand, are shorter by nature and carry less syntactic information. It is therefore reasonable to expect that training translation systems on simple sentences alone would yield better translation output. However, training a translation system requires a large, high-quality parallel corpus involving two natural languages. While parallel corpora are abundant for various language pairs, such corpora for low-resourced languages, consisting of only simple sentences, are rare. Developing such a parallel corpus is therefore the initial purpose of the present work. Building it requires separating complex and/or compound sentences from the overall corpus and then converting them into simple sentences. Since the work involves two languages, English and Bengali, different algorithms for accomplishing this are documented in this paper. Converting complex and compound sentences into simple ones fragments each sentence into two or more segments, which then need to be aligned so that the paired segments are semantically similar. Hence, a basic alignment technique is also proposed to mitigate this problem. After developing the parallel corpus, we needed to check its effectiveness in solving the translation quality issues discussed earlier. For this, state-of-the-art translation approaches, namely Statistical Machine Translation and Neural Machine Translation, were trained on the developed corpus as well as on the raw parallel corpus consisting of sentences of mixed complexity. The performance of these translation models was compared using automated as well as manual evaluation metrics. The results are promising and show that translation systems do perform better when trained on simple-sentence language pairs.
Pages: 3115-3139
Number of pages: 24