Neural Machine Translation System for English to Indian Language Translation Using MTIL Parallel Corpus

Cited by: 13
Authors
Premjith, B. [1]
Kumar, M. Anand [1]
Soman, K. P. [1]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Ctr Computat Engn & Networking CEN, Coimbatore 641112, Tamil Nadu, India
Keywords
Neural machine translation; bidirectional RNN; LSTM; English-Indian languages parallel corpus; human evaluation
DOI
10.1515/jisys-2019-2510
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The introduction of deep neural networks to machine translation research has improved on conventional machine translation systems in multiple ways, specifically in terms of translation quality. The ability of deep neural networks to learn a sensible representation of words is one of the major reasons for this improvement. Although machine translation using deep neural architectures shows state-of-the-art results in translating European languages, these algorithms cannot be applied directly to Indian languages, mainly for two reasons: good corpora are unavailable, and Indian languages are morphologically rich. In this paper, we propose a neural machine translation (NMT) system for four language pairs: English-Malayalam, English-Hindi, English-Tamil, and English-Punjabi. We also collected sentences from different sources and cleaned them to build four parallel corpora, one for each language pair, and then used them to train the translation system. The encoder network in the NMT architecture was designed with long short-term memory (LSTM) networks and bi-directional recurrent neural networks (Bi-RNN). The resulting models were evaluated both automatically and manually. For automatic evaluation, the bilingual evaluation understudy (BLEU) score was used, and for manual evaluation, three metrics were used: adequacy, fluency, and overall ranking. Analysis of the results showed that the presence of lengthy sentences in the English-Malayalam and English-Hindi corpora affected the translation. An attention mechanism was employed to address the problem of translating lengthy sentences (sentences containing more than 50 words), and the system was able to perceive long-term contexts in the sentences.
Pages: 387-398
Number of pages: 12
Related papers
50 in total
  • [1] MTIL2017: Machine Translation Using Recurrent Neural Network on Statistical Machine Translation
    Mahata, Sainik Kumar
    Das, Dipankar
    Bandyopadhyay, Sivaji
    [J]. JOURNAL OF INTELLIGENT SYSTEMS, 2019, 28 (03) : 447 - 453
  • [2] Real Time Machine Translation System for English to Indian language
    Vyas, Raj
    Joshi, Kirti
    Sutar, Hitesh
    Nagarhalli, Tatwadarshi P.
    [J]. 2020 6TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2020, : 838 - 842
  • [3] Improving English-to-Indian Language Neural Machine Translation Systems
    Kandimalla, Akshara
    Lohar, Pintu
    Maji, Souvik Kumar
    Way, Andy
    [J]. INFORMATION, 2022, 13 (05)
  • [4] Construction of Mizo: English Parallel Corpus for Machine Translation
    Haulai, Thangkhanhau
    Hussain, Jamal
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (08)
  • [5] A Machine Translation System from Indian Sign Language to English Text
    Mistree, Kinjal
    Thakor, Devendra
    Bhatt, Brijesh
    [J]. INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGIES AND SYSTEMS APPROACH, 2022, 15 (01)
  • [6] A Domain Specific Parallel Corpus and Enhanced English-Assamese Neural Machine Translation
    Laskar, Sahinur Rahman
    Manna, Riyanka
    Pakray, Partha
    Bandyopadhyay, Sivaji
    [J]. COMPUTACION Y SISTEMAS, 2022, 26 (04) : 1669 - 1687
  • [7] Extended Parallel Corpus for Amharic-English Machine Translation
    Gezmu, Andargachew Mekonnen
    Nuernberger, Andreas
    Bati, Tesfaye Bayu
    [J]. LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6644 - 6653
  • [8] Neural machine translation system using a content-equivalently translated parallel corpus for the newswire translation tasks at WAT 2019
    NHK Science and Technology Research Laboratories, Japan
    Unknown
    Unknown
    [J]. WAT@EMNLP-IJCNLP - Workshop Asian Transl., Proc., 1600, (106-111):
  • [9] Using Neural Machine Translation Methods for Sign Language Translation
    Angelova, Galina
    Avramidis, Eleftherios
    Moeller, Sebastian
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): STUDENT RESEARCH WORKSHOP, 2022, : 273 - 284
  • [10] An English-Portuguese parallel corpus of questions: translation guidelines and application in Statistical Machine Translation
    Costa, Angela
    Luis, Tiago
    Ribeiro, Joana
    Mendes, Ana Cristina
    Coheur, Luisa
    [J]. LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 2172 - 2176