Machine Translation Evaluation: Unveiling the Role of Dense Sentence Vector Embedding for Morphologically Rich Language

Cited by: 9
Authors
Tripathi, Samiksha [1 ]
Kansal, Vineet [1 ]
Affiliation
[1] Dr APJ Abdul Kalam Tech Univ, Lucknow 226021, Uttar Pradesh, India
Funding
National Science Foundation (USA); Natural Sciences and Engineering Research Council of Canada;
Keywords
Evaluation of Hindi MT; dense sentence embeddings; word vectors; linguistic knowledge; METEOR;
DOI
10.1142/S0218001420590016
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine Translation (MT) evaluation metrics such as BiLingual Evaluation Understudy (BLEU) and Metric for Evaluation of Translation with Explicit Ordering (METEOR) are known to perform poorly for languages with free word order and rich morphology. Applying linguistic knowledge to evaluate MT output for a morphologically rich target language such as Hindi has been shown to be more effective and accurate [S. Tripathi and V. Kansal, Using linguistic knowledge for machine translation evaluation with Hindi as a target language, Comput. Sist. 21(4) (2017) 717-724]. Leveraging recent progress in word vector and sentence vector embedding [T. Mikolov and J. Dean, Distributed representations of words and phrases and their compositionality, Adv. Neural Inf. Process. Syst. 2 (2013) 3111-3119], the authors trained on a large corpus of pre-processed Hindi text (approximately 112 million tokens) to obtain word vectors and sentence vector embeddings for Hindi. Training was performed on a high-end system configuration using Google Cloud Platform resources. These sentence embeddings are then used to corroborate the findings of the linguistic-knowledge-based evaluation metric. For a morphologically rich target language, such an evaluation metric is considered an optimal solution for evaluating MT systems. In this paper, the authors demonstrate that MT evaluation using a sentence embedding-based approach closely mirrors the linguistic evaluation technique. The code used to generate the Hindi vector embeddings has been uploaded to GitHub.(a)
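The abstract does not state how candidate and reference sentence vectors are compared. As a minimal sketch, assuming a sentence embedding is the average of its word vectors and that candidate and reference are compared by cosine similarity (both assumptions, not necessarily the authors' method), the Python below contrasts exact-match unigram precision (a simplified stand-in for BLEU-style matching, not full BLEU) with an embedding-based score. The romanized Hindi tokens, the toy 3-dimensional vectors, and all function names are hypothetical values chosen for illustration, not embeddings trained on the authors' corpus.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy word vectors (romanized Hindi for readability).
# Morphological variants (ladka/ladke, jata/jate, hai/hain) are given
# nearby vectors, as trained embeddings are assumed to do.
VECTORS: Dict[str, Vector] = {
    "ladka": [0.90, 0.10, 0.00],
    "ladke": [0.85, 0.15, 0.05],
    "school": [0.10, 0.90, 0.10],
    "jata": [0.20, 0.10, 0.90],
    "jate": [0.25, 0.10, 0.85],
    "hai": [0.30, 0.30, 0.30],
    "hain": [0.30, 0.30, 0.35],
}


def unigram_precision(candidate: Sequence[str], reference: Sequence[str]) -> float:
    """Clipped exact-match unigram precision (simplified, not full BLEU)."""
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    matched = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return matched / len(candidate)


def sentence_embedding(tokens: Sequence[str], vectors: Dict[str, Vector]) -> Vector:
    """Average the vectors of in-vocabulary tokens (assumed baseline)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("sentence has no in-vocabulary tokens")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_score(candidate: Sequence[str], reference: Sequence[str],
                    vectors: Dict[str, Vector]) -> float:
    """Cosine similarity of averaged candidate and reference embeddings."""
    return cosine(sentence_embedding(candidate, vectors),
                  sentence_embedding(reference, vectors))


# Reference: "the boy goes to school"; candidate uses plural/oblique forms.
REFERENCE = ["ladka", "school", "jata", "hai"]
CANDIDATE = ["ladke", "school", "jate", "hain"]

if __name__ == "__main__":
    print("unigram precision:", unigram_precision(CANDIDATE, REFERENCE))
    print("embedding score:", round(embedding_score(CANDIDATE, REFERENCE, VECTORS), 4))
```

Here only "school" matches exactly, so the surface-overlap score is low, while the embedding score stays high because the inflected forms sit close together in vector space. Averaging discards word order, so this baseline is only a rough proxy for any sentence-embedding model actually trained on Hindi text.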
Pages: 18