Assessing Human-Parity in Machine Translation on the Segment Level

Cited by: 0
Authors
Graham, Yvette [1 ]
Federmann, Christian [2 ]
Eskevich, Maria [3 ]
Haddow, Barry [4 ]
Affiliations
[1] Trinity Coll Dublin, ADAPT, Dublin, Ireland
[2] Microsoft Res, Redmond, WA USA
[3] CLARIN ERIC, Utrecht, Netherlands
[4] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
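Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract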
Recent machine translation shared tasks have shown top-performing systems to tie with, or in some cases even outperform, human translation. Such conclusions about system and human performance are, however, based on estimates aggregated from scores collected over large test sets of translations, and so leave some questions unanswered. For instance, a system that significantly outperforms the human translator on average has not necessarily done so for every translation in the test set. Furthermore, do evaluation test sets still contain source segments that pose significant challenges for top-performing systems, and can such challenging segments go unnoticed due to the opacity of current human evaluation procedures? To provide insight into these issues, we carefully inspect the outputs of top-performing systems in the recent WMT19 news translation shared task for all language pairs in which a system either tied with or outperformed human translation. Our analysis provides a new method of identifying the remaining segments for which either the machine or the human performs poorly. For example, in our close inspection of WMT19 English to German and German to English, we identify the segments that proved a challenge for the human and those that proved a challenge for the machine, and find that these two sets do not overlap. For English to Russian, no segments in our sample of translations caused a significant challenge for the human translator, while we again identify the set of segments that caused issues for the top-performing system.
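The abstract's point about averages can be illustrated with a small sketch. A system can beat the human translator on average and still lose on individual segments. The scores below are invented for illustration and are not WMT19 data. The per-segment check is a plain comparison of hypothetical scores, not the paper's own method. The function name `segments_where_system_loses` is likewise hypothetical.

```python
from statistics import mean

# Hypothetical per-segment human-assessment scores (0-100) for the same five
# source segments. These numbers are invented for illustration, not WMT19 results.
system_scores = [95.0, 92.0, 96.0, 60.0, 94.0]
human_scores = [85.0, 80.0, 84.0, 86.0, 82.0]


def segments_where_system_loses(sys_scores, hum_scores):
    """Return indices of segments where the system scores below the human."""
    return [i for i, (s, h) in enumerate(zip(sys_scores, hum_scores)) if s < h]


# On average the system "outperforms" the human translator...
print(mean(system_scores), mean(human_scores))
# ...but the aggregate hides a segment where it clearly does worse.
print(segments_where_system_loses(system_scores, human_scores))
```

Here the system's mean (87.4) exceeds the human's mean (83.4). Segment index 3 is nonetheless one where the system falls well below the human. An aggregate-only comparison would not surface it.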
Pages: 4199-4207
Page count: 9