xcomet: Transparent Machine Translation Evaluation through Fine-grained Error Detection

Authors
Guerreiro, Nuno M. [1, 3, 4, 5]
Rei, Ricardo [1, 2, 5]
van Stigt, Daan [1]
Coheur, Luisa [2, 5]
Colombo, Pierre [4]
Martins, Andre F. T. [1, 3, 5]
Affiliations
[1] Unbabel Lisbon, Lisbon, Portugal
[2] INESC ID, Lisbon, Portugal
[3] Inst Telecomunicacoes, Lisbon, Portugal
[4] Univ Paris Saclay, MICS, Cent Supelec, Paris, France
[5] Univ Lisbon, Inst Super Tecn, Lisbon, Portugal
Funding
European Research Council
Keywords
Compendex;
DOI
10.1162/tacl_a_00683
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Widely used learned metrics for machine translation evaluation, such as Comet and Bleurt, estimate the quality of a translation hypothesis by providing a single sentence-level score. As such, they offer little insight into translation errors (e.g., what are the errors and what is their severity). On the other hand, generative large language models (LLMs) are amplifying the adoption of more granular strategies to evaluation, attempting to detail and categorize translation errors. In this work, we introduce xcomet, an open-source learned metric designed to bridge the gap between these approaches. xcomet integrates both sentence-level evaluation and error span detection capabilities, exhibiting state-of-the-art performance across all types of evaluation (sentence-level, system-level, and error span detection). Moreover, it does so while highlighting and categorizing error spans, thus enriching the quality assessment. We also provide a robustness analysis with stress tests, and show that xcomet is largely capable of identifying localized critical errors and hallucinations.
Pages: 979-995
Page count: 17