Revisiting Meta-evaluation for Grammatical Error Correction

Cited: 0
Authors
Kobayashi, Masamune [1]
Mita, Masato [1,2]
Komachi, Mamoru [3]
Affiliations
[1] Tokyo Metropolitan Univ, Tokyo, Japan
[2] CyberAgent Inc, Tokyo, Japan
[3] Hitotsubashi Univ, Tokyo, Japan
Keywords
DOI
10.1162/tacl_a_00676
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Metrics are the foundation of automatic evaluation in grammatical error correction (GEC), and the evaluation of the metrics themselves (meta-evaluation) relies on their correlation with human judgments. However, conventional meta-evaluations in English GEC face several challenges, including biases caused by inconsistencies in evaluation granularity and an outdated setup that uses only classical systems. These problems can lead to the misinterpretation of metrics and may hinder the applicability of GEC techniques. To address these issues, this paper proposes SEEDA, a new dataset for GEC meta-evaluation. SEEDA consists of corrections with human ratings at two different granularities, edit-based and sentence-based, covering 12 state-of-the-art systems, including large language models, as well as two human corrections with different focuses. Aligning the granularity in sentence-level meta-evaluation improves correlations, suggesting that edit-based metrics may have been underestimated in existing studies. Furthermore, the correlations of most metrics decrease when moving from classical to neural systems, indicating that traditional metrics are relatively poor at evaluating fluently corrected sentences with many edits.
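The correlation-based meta-evaluation protocol the abstract refers to can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the per-system scores below are hypothetical, and scipy's pearsonr/spearmanr stand in for whichever correlation statistics a given study reports.

from scipy.stats import pearsonr, spearmanr

# Hypothetical aggregate scores for five GEC systems: one value per
# system from an automatic metric and one from human ratings.
metric_scores = [0.62, 0.58, 0.71, 0.66, 0.55]
human_ratings = [0.60, 0.54, 0.75, 0.69, 0.50]

# Pearson measures linear agreement of the raw scores; Spearman measures
# agreement of the system rankings the two scorings induce.
r, _ = pearsonr(metric_scores, human_ratings)
rho, _ = spearmanr(metric_scores, human_ratings)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")

A metric with high correlation under this protocol is taken to approximate human judgment well; the paper's point is that such conclusions depend on the granularity at which the human ratings were collected and on which systems are being compared.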
Pages: 837-855
Page count: 19