DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence

Authors
Zhao, Wei [1, 2]
Strube, Michael [1]
Eger, Steffen [3]
Affiliations
[1] Heidelberg Inst Theoret Studies, Heidelberg, Germany
[2] Tech Univ Darmstadt, Darmstadt, Germany
[3] Bielefeld Univ, Fac Technol, NLLG, Bielefeld, Germany
Keywords
LOCAL COHERENCE; MACHINE;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, there has been growing interest in designing text generation systems from a discourse coherence perspective, e.g., by modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics are weak at recognizing coherence and thus cannot reliably detect the discourse-level improvements of those text generation systems. In this work, we introduce DiscoScore, a parametrized discourse metric that uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human-rated coherence than early discourse metrics invented a decade ago; (ii) the recent state-of-the-art BARTScore is weak when operated at the system level, which is particularly problematic as systems are typically compared in this manner. DiscoScore, in contrast, achieves strong system-level correlation with human ratings, not only in coherence but also in factual consistency and other aspects, and surpasses BARTScore by over 10 correlation points on average. Further, aiming to understand DiscoScore, we justify the importance of discourse coherence for evaluation metrics and explain the superiority of one variant over another. Our code is available at https://github.com/AIPHES/DiscoScore.
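The abstract contrasts metrics by their system-level correlation with human ratings. As a rough illustration of what "system level" means, the sketch below averages metric and human scores per system and then correlates the per-system means. It is not the authors' evaluation code: the function names, the record layout, and the choice of Pearson correlation are assumptions made here for illustration (the abstract does not say which coefficient is used), and the numbers are made up.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def system_level_correlation(records):
    """Correlate per-system mean metric scores with per-system mean human scores.

    records: iterable of (system_name, metric_score, human_score),
    one entry per generated output (hypothetical layout).
    """
    metric_by_system = defaultdict(list)
    human_by_system = defaultdict(list)
    for system, metric_score, human_score in records:
        metric_by_system[system].append(metric_score)
        human_by_system[system].append(human_score)

    # Average within each system first, then correlate across systems.
    systems = sorted(metric_by_system)
    metric_means = [sum(metric_by_system[s]) / len(metric_by_system[s]) for s in systems]
    human_means = [sum(human_by_system[s]) / len(human_by_system[s]) for s in systems]
    return pearson(metric_means, human_means)


# Toy, made-up scores purely for illustration (not results from the paper).
toy_records = [
    ("sys_a", 0.2, 1.0), ("sys_a", 0.4, 2.0),
    ("sys_b", 0.5, 3.0), ("sys_b", 0.7, 3.0),
    ("sys_c", 0.8, 4.0), ("sys_c", 1.0, 5.0),
]
r = system_level_correlation(toy_records)
```

Because scores are averaged before correlating, a metric can rank systems well even when its per-output scores are noisy, and the reverse can also hold; this is why the two levels of analysis can disagree.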
Pages: 3865-3883
Number of pages: 19