DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence

Cited by: 0
Authors
Zhao, Wei [1 ,2 ]
Strube, Michael [1 ]
Eger, Steffen [3 ]
Affiliations
[1] Heidelberg Inst Theoret Studies, Heidelberg, Germany
[2] Tech Univ Darmstadt, Darmstadt, Germany
[3] Bielefeld Univ, Fac Technol, NLLG, Bielefeld, Germany
Keywords
LOCAL COHERENCE; MACHINE;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, there has been growing interest in designing text generation systems from a discourse coherence perspective, e.g., by modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics are weak at recognizing coherence and thus cannot reliably detect the discourse-level improvements of those text generation systems. In this work, we introduce DiscoScore, a parametrized discourse metric that uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human-rated coherence than early discourse metrics invented a decade ago; (ii) the recent state-of-the-art BARTScore is weak when operated at the system level, which is particularly problematic as systems are typically compared in this manner. DiscoScore, in contrast, achieves strong system-level correlation with human ratings, not only in coherence but also in factual consistency and other aspects, and surpasses BARTScore by over 10 correlation points on average. Further, aiming to understand DiscoScore, we provide justifications for the importance of discourse coherence for evaluation metrics, and explain the superiority of one variant over another. Our code is available at https://github.com/AIPHES/DiscoScore.
Pages: 3865-3883
Page count: 19
Related papers
50 in total
  • [1] Evaluating Text Generation from Discourse Representation Structures
    Wang, Chunliu
    van Noord, Rik
    Bisazza, Arianna
    Bos, Johan
    [J]. 1ST WORKSHOP ON NATURAL LANGUAGE GENERATION, EVALUATION, AND METRICS (GEM 2021), 2021, : 73 - 83
  • [2] Evaluating Discourse in Structured Text Representations
    Ferracane, Elisa
    Durrett, Greg
    Li, Junyi Jessy
    Erk, Katrin
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 646 - 653
  • [3] TEXT AND DISCOURSE ELLIPSIS: A PROBLEM OF COHERENCE AND CONTEXT
    Tamas, Iulia
    [J]. DACOROMANIA, 2012, 17 (01): : 44 - 53
  • [4] KEEPING IT ALL TOGETHER - COHERENCE IN TEXT AND DISCOURSE
    ROBERTS, RM
    KREUZ, RJ
    [J]. BULLETIN OF THE PSYCHONOMIC SOCIETY, 1990, 28 (06) : 495 - 495
  • [5] ELLIPSIS OF TEXT AND DISCOURSE: A PROBLEM OF COHERENCE AND OF CONTEXT
    Tamas, Iulia
    [J]. DACOROMANIA, 2011, 16 (02): : 167 - 175
  • [6] RST Discourse Parsing as Text-to-Text Generation
    Hu, Xinyu
    Wan, Xiaojun
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3278 - 3289
  • [7] BARTSCORE: Evaluating Generated Text as Text Generation
    Yuan, Weizhe
    Neubig, Graham
    Liu, Pengfei
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [8] On Improving Text Generation Via Integrating Text Coherence
    Ai, Lisi
    Gao, Baoli
    Zheng, Jianbing
    Gao, Ming
    [J]. PROCEEDINGS OF 2019 6TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2019, : 6 - 10
  • [9] The Role of Discourse Markers in the Generation and Interpretation of Discourse Structure and Coherence
    Abuczki, Agnes
    [J]. 3RD IEEE INTERNATIONAL CONFERENCE ON COGNITIVE INFOCOMMUNICATIONS (COGINFOCOM 2012), 2012, : 531 - 536
  • [10] Optimizing referential coherence in text generation
    Kibble, R
    Power, R
    [J]. COMPUTATIONAL LINGUISTICS, 2004, 30 (04) : 401 - 416