Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models

Cited by: 0
Authors
Zhou, Wangchunshu [1]
Xu, Ke [1]
Affiliations
[1] Beihang Univ, State Key Lab Software Dev Environm, Beijing, Peoples R China
Source
THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2020 / Vol. 34
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automated evaluation of open domain natural language generation (NLG) models remains a challenge, and widely used metrics such as BLEU and perplexity can be misleading in some cases. In our paper, we propose to evaluate NLG models by learning to compare pairs of generated sentences with a fine-tuned BERT model, which has been shown to have strong natural language understanding ability. We also propose to evaluate the model-level quality of NLG models from sample-level comparison results using a skill rating system. While it can be trained in a fully self-supervised fashion, our model can be further fine-tuned with a small amount of human preference annotations to better imitate human judgment. In addition to evaluating trained models, we propose to apply our model as a performance indicator during training for better hyperparameter tuning and early stopping. We evaluate our approach on both story generation and chit-chat dialogue response generation. Experimental results show that our model correlates better with human preference than previous automated evaluation approaches. Training with the proposed metric yields better performance in human evaluation, which further demonstrates the effectiveness of the proposed model.
Pages: 9717-9724
Number of pages: 8
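
The abstract above describes two components: a BERT-based comparator that judges which of two generated texts is better, and a skill rating scheme that aggregates pairwise outcomes into a model-level score. The Python sketch below is only a minimal illustration of that general idea, not the authors' released implementation; the checkpoint name (bert-base-uncased), the two-way label layout, the Elo-style update rule, and the K-factor are all illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): a pairwise BERT comparator plus an
# Elo-style skill-rating update that turns sample-level comparisons into a
# model-level score.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Assumed two-way head: index 1 = "first text preferred", index 0 = "second preferred".
# In practice this head would be fine-tuned on (better, worse) text pairs.
comparator = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
comparator.eval()


def compare(text_a: str, text_b: str) -> float:
    """Probability that text_a is preferred over text_b under the comparator."""
    inputs = tokenizer(text_a, text_b, return_tensors="pt",
                       truncation=True, padding=True)
    with torch.no_grad():
        logits = comparator(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()


def elo_update(r_a: float, r_b: float, outcome: float, k: float = 16.0):
    """One rating step: outcome is 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    return r_a + k * (outcome - expected_a), r_b + k * (expected_a - outcome)


# Toy model-level evaluation: sample one output from each NLG system, let the
# comparator decide the "match", and update both systems' ratings.
rating_a, rating_b = 1000.0, 1000.0
for sample_a, sample_b in [("output from system A", "output from system B")]:
    win_a = 1.0 if compare(sample_a, sample_b) > 0.5 else 0.0
    rating_a, rating_b = elo_update(rating_a, rating_b, win_a)
```

Repeating the rating update over many sampled pairs (and many opponent systems) is what makes the aggregate score robust to noise in individual sample-level comparisons, which is the point of using a skill rating system rather than averaging raw win counts.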