Optimizing training data for persona-grounded dialogue via Synthetic Label Augmentation

Authors
Lee, Chanhee [1, 2]
Kim, Donghyun [1]
Kim, Wongyu [1]
Lee, Kyungchan [1]
Ahn, Youbin [1]
Lee, Kyong-Ho [1]
Shin, Donghoon [3]
Lee, Yeonsoo [4]
Affiliations
[1] Yonsei Univ, Dept Comp Sci, Seoul, South Korea
[2] Samsung Secur, Seoul, South Korea
[3] KT, Seongnam Si, Gyeonggi do, South Korea
[4] NCSOFT, Seongnam Si, Gyeonggi do, South Korea
Keywords
Persona-grounded dialogue; Persona expansion; Data optimization; Synthetic augmentation
DOI
10.1016/j.eswa.2024.125796
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Persona-grounded dialogue systems aim to improve the quality of AI agent responses by strengthening persona consistency and promoting response diversity. Although model tuning has advanced significantly, the training data itself still needs refinement. Expanding the scope of personas has been proposed as a way to bridge this gap. However, the lack of gold labels aligned with these expanded personas makes it difficult to train AI agents on the full extent of real-world knowledge. To address these challenges, we propose the Synthetic Label Augmentation framework, which (1) creates a background skeleton from the original gold labels by masking persona-related elements, (2) infuses the background skeleton with expanded-persona features to generate synthetic gold labels, (3) identifies the most appropriate synthetic gold labels among the candidates, and (4) merges them into the persona-grounded dialogue dataset. Through extensive experiments on Persona-Chat, we demonstrate that the proposed framework effectively integrates the content of expanded personas to generate synthetic gold labels suited to the dialogue context. Furthermore, response generation experiments using the Optimized Persona-Chat show that our framework significantly improves AI agents' persona consistency and response diversity.
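The four stages listed in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the framework's actual masking, infilling, and selection components are not described here, so each stage is replaced by a simple word-overlap heuristic. All names (`make_skeleton`, `infuse`, `select`, `synthesize`, `Example`, the stopword list) and the example dialogue are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Set

# Hypothetical toy stand-ins for the four stages; not the paper's components.
MASK = "<mask>"
STOPWORDS = {"i", "a", "an", "the", "my", "me", "to", "and", "of", "is", "am",
             "do", "you", "your", "have", "has", "in", "it", "from"}


def content_words(text: str) -> Set[str]:
    """Lower-cased alphabetic tokens minus a tiny stopword list."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def make_skeleton(gold: str, persona: List[str]) -> str:
    """Stage 1 (toy): mask gold-label words that also occur in the original persona."""
    persona_vocab = set().union(*(content_words(p) for p in persona))
    tokens: List[str] = []
    for word in gold.split():
        core = re.sub(r"[^a-z]", "", word.lower())
        if core in persona_vocab:
            if not tokens or tokens[-1] != MASK:  # merge adjacent masks into one slot
                tokens.append(MASK)
        else:
            tokens.append(word)
    return " ".join(tokens)


def infuse(skeleton: str, expanded_persona: str) -> List[str]:
    """Stage 2 (toy): one candidate per expanded-persona content word placed in the slots."""
    words = sorted(content_words(expanded_persona))
    if MASK not in skeleton or not words:
        return [skeleton]
    return [skeleton.replace(MASK, w) for w in words]


def select(candidates: List[str], expanded_persona: str, context: List[str]) -> str:
    """Stage 3 (toy): score = overlap with expanded persona + overlap with context."""
    persona_words = content_words(expanded_persona)
    context_words = set().union(*(content_words(u) for u in context))

    def score(candidate: str) -> int:
        words = content_words(candidate)
        return len(words & persona_words) + len(words & context_words)

    return max(candidates, key=score)


@dataclass
class Example:
    context: List[str]
    persona: List[str]
    label: str


def synthesize(dataset: List[Example], context: List[str], persona: List[str],
               expanded_persona: str, gold: str) -> str:
    """Stage 4 (toy): run stages 1-3 and merge the chosen synthetic label into the dataset."""
    skeleton = make_skeleton(gold, persona)
    label = select(infuse(skeleton, expanded_persona), expanded_persona, context)
    dataset.append(Example(context, persona + [expanded_persona], label))
    return label


dataset: List[Example] = []
label = synthesize(dataset,
                   context=["Do you have any pets? I have a cat."],
                   persona=["I have two dogs."],
                   expanded_persona="I adopted a cat from a shelter.",
                   gold="I feed my dogs every morning.")
print(label)  # I feed my cat every morning.
```

In this sketch the skeleton "I feed my <mask> every morning." keeps the background of the original gold label, and the selector prefers the candidate that matches both the expanded persona and the partner's mention of a cat. A realistic system would replace these heuristics with model-based generation and scoring.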
Pages: 11
Related papers (38 total)
  • [1] Concept-based Persona Expansion for Improving Diversity of Persona-Grounded Dialogue
    Kim, Donghyun
    Ahn, Youbin
    Lee, Chanhee
    Kim, Wongyu
    Lee, Kyong-Ho
    Shin, Donghoon
    Lee, Yeonsoo
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, pp. 3471-3481
  • [2] Dual Task Framework for Improving Persona-Grounded Dialogue Dataset
    Kim, Minju
    Kwak, Beong-woo
    Kim, Youngwook
    Lee, Hong-in
    Hwang, Seung-won
    Yeo, Jinyoung
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, pp. 10912-10920
  • [3] Transferable Persona-Grounded Dialogues via Grounded Minimal Edits
    Wu, Chen Henry
    Zheng, Yinhe
    Mao, Xiaoxi
    Huang, Minlie
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, pp. 2368-2382
  • [4] A Pre-Training Based Personalized Dialogue Generation Model with Persona-Sparse Data
    Zheng, Yinhe
    Zhang, Rongsheng
    Mao, Xiaoxi
    Huang, Minlie
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, vol. 34, pp. 9693-9700
  • [5] Joining datasets via data augmentation in the label space for neural networks
    Zhao, Jake
    Ou, Mingfeng
    Xue, Linji
    Cui, Yunkai
    Wu, Sai
    Chen, Gang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 2021, vol. 139
  • [6] Optimizing the Training data of Personalized Recommendation Via AdaBoost
    Ouyang, Yuanxin
    Gu, Yi
    Jiang, Xiangtao
    Xiong, Zhang
    2012 THIRD INTERNATIONAL CONFERENCE ON THEORETICAL AND MATHEMATICAL FOUNDATIONS OF COMPUTER SCIENCE (ICTMF 2012), 2013, vol. 38, pp. 613-619
  • [7] BoB: BERT Over BERT for Training Persona-based Dialogue Models from Limited Personalized Data
    Song, Haoyu
    Wang, Yan
    Zhang, Kaiyan
    Zhang, Wei-Nan
    Liu, Ting
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, pp. 167-177
  • [8] IPMix: Label-Preserving Data Augmentation Method for Training Robust Classifiers
    Huang, Zhenglin
    Bao, Xianan
    Zhang, Na
    Zhang, Qingqi
    Tu, Xiaomei
    Wu, Biao
    Yang, Xi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [9] Conversation Graph: Data Augmentation, Training, and Evaluation for Non-Deterministic Dialogue Management
    Gritta, Milan
    Lampouras, Gerasimos
    Iacobacci, Ignacio
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, vol. 9, pp. 36-52
  • [10] Generation of Action Recognition Training Data Through Rotoscoping and Augmentation of Synthetic Animations
    Covre, Nicola
    Nunnari, Fabrizio
    Fornaser, Alberto
    De Cecco, Mariolino
    AUGMENTED REALITY, VIRTUAL REALITY, AND COMPUTER GRAPHICS (AVR 2019), PT II, 2019, vol. 11614, pp. 23-42