MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

Cited by: 0
Authors
Xu, Peng [1 ]
Patwary, Mostofa [2 ]
Shoeybi, Mohammad [2 ]
Puri, Raul [2 ]
Fung, Pascale [1 ]
Anandkumar, Anima [2 ]
Catanzaro, Bryan [2 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] NVIDIA, Santa Clara, CA 95051 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embeddings. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters, we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).
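To make the four-stage pipeline described in the abstract concrete, the following is a minimal Python sketch of one generation step. All class and method names here are hypothetical placeholders, not the authors' released code; the sketch assumes only the stages the abstract names (keyword prediction, retrieval from an external knowledge base such as ConceptNet, contextual ranking trained with weak supervision from sentence-embedding similarity, and conditional generation).

    def generate_next_sentence(context, keyword_predictor, retriever,
                               ranker, generator, top_k=10):
        # 1. Keyword predictor: propose keywords for the next sentence
        #    given the story so far. Substituting these keywords is what
        #    makes the generation controllable.
        keywords = keyword_predictor.predict(context)

        # 2. Knowledge retriever: pull entries related to each keyword
        #    from the external knowledge base.
        candidates = []
        for keyword in keywords:
            candidates.extend(retriever.lookup(keyword))

        # 3. Contextual knowledge ranker: score each candidate against
        #    the context. Per the abstract, this ranker has no ground-truth
        #    labels and is trained with weak supervision derived from
        #    sentence embeddings.
        ranked = sorted(candidates,
                        key=lambda c: ranker.score(context, c),
                        reverse=True)

        # 4. Conditional text generator: condition the language model on
        #    the story context plus the top-ranked knowledge.
        return generator.generate(context, ranked[:top_k])

Under this reading, the controllability experiment in the abstract amounts to re-running the same loop with user-supplied keywords in step 1; the reported 77.5% (rising to 91.5% at 8.3 billion parameters) measures how often the regenerated story actually reflects the substituted keywords.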
Pages: 2831 - 2845 (15 pages)