MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

Cited by: 0
Authors
Xu, Peng [1 ]
Patwary, Mostofa [2 ]
Shoeybi, Mohammad [2 ]
Puri, Raul [2 ]
Fung, Pascale [1 ]
Anandkumar, Anima [2 ]
Catanzaro, Bryan [2 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] NVIDIA, Santa Clara, CA 95051 USA
Keywords
DOI
Not available
Chinese Library Classification: TP18 [Theory of artificial intelligence]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embedding. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).
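The abstract describes a four-stage pipeline: a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. A minimal sketch of how these stages could compose is below; every function, the word-overlap ranker, and the toy knowledge base are illustrative stand-ins (the paper uses a learned ranker weakly supervised by sentence embeddings and a large LM as the generator), not the authors' implementation.

```python
# Hypothetical sketch of the MEGATRON-CNTRL generation loop; all names and
# heuristics here are illustrative placeholders, not the paper's code.

def predict_keywords(context: str) -> list[str]:
    # Stage 1: keyword predictor. Stand-in heuristic: take the first
    # two longer content words from the story context.
    return [w for w in context.split() if len(w) > 4][:2]

def retrieve_knowledge(keywords: list[str], knowledge_base: list[str]) -> list[str]:
    # Stage 2: knowledge retriever. Return external-KB facts that
    # mention any predicted keyword.
    return [fact for fact in knowledge_base
            if any(k in fact for k in keywords)]

def rank_knowledge(context: str, facts: list[str]) -> list[str]:
    # Stage 3: contextual knowledge ranker. The paper trains this with
    # weak supervision from sentence embeddings; approximated here by
    # word overlap with the context, highest overlap first.
    ctx_words = set(context.split())
    return sorted(facts, key=lambda f: -len(ctx_words & set(f.split())))

def generate_sentence(context: str, ranked_facts: list[str]) -> str:
    # Stage 4: conditional text generator (a large LM in the paper).
    # Stand-in: append the top-ranked fact to the context.
    return context + " " + ranked_facts[0] if ranked_facts else context

def generate_next(context: str, knowledge_base: list[str]) -> str:
    # Controllability hook: replacing the output of predict_keywords
    # with user-chosen keywords steers the generated continuation.
    keywords = predict_keywords(context)
    facts = retrieve_knowledge(keywords, knowledge_base)
    return generate_sentence(context, rank_knowledge(context, facts))
```

The controllability experiment in the abstract corresponds to overriding the `predict_keywords` output with new keywords and re-running the remaining stages.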
Pages: 2831-2845
Page count: 15