MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

Times Cited: 0
Authors
Xu, Peng [1 ]
Patwary, Mostofa [2 ]
Shoeybi, Mohammad [2 ]
Puri, Raul [2 ]
Fung, Pascale [1 ]
Anandkumar, Anima [2 ]
Catanzaro, Bryan [2 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] NVIDIA, Santa Clara, CA 95051 USA
Keywords
DOI
N/A
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embedding. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).
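The abstract describes a four-stage pipeline: keyword predictor, knowledge retriever, contextual knowledge ranker, and conditional text generator. The sketch below illustrates that control flow only; the toy knowledge base, the word-overlap ranker (standing in for the paper's weakly supervised sentence-embedding ranker), and all function names are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of the MEGATRON-CNTRL control flow described in the
# abstract. The toy knowledge base, heuristics, and names are hypothetical;
# the real system uses large language models at each learned stage.

TOY_KB = {
    "storm": ["Storms bring heavy rain.", "Lightning accompanies storms."],
    "forest": ["Forests contain many trees.", "Animals live in forests."],
}

def _words(text):
    """Lowercased words with trailing punctuation stripped."""
    return [w.strip(".,") for w in text.lower().split()]

def predict_keywords(context):
    """Stage 1: predict keywords for the next sentence (toy heuristic:
    any context word that appears in the knowledge base)."""
    return [w for w in _words(context) if w in TOY_KB]

def retrieve_knowledge(keywords):
    """Stage 2: retrieve candidate sentences from the external KB."""
    return [s for kw in keywords for s in TOY_KB[kw]]

def rank_knowledge(context, candidates, top_k=1):
    """Stage 3: rank candidates by word overlap with the context
    (a stand-in for the weakly supervised sentence-embedding ranker)."""
    ctx = set(_words(context))
    scored = sorted(candidates,
                    key=lambda s: len(ctx & set(_words(s))),
                    reverse=True)
    return scored[:top_k]

def generate_sentence(context, knowledge):
    """Stage 4: condition generation on context plus selected knowledge
    (a real system would prompt a large LM here)."""
    return f"<next sentence | context={context!r} knowledge={knowledge}>"

def story_step(context, keywords=None):
    """One generation step; passing `keywords` overrides stage 1."""
    kws = keywords if keywords is not None else predict_keywords(context)
    selected = rank_knowledge(context, retrieve_knowledge(kws))
    return generate_sentence(context, selected)
```

Controllability in the paper's sense corresponds to overriding the predicted keywords (here, e.g., `story_step(ctx, keywords=["storm"])`) and re-running generation, as in the keyword-replacement evaluation reported above.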
Pages: 2831-2845
Page count: 15
Related Papers
50 items total
  • [1] Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
    Zhao, Zirui
    Lee, Wee Sun
    Hsu, David
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] From language models to large-scale food and biomedical knowledge graphs
    Cenikj, Gjorgjina
    Strojnik, Lidija
    Angelski, Risto
    Ogrinc, Nives
    Koroušić Seljak, Barbara
    Eftimov, Tome
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [3] Identifying large-scale interaction atlases using probabilistic graphs and external knowledge
    Chanumolu, Sree K.
    Otu, Hasan H.
    JOURNAL OF CLINICAL AND TRANSLATIONAL SCIENCE, 2022, 6 (01)
  • [4] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
    Narayanan, Deepak
    Shoeybi, Mohammad
    Casper, Jared
    LeGresley, Patrick
    Patwary, Mostofa
    Korthikanti, Vijay
    Vainbrand, Dmitri
    Kashinkunti, Prethvi
    Bernauer, Julie
    Catanzaro, Bryan
    Phanishayee, Amar
    Zaharia, Matei
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
  • [5] Large-Scale Transfer Learning for Natural Language Generation
    Golovanov, Sergey
    Kurbanov, Rauf
    Nikolenko, Sergey
    Truskovskyi, Kyryl
    Tselousov, Alexander
    Wolf, Thomas
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 6053 - 6058
  • [6] Enhancing Large Language Models Through External Domain Knowledge
    Welz, Laslo
    Lanquillon, Carsten
    ARTIFICIAL INTELLIGENCE IN HCI, PT III, AI-HCI 2024, 2024, 14736 : 135 - 146
  • [7] Thrust: Adaptively Propels Large Language Models with External Knowledge
    Zhao, Xinran
    Zhang, Hongming
    Pan, Xiaoman
    Yao, Wenlin
    Yu, Dong
    Chen, Jianshu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] Improving Large-scale Language Models and Resources for Filipino
    Cruz, Jan Christian Blaise
    Cheng, Charibeth
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6548 - 6555
  • [9] Benchmarking Large Language Models on Controllable Generation under Diversified Instructions
    Chen, Yihan
    Xu, Benfeng
    Wang, Quan
    Liu, Yi
    Mao, Zhendong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38, NO 16, 2024, : 17808 - 17816