MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

Authors
Xu, Peng [1]
Patwary, Mostofa [2]
Shoeybi, Mohammad [2]
Puri, Raul [2]
Fung, Pascale [1]
Anandkumar, Anima [2]
Catanzaro, Bryan [2]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] NVIDIA, Santa Clara, CA 95051 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embedding. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).
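The four-stage flow named in the abstract (keyword predictor, knowledge retriever, contextual knowledge ranker, conditional text generator) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the tiny knowledge base, the function names, the bag-of-words "embedding", and the template "generator" are hypothetical stand-ins for the learned models and the external knowledge base the paper uses. The point is only to show how swapping the keywords changes the knowledge that conditions generation.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base of (subject, relation, object) triples.
KNOWLEDGE_BASE = [
    ("dog", "is a", "pet"),
    ("dog", "can", "bark"),
    ("park", "used for", "walking"),
    ("rain", "causes", "wet ground"),
]


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    return [w.strip(".,!?").lower() for w in text.split()]


def predict_keywords(context, k=2):
    """Toy keyword predictor: context words that occur as KB subjects."""
    subjects = {s for s, _, _ in KNOWLEDGE_BASE}
    keywords = []
    for word in tokenize(context):
        if word in subjects and word not in keywords:
            keywords.append(word)
    return keywords[:k]


def retrieve_knowledge(keywords):
    """Toy knowledge retriever: all triples whose subject is a keyword."""
    return [" ".join(t) for t in KNOWLEDGE_BASE if t[0] in keywords]


def embed(text):
    """Stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_knowledge(context, candidates, top_n=1):
    """Toy contextual ranker: order candidates by similarity to the context."""
    ctx = embed(context)
    ranked = sorted(candidates, key=lambda c: cosine(ctx, embed(c)), reverse=True)
    return ranked[:top_n]


def generate(context, knowledge):
    """Placeholder for the conditional generator (a large LM in the paper)."""
    return f"{context} [conditioned on: {'; '.join(knowledge)}]"


def controlled_step(context, keywords=None):
    """Run one pipeline step; pass `keywords` to override the predictor."""
    if keywords is None:
        keywords = predict_keywords(context)
    candidates = retrieve_knowledge(keywords)
    return generate(context, rank_knowledge(context, candidates))


if __name__ == "__main__":
    story = "The dog went to the park."
    print(controlled_step(story))                     # predicted keywords
    print(controlled_step(story, keywords=["rain"]))  # replaced keywords
```

Calling `controlled_step` a second time with replaced keywords mirrors the controllability experiment described in the abstract: the context stays fixed while the knowledge fed to the generator changes.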
Pages: 2831-2845
Page count: 15