MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

Cited by: 0
Authors
Xu, Peng [1 ]
Patwary, Mostofa [2 ]
Shoeybi, Mohammad [2 ]
Puri, Raul [2 ]
Fung, Pascale [1 ]
Anandkumar, Anima [2 ]
Catanzaro, Bryan [2 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] NVIDIA, Santa Clara, CA 95051 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embeddings. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters, we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).
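To make the four-stage pipeline described in the abstract concrete, the following is a minimal Python sketch of one generation step. All class and method names here are hypothetical placeholders, not the authors' released code; the sketch assumes only the stages the abstract names (keyword prediction, retrieval from an external knowledge base such as ConceptNet, contextual ranking trained with weak supervision from sentence-embedding similarity, and conditional generation).

    def generate_next_sentence(context, keyword_predictor, retriever,
                               ranker, generator, top_k=10):
        # 1. Keyword predictor: propose keywords for the next sentence
        #    given the story so far. Substituting these keywords is what
        #    makes the generation controllable.
        keywords = keyword_predictor.predict(context)

        # 2. Knowledge retriever: pull entries related to each keyword
        #    from the external knowledge base.
        candidates = []
        for keyword in keywords:
            candidates.extend(retriever.lookup(keyword))

        # 3. Contextual knowledge ranker: score each candidate against
        #    the context. Per the abstract, this ranker has no ground-truth
        #    labels and is trained with weak supervision derived from
        #    sentence embeddings.
        ranked = sorted(candidates,
                        key=lambda c: ranker.score(context, c),
                        reverse=True)

        # 4. Conditional text generator: condition the language model on
        #    the story context plus the top-ranked knowledge.
        return generator.generate(context, ranked[:top_k])

Under this reading, the controllability experiment in the abstract amounts to re-running the same loop with user-supplied keywords in step 1; the reported 77.5% (rising to 91.5% at 8.3 billion parameters) measures how often the regenerated story actually reflects the substituted keywords.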
Pages: 2831 - 2845 (15 pages)