MEGATRON-CNTRL: Controllable Story Generation with External Knowledge Using Large-Scale Language Models

Times Cited: 0
Authors
Xu, Peng [1 ]
Patwary, Mostofa [2 ]
Shoeybi, Mohammad [2 ]
Puri, Raul [2 ]
Fung, Pascale [1 ]
Anandkumar, Anima [2 ]
Catanzaro, Bryan [2 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] NVIDIA, Santa Clara, CA 95051 USA
Keywords
DOI
N/A
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Existing pre-trained large language models have shown unparalleled generative capabilities. However, they are not controllable. In this paper, we propose MEGATRON-CNTRL, a novel framework that uses large-scale language models and adds control to text generation by incorporating an external knowledge base. Our framework consists of a keyword predictor, a knowledge retriever, a contextual knowledge ranker, and a conditional text generator. As we do not have access to ground-truth supervision for the knowledge ranker, we make use of weak supervision from sentence embedding. The empirical results show that our model generates more fluent, consistent, and coherent stories with less repetition and higher diversity compared to prior work on the ROC story dataset. We showcase the controllability of our model by replacing the keywords used to generate stories and re-running the generation process. Human evaluation results show that 77.5% of these stories are successfully controlled by the new keywords. Furthermore, by scaling our model from 124 million to 8.3 billion parameters we demonstrate that larger models improve both the quality of generation (from 74.5% to 93.0% for consistency) and controllability (from 77.5% to 91.5%).
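The abstract describes a four-stage pipeline: keyword predictor, knowledge retriever, contextual knowledge ranker, and conditional text generator. The sketch below illustrates that control flow only; the toy knowledge base, the word-overlap ranker (standing in for the paper's weakly supervised sentence-embedding ranker), and all function names are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of the MEGATRON-CNTRL control flow described in the
# abstract. The toy knowledge base, heuristics, and names are hypothetical;
# the real system uses large language models at each learned stage.

TOY_KB = {
    "storm": ["Storms bring heavy rain.", "Lightning accompanies storms."],
    "forest": ["Forests contain many trees.", "Animals live in forests."],
}

def _words(text):
    """Lowercased words with trailing punctuation stripped."""
    return [w.strip(".,") for w in text.lower().split()]

def predict_keywords(context):
    """Stage 1: predict keywords for the next sentence (toy heuristic:
    any context word that appears in the knowledge base)."""
    return [w for w in _words(context) if w in TOY_KB]

def retrieve_knowledge(keywords):
    """Stage 2: retrieve candidate sentences from the external KB."""
    return [s for kw in keywords for s in TOY_KB[kw]]

def rank_knowledge(context, candidates, top_k=1):
    """Stage 3: rank candidates by word overlap with the context
    (a stand-in for the weakly supervised sentence-embedding ranker)."""
    ctx = set(_words(context))
    scored = sorted(candidates,
                    key=lambda s: len(ctx & set(_words(s))),
                    reverse=True)
    return scored[:top_k]

def generate_sentence(context, knowledge):
    """Stage 4: condition generation on context plus selected knowledge
    (a real system would prompt a large LM here)."""
    return f"<next sentence | context={context!r} knowledge={knowledge}>"

def story_step(context, keywords=None):
    """One generation step; passing `keywords` overrides stage 1."""
    kws = keywords if keywords is not None else predict_keywords(context)
    selected = rank_knowledge(context, retrieve_knowledge(kws))
    return generate_sentence(context, selected)
```

Controllability in the paper's sense corresponds to overriding the predicted keywords (here, e.g., `story_step(ctx, keywords=["storm"])`) and re-running generation, as in the keyword-replacement evaluation reported above.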
Pages: 2831-2845
Page count: 15
Related Papers
50 items total
  • [1] Large Language Models as Commonsense Knowledge for Large-Scale Task Planning
    Zhao, Zirui
    Lee, Wee Sun
    Hsu, David
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] From language models to large-scale food and biomedical knowledge graphs
    Cenikj, Gjorgjina
    Strojnik, Lidija
    Angelski, Risto
    Ogrinc, Nives
    Koroušić Seljak, Barbara
    Eftimov, Tome
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [3] Identifying large-scale interaction atlases using probabilistic graphs and external knowledge
    Chanumolu, Sree K.
    Otu, Hasan H.
    JOURNAL OF CLINICAL AND TRANSLATIONAL SCIENCE, 2022, 6 (01)
  • [4] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
    Narayanan, Deepak
    Shoeybi, Mohammad
    Casper, Jared
    LeGresley, Patrick
    Patwary, Mostofa
    Korthikanti, Vijay
    Vainbrand, Dmitri
    Kashinkunti, Prethvi
    Bernauer, Julie
    Catanzaro, Bryan
    Phanishayee, Amar
    Zaharia, Matei
    SC21: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2021,
  • [5] Large-Scale Transfer Learning for Natural Language Generation
    Golovanov, Sergey
    Kurbanov, Rauf
    Nikolenko, Sergey
    Truskovskyi, Kyryl
    Tselousov, Alexander
    Wolf, Thomas
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 6053 - 6058
  • [6] Enhancing Large Language Models Through External Domain Knowledge
    Welz, Laslo
    Lanquillon, Carsten
    ARTIFICIAL INTELLIGENCE IN HCI, PT III, AI-HCI 2024, 2024, 14736 : 135 - 146
  • [7] Thrust: Adaptively Propels Large Language Models with External Knowledge
    Zhao, Xinran
    Zhang, Hongming
    Pan, Xiaoman
    Yao, Wenlin
    Yu, Dong
    Chen, Jianshu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [8] Improving Large-scale Language Models and Resources for Filipino
    Cruz, Jan Christian Blaise
    Cheng, Charibeth
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6548 - 6555
  • [9] Benchmarking Large Language Models on Controllable Generation under Diversified Instructions
    Chen, Yihan
    Xu, Benfeng
    Wang, Quan
    Liu, Yi
    Mao, Zhendong
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38, NO 16, 2024, : 17808 - 17816