Symbolic Knowledge Distillation: from General Language Models to Commonsense Models

Cited by: 0
Authors
West, Peter [1 ,2 ]
Bhagavatula, Chandra [2 ]
Hessel, Jack [2 ]
Hwang, Jena D. [2 ]
Jiang, Liwei [1 ,2 ]
Le Bras, Ronan [2 ]
Lu, Ximing [1 ,2 ]
Welleck, Sean [1 ,2 ]
Choi, Yejin [1 ,2 ]
Affiliations
[1] Univ Washington, Paul G Allen Sch Comp Sci & Engn, Seattle, WA 98195 USA
[2] Allen Inst Artificial Intelligence, Seattle, WA 98103 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline Codes: 081104; 0812; 0835; 1405
Abstract
The common practice for training commonsense models has gone from-human-to-corpus-to-machine: humans author commonsense knowledge graphs in order to train commonsense models. In this work, we investigate an alternative, from-machine-to-corpus-to-machine: general language models author these commonsense knowledge graphs to train commonsense models. Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. A key difference is that we distill knowledge symbolically, as text, in addition to the resulting neural model. We distill only one aspect, the commonsense of a general language model teacher, allowing the student to be a different type of model: a commonsense model. Altogether, we show that careful prompt engineering and a separately trained critic model allow us to selectively distill high-quality causal commonsense from GPT-3, a general language model. Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant in all three criteria: quantity, quality, and diversity. In addition, it results in a neural commonsense model that surpasses the teacher model's commonsense capabilities despite its 100x smaller size. We apply this to the ATOMIC resource, and will share our new symbolic knowledge graph and commonsense models.
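The abstract describes a three-stage pipeline: few-shot prompting of a large general-purpose teacher to generate candidate commonsense statements, filtering those statements with a separately trained critic model, and training a much smaller student on the surviving text. The sketch below is illustrative only; GPT-2 and an off-the-shelf sentiment classifier are stand-ins for the GPT-3 teacher and the trained critic used in the paper, and the prompt wording and filtering rule are assumptions made here for the example.

```python
# Minimal sketch of the symbolic knowledge distillation loop described in the
# abstract: a general LM teacher generates candidate commonsense inferences,
# and a critic model filters them before they are used to train a student.
# Assumptions (not from the paper): GPT-2 stands in for the GPT-3 teacher, an
# off-the-shelf sentiment classifier stands in for the separately trained
# critic, and the prompt is a simplified ATOMIC-style few-shot example.
from transformers import pipeline

teacher = pipeline("text-generation", model="gpt2")

FEW_SHOT_PROMPT = (
    "Event: X pays Y a compliment. As a result, Y feels flattered.\n"
    "Event: X drops the glass. As a result, X feels embarrassed.\n"
    "Event: X wins the lottery. As a result,"
)

# Sample several candidate completions: the knowledge is distilled symbolically, as text.
candidates = teacher(
    FEW_SHOT_PROMPT,
    max_new_tokens=12,
    num_return_sequences=5,
    do_sample=True,
    pad_token_id=50256,
)
inferences = [
    c["generated_text"][len(FEW_SHOT_PROMPT):].split("\n")[0].strip()
    for c in candidates
]

# Critic filter: in the paper this is a classifier trained on human acceptability
# judgments; the sentiment model here is only a placeholder for that filtering step.
critic = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
accepted = [
    inf for inf in inferences
    if critic(f"X wins the lottery. As a result, {inf}")[0]["label"] == "POSITIVE"
]

# The accepted statements would form the distilled corpus on which the smaller
# student commonsense model is trained.
print(accepted)
```

Scaled up (many prompts, many samples per prompt, and a critic trained on human acceptability labels rather than a placeholder), this loop is what yields the automatically distilled knowledge graph and the 100x smaller student model described in the abstract.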
Pages: 4602-4625 (24 pages)