Self-conditioning Pre-Trained Language Models

Times cited: 0
Authors
Suau, Xavier [1]
Zappella, Luca [1]
Apostoloff, Nicholas [1]
Affiliation
[1] Apple, Cupertino, CA 95014 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we investigate the mechanisms that guide text generation in pre-trained Transformer-based Language Models (TLMs). Grounded in the Product of Experts formulation of Hinton (1999), we describe a generative mechanism that exploits expert units which naturally exist in TLMs. Such units are responsible for detecting concepts in the input and for conditioning text generation on those concepts. We describe how to identify expert units and how to activate them during inference in order to induce any desired concept in the generated output. We find that activating a surprisingly small number of units is sufficient to steer text generation (as few as 3 units in a model with 345M parameters). Although the objective of this work is to learn more about how TLMs work, we show that our method is effective for conditioning without fine-tuning or extra parameters, even on fine-grained homograph concepts. Additionally, we show that our method can be used to correct the gender bias present in the output of TLMs, achieving gender parity for all evaluated contexts. We compare our method with FUDGE (Yang & Klein, 2021) and PPLM-BoW (Dathathri et al., 2020) and show that our approach achieves gender parity at lower perplexity and with a better Self-BLEU score. The proposed method is accessible to a wide audience thanks to its simplicity and minimal compute requirements. The findings in this paper are a step forward in understanding the generative mechanisms of TLMs.
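The abstract names two steps, identifying expert units and activating them at inference time, without giving details. The following is a minimal sketch under assumptions that are ours, not stated in this record: a unit's expertise for a concept is scored as the average precision (AP) of its responses when used to separate sentences with and without the concept, and "activating" an expert means overwriting its response with a fixed value, here hypothetically the mean response on concept-positive sentences. All function names are hypothetical, and the code works on plain Python lists rather than on a real TLM.

```python
from typing import Dict, List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision of `scores` used as a detector for label 1 (concept present)."""
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    # Rank sentences by the unit's response, highest first (ties keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of each positive
    return precision_sum / n_pos


def rank_experts(activations: List[List[float]], labels: List[int]) -> List[int]:
    """Return unit indices sorted by expertise (AP), best first.

    activations[i][m] is the response of unit m to sentence i.
    """
    n_units = len(activations[0])
    ap = [average_precision([row[m] for row in activations], labels) for m in range(n_units)]
    return sorted(range(n_units), key=lambda m: ap[m], reverse=True)


def intervention_values(
    activations: List[List[float]], labels: List[int], experts: List[int]
) -> Dict[int, float]:
    """Hypothetical choice: force each expert to its mean response on positive sentences."""
    positives = [row for row, y in zip(activations, labels) if y == 1]
    if not positives:
        raise ValueError("need at least one sentence containing the concept")
    return {m: sum(row[m] for row in positives) / len(positives) for m in experts}


def apply_intervention(hidden: List[float], forced: Dict[int, float]) -> List[float]:
    """Overwrite the expert units in one layer's hidden vector; other units are untouched."""
    out = list(hidden)
    for m, value in forced.items():
        out[m] = value
    return out


if __name__ == "__main__":
    # Toy data: 4 sentences x 3 units; the first two sentences contain the concept.
    acts = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.4], [0.1, 0.3, 0.6], [0.2, 0.9, 0.3]]
    labels = [1, 1, 0, 0]
    print(rank_experts(acts, labels))  # [0, 2, 1]
    experts = rank_experts(acts, labels)[:1]  # keep only the top-k (k = 1) experts
    print(apply_intervention([0.0, 0.5, 0.5], intervention_values(acts, labels, experts)))
```

In a real model, the overwrite would have to run inside the forward pass at every decoding step (for example, through a framework hook on the relevant layer) so that the forced values persist while tokens are generated; that wiring is omitted here. Keeping only the top-k units mirrors the abstract's observation that very few units can suffice, but the choice of k in this sketch is arbitrary.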
Pages: 19