Pre-Training Transformers as Energy-Based Cloze Models

Cited by: 0
Authors:
Clark, Kevin [1]
Luong, Minh-Thang [2]
Le, Quoc V. [2]
Manning, Christopher D. [1]
Affiliations:
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Google Brain, Mountain View, CA USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
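The abstract's key mechanism is that Electric assigns a scalar energy to every input token in a single pass, and that summing these per-token scores yields a likelihood-style score for re-ranking speech recognition n-best lists. The sketch below illustrates only that scoring idea; it is not the authors' implementation, and every name and hyperparameter in it (ElectricSketch, score_sequences, the tiny PyTorch encoder sizes) is an illustrative assumption. The noise-contrastive training objective and positional encodings are omitted.

import torch
import torch.nn as nn

class ElectricSketch(nn.Module):
    """Toy encoder with a scalar energy head (names and sizes are illustrative)."""
    def __init__(self, vocab_size, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.energy_head = nn.Linear(d_model, 1)  # one scalar energy per position

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> per-token energies: (batch, seq_len)
        # (positional encodings omitted to keep the sketch short)
        hidden = self.encoder(self.embed(token_ids))
        return self.energy_head(hidden).squeeze(-1)

def score_sequences(model, token_ids):
    """Unnormalized sequence score: sum of negative per-token energies.

    Every position is scored in a single forward pass, unlike a masked LM,
    which must mask and re-run the model once per position.
    """
    with torch.no_grad():
        energies = model(token_ids)   # (batch, seq_len)
    return -energies.sum(dim=-1)      # higher score = more likely under the model

if __name__ == "__main__":
    vocab_size = 1000
    model = ElectricSketch(vocab_size)
    # Toy "n-best list": 5 candidate token sequences of length 12 (random ids here).
    nbest = torch.randint(0, vocab_size, (5, 12))
    scores = score_sequences(model, nbest)
    print("scores:", [round(s, 3) for s in scores.tolist()])
    print("best hypothesis index:", int(scores.argmax()))

Because the score is a plain sum of per-position energies from one encoder pass, an entire candidate list can be scored with a single batched forward call, which is the speed advantage over masked language models that the abstract highlights; the values are unnormalized energies, so only the ranking of hypotheses is meaningful.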
Pages: 285-294
Page count: 10
Related Papers (50 in total)
  • [1] Unsupervised Pre-Training for Detection Transformers
    Dai, Zhigang
    Cai, Bolun
    Lin, Yugeng
    Chen, Junying
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11): 12772 - 12782
  • [2] Evaluation of FractalDB Pre-training with Vision Transformers
    Nakashima, K.
    Kataoka, H.
    Satoh, Y.
    Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2023, 89 (01): 99 - 104
  • [3] AN EMPIRICAL COMPARISON OF JOINT-TRAINING AND PRE-TRAINING FOR DOMAIN-AGNOSTIC SEMI-SUPERVISED LEARNING VIA ENERGY-BASED MODELS
    Song, Yunfu
    Zheng, Huahuan
    Ou, Zhijian
    2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021
  • [4] TNT: Text Normalization based Pre-training of Transformers for Content Moderation
    Tan, Fei
    Hu, Yifan
    Hu, Changwei
    Li, Keqian
    Yen, Kevin
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020: 4735 - 4741
  • [5] Pre-training of Graph Augmented Transformers for Medication Recommendation
    Shang, Junyuan
    Ma, Tengfei
    Xiao, Cao
    Sun, Jimeng
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019: 5953 - 5959
  • [6] Lifting the Curse of Multilinguality by Pre-training Modular Transformers
    Pfeiffer, Jonas
    Goyal, Naman
    Lin, Xi Victoria
    Li, Xian
    Cross, James
    Riedel, Sebastian
    Artetxe, Mikel
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022: 3479 - 3495
  • [7] Deep Pre-Training Transformers for Scientific Paper Representation
    Wang, Jihong
    Yang, Zhiguang
    Cheng, Zhanglin
    ELECTRONICS, 2024, 13 (11)
  • [8] Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers
    Xu, Shusheng
    Zhang, Xingxing
    Wu, Yi
    Wei, Furu
    Zhou, Ming
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020: 1784 - 1795
  • [9] Factored Phrase-Based Statistical Machine Pre-training with Extended Transformers
    Beyala, Vivien L.
    Li Litet, Perrin
    Nkenlifack, Marcellin J.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (09): 51 - 59
  • [10] TUTA: Tree-based Transformers for Generally Structured Table Pre-training
    Wang, Zhiruo
    Dong, Haoyu
    Jia, Ran
    Li, Jia
    Fu, Zhiyi
    Han, Shi
    Zhang, Dongmei
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021: 1780 - 1790