Pre-Training Transformers as Energy-Based Cloze Models

Cited by: 0
|
Authors
Clark, Kevin [1 ]
Luong, Minh-Thang [2 ]
Le, Quoc V. [2 ]
Manning, Christopher D. [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Google Brain, Mountain View, CA USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
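To make the abstract's description concrete, the following is a minimal, illustrative Python (PyTorch) sketch, not the authors' implementation: a transformer assigns a scalar energy to each token in context, and a simplified binary noise-contrastive objective trains it to give real tokens low energy and noise-sampled tokens high energy. The class name ElectricSketch, the layer sizes, and the uniform noise distribution are assumptions for illustration; the paper's actual setup (e.g., using a masked language model as the noise distribution) differs.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ElectricSketch(nn.Module):
    """Illustrative energy-based cloze model: one scalar energy per position."""
    def __init__(self, vocab_size=30522, hidden_size=256, num_layers=4, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Lower energy = token is more likely given its context.
        self.energy_head = nn.Linear(hidden_size, 1)

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return self.energy_head(h).squeeze(-1)  # [batch, seq_len] energies

def nce_loss(model, token_ids, noise_logits, num_noise_positions=2):
    """Simplified binary noise-contrastive loss: corrupt a few positions with
    tokens sampled from a noise distribution, then train the model to assign
    low energy to real tokens and high energy to the noise tokens."""
    batch, seq_len = token_ids.shape
    noise_dist = torch.distributions.Categorical(logits=noise_logits)

    # Replace a random subset of positions with noise-sampled tokens.
    positions = torch.randint(0, seq_len, (batch, num_noise_positions))
    noise_tokens = noise_dist.sample((batch, num_noise_positions))
    corrupted = token_ids.clone()
    corrupted.scatter_(1, positions, noise_tokens)

    energies = model(corrupted)  # [batch, seq_len]
    is_noise = torch.zeros_like(corrupted, dtype=torch.float)
    is_noise.scatter_(1, positions, 1.0)

    # Negative energy serves as the "this token is real data" logit.
    return F.binary_cross_entropy_with_logits(-energies, 1.0 - is_noise)

# Purely illustrative usage with random data and a uniform noise distribution:
model = ElectricSketch()
tokens = torch.randint(0, 30522, (2, 16))
uniform_noise_logits = torch.zeros(30522)
loss = nce_loss(model, tokens, uniform_noise_logits)
loss.backward()

At inference time, summing the per-token negative energies over a sentence yields an (unnormalized) likelihood-style score, which is the property the abstract highlights for fast re-ranking of speech recognition n-best lists.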
Pages: 285-294
Page count: 10
Related papers
50 records in total
  • [31] Energy-Based Models for Cross-Modal Localization using Convolutional Transformers
    Wu, Alan
    Ryoo, Michael S.
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 11726 - 11733
  • [32] Code Smell Detection Research Based on Pre-training and Stacking Models
    Zhang, Dongwen
    Song, Shuai
    Zhang, Yang
    Liu, Haiyang
    Shen, Gaojie
    IEEE LATIN AMERICA TRANSACTIONS, 2024, 22 (01) : 22 - 30
  • [33] Pre-training and Evaluating Transformer-based Language Models for Icelandic
    Daðason, Jón Friðrik
    Loftsson, Hrafn
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 7386 - 7391
  • [34] Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
    Noman, Mubashir
    Naseer, Muzammal
    Cholakkal, Hisham
    Anwer, Rao Muhammad
    Khan, Salman
    Khan, Fahad Shahbaz
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 27811 - 27819
  • [35] Friend Ranking in Online Games via Pre-training Edge Transformers
    Yao, Liang
    Peng, Jiazhen
    Ji, Shenggong
    Liu, Qiang
    Cai, Hongyun
    He, Feng
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 2016 - 2020
  • [36] Learning Kernel Stein Discrepancy for Training Energy-Based Models
    Niu, Lu
    Li, Shaobo
    Li, Zhenping
    APPLIED SCIENCES-BASEL, 2023, 13 (22):
  • [37] Training Energy-Based Models for Time-Series Imputation
    Brakel, Philemon
    Stroobandt, Dirk
    Schrauwen, Benjamin
    JOURNAL OF MACHINE LEARNING RESEARCH, 2013, 14 : 2771 - 2797
  • [38] Efficient training of energy-based models using Jarzynski equality
    Carbone, Davide
    Hua, Mengjian
    Coste, Simon
    Vanden-Eijnden, Eric
    JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2024, 2024 (10):
  • [39] Efficient Training of Energy-Based Models Using Jarzynski Equality
    Carbone, Davide
    Hua, Mengjian
    Coste, Simon
    Vanden-Eijnden, Eric
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [40] Supervised contrastive pre-training models for mammography screening
    Cao, Zhenjie
    Deng, Zhuo
    Yang, Zhicheng
    Ma, Jie
    Ma, Lan
    JOURNAL OF BIG DATA, 2025, 12 (01)