Pre-Training Transformers as Energy-Based Cloze Models

Cited by: 0
Authors
Clark, Kevin [1 ]
Luong, Minh-Thang [2 ]
Le, Quoc V. [2 ]
Manning, Christopher D. [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Google Brain, Mountain View, CA USA
Keywords
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
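As a rough sketch of the formulation the abstract describes, the LaTeX fragment below writes out an energy-based cloze conditional and a generic noise-contrastive binary objective. The symbols E_theta, Z_theta, q, s_theta, and the REPLACE operation are notation introduced here for illustration only; the paper's exact objective and noise distribution may differ.

\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb}
\begin{document}
% Illustrative sketch only: an energy-based cloze conditional and a
% generic NCE-style binary objective; the paper's exact variant may differ.
An energy-based cloze model assigns each position $t$ of an input
$\mathbf{x}$ a scalar energy $E_\theta(\mathbf{x})_t$, which implicitly
defines a conditional distribution over the token at that position:
\begin{equation}
  p_\theta(x_t \mid \mathbf{x}_{\setminus t})
  = \frac{\exp\bigl(-E_\theta(\mathbf{x})_t\bigr)}
         {Z_\theta(\mathbf{x}_{\setminus t})},
  \qquad
  Z_\theta(\mathbf{x}_{\setminus t})
  = \sum_{x' \in \mathcal{V}}
    \exp\bigl(-E_\theta\bigl(\mathrm{REPLACE}(\mathbf{x}, t, x')\bigr)_t\bigr).
\end{equation}
The partition function $Z_\theta$ sums over the whole vocabulary
$\mathcal{V}$ and is never computed explicitly. Instead, a
noise-contrastive estimation objective trains a binary classifier to
separate the observed token from samples
$\hat{x}_t \sim q(\cdot \mid \mathbf{x}_{\setminus t})$ drawn from a
tractable noise distribution $q$, using the score
\begin{equation}
  s_\theta(x', t)
  = -E_\theta\bigl(\mathrm{REPLACE}(\mathbf{x}, t, x')\bigr)_t
    - \log q(x' \mid \mathbf{x}_{\setminus t}),
\end{equation}
\begin{equation}
  \mathcal{L}(\theta)
  = -\log \sigma\bigl(s_\theta(x_t, t)\bigr)
    - \mathbb{E}_{\hat{x}_t \sim q}
        \Bigl[\log\bigl(1 - \sigma\bigl(s_\theta(\hat{x}_t, t)\bigr)\bigr)\Bigr],
\end{equation}
where $\sigma$ is the logistic sigmoid. This real-versus-noise
classification over input positions is what connects the objective to
ELECTRA's replaced-token detection loss, as the abstract notes.
\end{document}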
Pages: 285-294
Number of pages: 10