Training ELECTRA Augmented with Multi-word Selection

Cited by: 0
Authors
Shen, Jiaming [1]
Liu, Jialu [2]
Liu, Tianqi [2]
Yu, Cong [2]
Han, Jiawei [1]
Affiliations
[1] Univ Illinois, Champaign, IL USA
[2] Google Res, New York, NY 10002 USA
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pre-trained text encoders such as BERT and its variants have recently achieved state-of-the-art performance on many NLP tasks. While effective, these pre-training methods typically demand massive computational resources. To accelerate pre-training, ELECTRA trains a discriminator that predicts whether each input token has been replaced by a generator. However, this new task, being a binary classification, is less semantically informative. In this study, we present a new text encoder pre-training method that improves ELECTRA based on multi-task learning. Specifically, we train the discriminator to simultaneously detect replaced tokens and select original tokens from candidate sets. We further develop two techniques to effectively combine all pre-training tasks: (1) using attention-based networks for task-specific heads, and (2) sharing bottom layers of the generator and the discriminator. Extensive experiments on GLUE and SQuAD datasets demonstrate both the effectiveness and the efficiency of our proposed method.
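The two discriminator tasks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-position dictionary layout, the `selection_weight` parameter, and the choice to add the two losses are all assumptions made for illustration. The abstract only states that the discriminator both detects replaced tokens and selects the original token from a candidate set.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def replaced_token_loss(logit, is_replaced):
    """Binary cross-entropy for the question 'was this token replaced?'."""
    p = sigmoid(logit)
    return -math.log(p) if is_replaced else -math.log(1.0 - p)

def selection_loss(candidate_scores, original_index):
    """Softmax cross-entropy for picking the original token among candidates."""
    m = max(candidate_scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in candidate_scores))
    return log_z - candidate_scores[original_index]

def multitask_loss(positions, selection_weight=1.0):
    """Mean over positions of detection loss plus weighted selection loss.

    Summing the two terms with a scalar weight is an assumption of this
    sketch, not a detail taken from the abstract.
    """
    total = 0.0
    for pos in positions:
        total += replaced_token_loss(pos["logit"], pos["is_replaced"])
        total += selection_weight * selection_loss(
            pos["candidate_scores"], pos["original_index"]
        )
    return total / len(positions)

# Hypothetical scores for a two-token input, three candidates per position.
example = [
    {"logit": -2.0, "is_replaced": False,
     "candidate_scores": [3.0, 0.5, -1.0], "original_index": 0},
    {"logit": 1.5, "is_replaced": True,
     "candidate_scores": [0.2, 2.5, 0.1], "original_index": 1},
]
loss = multitask_loss(example)
```

The selection term is a multi-class signal over a small candidate set, which is one way to read the abstract's point that binary detection alone is less semantically informative.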
Pages: 2475-2486
Page count: 12
Related papers
50 in total
  • [1] AUGMENTED MUTUAL INFORMATION FOR MULTI-WORD EXTRACTION
    Zhang, Wen
    Yoshida, Taketoshi
    Ho, Tu Bao
    Tang, Xijin
    [J]. INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2009, 5 (02) : 543 - 554
  • [2] Lexical selection in multi-word production
    Janssen, Niels
    Caramazza, Alfonso
    [J]. FRONTIERS IN PSYCHOLOGY, 2011, 2
  • [3] Multi-word terms selection for information retrieval
    Bechikh Ali, Chedi
    Haddad, Hatem
    Slimani, Yahya
    [J]. INFORMATION DISCOVERY AND DELIVERY, 2023, 51 (01) : 74 - 87
  • [4] Chunks, multi-word units et cetera: The role of multi-word units in second language acquisition
    Aguado, Karin
    [J]. DEUTSCH ALS FREMDSPRACHE-ZEITSCHRIFT ZUR THEORIE UND PRAXIS DES FACHES DEUTSCH ALS FREMDSPRACHE, 2024, 61 (01)
  • [5] Phonological similarity in multi-word units
    Gries, Stefan Th.
    [J]. COGNITIVE LINGUISTICS, 2011, 22 (03) : 491 - 510
  • [6] Verbal Multi-Word Expressions in Yiddish
    Liebeskind, Chaya
    HaCohen-Kerner, Yaakov
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2018), 2018, 10859 : 205 - 216
  • [7] A century in the life of multi-word verbs
    Claridge, C
    [J]. CORPUS-BASED STUDIES IN ENGLISH, 1997, (20) : 69 - 85
  • [8] On the Structural Disambiguation of Multi-word Terms
    Cabezas-Garcia, Melania
    Leon-Arauz, Pilar
    [J]. COMPUTATIONAL AND CORPUS-BASED PHRASEOLOGY, EUROPHRAS 2019, 2019, 11755 : 46 - 60
  • [9] Reactive multi-word synchronization for multiprocessors
    Ha, PH
    Tsigas, P
    [J]. 12TH INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PROCEEDINGS, 2003 : 184 - 193
  • [10] A multi-word term extraction system
    Chen, Jisong
    Yeh, Chung-Hsing
    Chau, Rowena
    [J]. PRICAI 2006: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 4099 : 1160 - 1165