Calibration of Pre-trained Transformers

Cited by: 0
Authors
Desai, Shrey [1 ]
Durrett, Greg [1 ]
Affiliation
[1] University of Texas at Austin, Department of Computer Science, Austin, TX 78712, USA
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Pre-trained Transformers are now ubiquitous in natural language processing, but despite their high end-task performance, little is known empirically about whether they are calibrated. Specifically, do these models' posterior probabilities provide an accurate empirical measure of how likely the model is to be correct on a given example? We focus on BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019) in this work, and analyze their calibration across three tasks: natural language inference, paraphrase detection, and commonsense reasoning. For each task, we consider in-domain as well as challenging out-of-domain settings, where models face more examples they should be uncertain about. We show that: (1) when used out-of-the-box, pre-trained models are calibrated in-domain, and compared to baselines, their calibration error out-of-domain can be as much as 3.5x lower; (2) temperature scaling is effective at further reducing calibration error in-domain, and using label smoothing to deliberately increase empirical uncertainty helps calibrate posteriors out-of-domain.
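The quantities named in the abstract have standard textbook formulations: expected calibration error (ECE) bins predictions by confidence and averages the gap between confidence and accuracy, temperature scaling divides logits by a scalar T fit on held-out data, and label smoothing mixes the one-hot target with a uniform distribution. The sketch below is an illustrative NumPy implementation of these general techniques under common formulations, not the authors' released code; the function names, the grid-search range for T, and the synthetic logits are assumptions for demonstration only.

```python
# Illustrative sketch only (assumed helper names; not the paper's released code):
# ECE with equal-width confidence bins, temperature scaling fit by grid search
# on held-out logits, and soft targets for a common label smoothing variant.
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=10):
    """Bin predictions by confidence and average |accuracy - confidence|,
    weighted by the fraction of examples falling into each bin."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return ece

def fit_temperature(dev_logits, dev_labels, grid=np.linspace(0.25, 5.0, 200)):
    """Pick the scalar T that minimizes dev-set NLL of softmax(logits / T);
    T rescales confidence but never changes the argmax prediction."""
    rows = np.arange(len(dev_labels))
    def nll(T):
        p = softmax(dev_logits / T)
        return -np.log(p[rows, dev_labels] + 1e-12).mean()
    return min(grid, key=nll)

def label_smoothing_targets(labels, n_classes, alpha=0.1):
    """Soft targets for one common label smoothing variant: the one-hot target
    mixed with a uniform distribution, i.e. (1 - alpha)*one_hot + alpha/K."""
    t = np.full((len(labels), n_classes), alpha / n_classes)
    t[np.arange(len(labels)), labels] += 1.0 - alpha
    return t

# Toy demo with overconfident synthetic "logits" standing in for BERT/RoBERTa outputs.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=4000)
logits = rng.normal(size=(4000, 3))
logits[np.arange(4000), labels] += 2.0   # signal toward the true class
logits *= 3.0                            # inflated scale -> overconfident posteriors
dev, test = slice(0, 2000), slice(2000, 4000)

T = fit_temperature(logits[dev], labels[dev])
print("ECE before scaling:", expected_calibration_error(softmax(logits[test]), labels[test]))
print("ECE after  scaling:", expected_calibration_error(softmax(logits[test] / T), labels[test]))
```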
Pages: 295-302
Number of pages: 8