共 50 条
- [1] Are Pre-trained Convolutions Better than Pre-trained Transformers? [J]. 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (ACL-IJCNLP 2021), VOL 1, 2021, : 4349 - 4359
- [2] Dynamic Knowledge Distillation for Pre-trained Language Models [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 379 - 389
- [3] MINILM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
- [4] Calibration of Pre-trained Transformers [J]. PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 295 - 302
- [5] Pre-trained Adversarial Perturbations [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
- [6] AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation [J]. AI OPEN, 2023, 4 : 56 - 63
- [8] Pre-trained transformers: an empirical comparison [J]. MACHINE LEARNING WITH APPLICATIONS, 2022, 9
- [10] SHUFFLECOUNT: TASK-SPECIFIC KNOWLEDGE DISTILLATION FOR CROWD COUNTING [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 999 - 1003