DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

Citations: 0

Authors:

Lachaux, Marie-Anne [1]

Roziere, Baptiste [2]

Szafraniec, Marc [1]

Lample, Guillaume [1]

Affiliations:

[1] Facebook AI Res, New York, NY 10021 USA

[2] Paris Dauphine Univ, Paris, France

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether models like BERT and its variants provide the best pre-training when applied to other modalities, such as source code. In this paper, we introduce a new pre-training objective, DOBF, that leverages the structural aspect of programming languages and pre-trains a model to recover the original version of obfuscated source code. We show that models pre-trained with DOBF significantly outperform existing approaches on multiple downstream tasks, providing relative improvements of up to 12.2% in unsupervised code translation, and 5.3% in natural language code search. Incidentally, we found that our pre-trained model is able to deobfuscate fully obfuscated source files, and to suggest descriptive variable names.

Pages: 13
