Keeping Pace with Ever-Increasing Data: Towards Continual Learning of Code Intelligence Models

Cited by: 3
Authors
Gao, Shuzheng [1]
Zhang, Hongyu [2]
Gao, Cuiyun [1]
Wang, Chaozheng [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Shenzhen, Peoples R China
[2] Chongqing Univ, Sch Big Data & Software Engn, Chongqing, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
NETWORK;
DOI
10.1109/ICSE48619.2023.00015
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202 ; 0835 ;
Abstract
Previous research on code intelligence usually trains a deep learning model on a fixed dataset in an offline manner. However, in real-world scenarios, new code repositories emerge incessantly, and the new knowledge they carry is beneficial for providing up-to-date code intelligence services to developers. In this paper, we address the following problem: how can code intelligence models continually learn from ever-increasing data? One major challenge here is catastrophic forgetting, meaning that the model can easily forget knowledge learned from previous datasets when learning from a new dataset. To tackle this challenge, we propose REPEAT, a novel method for continual learning of code intelligence models. Specifically, REPEAT addresses the catastrophic forgetting problem with representative exemplars replay and adaptive parameter regularization. The representative exemplars replay component selects informative and diverse exemplars in each dataset and uses them to retrain the model periodically. The adaptive parameter regularization component recognizes important parameters in the model and adaptively penalizes their changes to preserve the knowledge learned before. We evaluate the proposed approach on three code intelligence tasks: code summarization, software vulnerability detection, and code clone detection. Extensive experiments demonstrate that REPEAT consistently outperforms baseline methods on all tasks. For example, REPEAT improves the conventional fine-tuning method by 1.22, 5.61, and 1.72 on code summarization, vulnerability detection, and clone detection, respectively.
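The abstract names two ingredients, replaying diverse exemplars and penalizing changes to important parameters, without giving their exact form. The following is a minimal illustrative sketch of those two general ideas only, not REPEAT's actual algorithm: the greedy farthest-point selection, the quadratic importance-weighted penalty, and all function and parameter names (`select_diverse_exemplars`, `importance_penalty`, `strength`) are assumptions introduced here for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_diverse_exemplars(features: List[Sequence[float]], k: int) -> List[int]:
    """Pick up to k sample indices that are spread out in feature space.

    Assumption: greedy farthest-point selection stands in for whatever
    'informative and diverse' criterion the paper actually uses.
    """
    if not features or k <= 0:
        return []
    chosen = [0]  # arbitrary starting sample
    # Distance from each sample to its nearest already-chosen exemplar.
    nearest = [squared_distance(f, features[0]) for f in features]
    while len(chosen) < min(k, len(features)):
        candidates = [i for i in range(len(features)) if i not in chosen]
        idx = max(candidates, key=lambda i: nearest[i])
        chosen.append(idx)
        nearest = [
            min(d, squared_distance(f, features[idx]))
            for d, f in zip(nearest, features)
        ]
    return chosen


def importance_penalty(
    params: Sequence[float],
    old_params: Sequence[float],
    importance: Sequence[float],
    strength: float = 1.0,
) -> float:
    """Quadratic penalty on parameter drift, weighted per parameter.

    Assumption: a generic importance-weighted L2 term; parameters with
    larger importance are penalized more for moving away from old values.
    """
    return 0.5 * strength * sum(
        w * (p - q) ** 2 for p, q, w in zip(params, old_params, importance)
    )
```

In a training loop under these assumptions, the selected exemplars from earlier datasets would be mixed into the batches for the new dataset, and the penalty would be added to the task loss so that important parameters stay close to their previous values.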
Pages: 30-42
Page count: 13