Encoding Version History Context for Better Code Representation

Authors
Nguyen, Huy [1]
Treude, Christoph [2]
Thongtanunam, Patanamon [1]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
[2] Singapore Management Univ, Singapore, Singapore
Funding
Australian Research Council
Keywords
Source code representation; additional context; version history
DOI
10.1145/3643991.3644929
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the exponential growth of AI tools that generate source code, understanding software has become crucial. When developers comprehend a program, they may consult additional context, e.g., program documentation or historical code versions. We therefore argue that encoding this additional contextual information could also benefit code representation for deep learning. Recent papers incorporate contextual data (e.g., call hierarchy) into vector representations to address program comprehension problems. This motivates further studies that explore additional contexts, such as version history, to enhance models' understanding of programs. Specifically, insights from version history enable the recognition of patterns in code evolution over time, recurring issues, and the effectiveness of past solutions. Our paper presents preliminary evidence of the potential benefit of encoding contextual information from the version history for two tasks: code clone prediction and code classification. We experiment with two representative deep learning models, ASTNN and CodeBERT, to investigate whether combining additional contexts with different aggregations may benefit downstream activities. The experimental results affirm the positive impact of incorporating version history into source code representation in all scenarios; however, to ensure that the technique performs consistently, a holistic investigation on a larger code base using different combinations of contexts, aggregations, and models is needed. We therefore propose a research agenda aimed at exploring various aspects of encoding additional context to improve code representation and its optimal utilisation in specific situations.
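The idea of combining a code vector with version-history vectors under different aggregations can be illustrated with a minimal sketch. The abstract does not name the aggregations the authors used, so the three options below (concatenation, element-wise mean, element-wise max) are assumptions chosen for illustration, and the function names `mean_pool`, `max_pool`, and `combine_with_history` are hypothetical. The input vectors stand in for embeddings that a model such as ASTNN or CodeBERT would produce; no model is called here.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise maximum of equally sized vectors."""
    return [max(column) for column in zip(*vectors)]


def combine_with_history(
    current: Vector,
    history: Sequence[Vector],
    aggregation: str = "concat",
) -> Vector:
    """Combine the embedding of the current code version with the
    embeddings of its earlier versions.

    Assumption (not from the paper): the history is first summarised
    by mean pooling, then merged with the current embedding using the
    chosen aggregation.
    """
    if not history:
        raise ValueError("history must contain at least one vector")
    if any(len(h) != len(current) for h in history):
        raise ValueError("all vectors must have the same dimension")

    history_summary = mean_pool(history)

    if aggregation == "concat":
        # Doubles the dimension: [current ; history summary].
        return list(current) + history_summary
    if aggregation == "mean":
        return mean_pool([current, history_summary])
    if aggregation == "max":
        return max_pool([current, history_summary])
    raise ValueError(f"unknown aggregation: {aggregation}")


# Toy 2-dimensional embeddings: one current version, two past versions.
current_vec = [1.0, 2.0]
history_vecs = [[3.0, 0.0], [5.0, 4.0]]

concat_vec = combine_with_history(current_vec, history_vecs, "concat")
mean_vec = combine_with_history(current_vec, history_vecs, "mean")
max_vec = combine_with_history(current_vec, history_vecs, "max")
```

Concatenation keeps the current and historical signals separate at the cost of a larger input to the downstream classifier, whereas mean and max keep the dimension fixed but blend the two sources. Which choice works best for a given task is the kind of question the proposed research agenda leaves open.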
Pages: 631-636
Page count: 6