NATURALCC: An Open-Source Toolkit for Code Intelligence

Authors
Wan, Yao [1]
He, Yang [2]
Bi, Zhangqian [1]
Zhang, Jianguo [3]
Sui, Yulei [2]
Zhang, Hongyu [4]
Hashimoto, Kazuma [5]
Jin, Hai [1]
Xu, Guandong [2]
Xiong, Caiming [6]
Yu, Philip S. [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Cluster & Grid Comp Lab, Serv Comp Technol & Syst Lab, Sch Comp Sci & Technol, Wuhan, Peoples R China
[2] Univ Technol Sydney, Sydney, NSW, Australia
[3] Univ Illinois, Chicago, IL 60680 USA
[4] Univ Newcastle, Callaghan, NSW, Australia
[5] Google Res, Mountain View, CA USA
[6] Salesforce Res, Palo Alto, CA USA
Funding
National Natural Science Foundation of China;
Keywords
Code intelligence; deep learning; code representation; code embedding; open source; toolkit; benchmark;
DOI
10.1145/3510454.3516863
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
We present NATURALCC, an efficient and extensible open-source toolkit for machine-learning-based source code analysis (i.e., code intelligence). Using NATURALCC, researchers can conduct rapid prototyping, reproduce state-of-the-art models, and/or exercise their own algorithms. NATURALCC is built upon Fairseq and PyTorch, providing (1) a collection of code corpora with preprocessing scripts, (2) a modular and extensible framework that makes it easy to reproduce and implement a code intelligence model, and (3) a benchmark of state-of-the-art models. Furthermore, we demonstrate the usability of our toolkit over a variety of tasks (e.g., code summarization, code retrieval, and code completion) through a graphical user interface. The website of this project is http://xcodemind.github.io, where the source code and demonstration video can be found.
Pages: 149-153
Number of pages: 5