Python source code vulnerability detection with named entity recognition

Cited by: 3
Authors
Ehrenberg, Melanie [1]
Sarkani, Shahram [1]
Mazzuchi, Thomas A. [1]
Affiliations
[1] George Washington Univ, Sch Engn & Appl Sci, Washington, DC 20052 USA
Keywords
Vulnerability detection; Natural language processing; Machine learning; Named entity recognition; Transformer; Python; BERT; Programming language; Common weakness enumeration; CWE
DOI
10.1016/j.cose.2024.103802
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Vulnerabilities within source code have grown over the last 20 years to become a common threat to systems and networks. As the implementation of open-source software continues to develop, more unknown vulnerabilities will exist throughout system networks. This research proposes an enhanced vulnerability detection method specific to Python source code that utilizes pre-trained, BERT-based transformer models to apply tokenization, embedding, and named entity recognition (a natural language processing technique). The use of named entity recognition allows not only for the detection of potential vulnerabilities but also for the classification of different vulnerability types. This research fine-tunes the publicly available CodeBERT, RoBERTa, and DistilBERT models for the downstream task of token classification across six different common weakness enumeration (CWE) specifications. The results achieved in this research outperform previous Python-based vulnerability detection methods and demonstrate the effectiveness of applying named entity recognition to enhance the overall research into Python source code vulnerabilities.
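
The token-classification framing described in the abstract can be illustrated with a minimal sketch. It assumes BIO-style tags (B- opens a span, I- continues it, O is outside), which is a common convention for named entity recognition but is not confirmed by the abstract. The tag names, the CWE-89 example, and the helper names `lex` and `group_spans` are hypothetical. The hand-written tags stand in for the output of a fine-tuned model, and the standard-library tokenizer stands in for the subword tokenizers used by BERT-style models.

```python
import io
import tokenize
from typing import List, Tuple

# (label, index of first token, index of last token); hypothetical structure
Span = Tuple[str, int, int]


def lex(source: str) -> List[str]:
    """Split Python source into lexical tokens with the stdlib tokenizer.

    Assumption: this only approximates the subword tokenization that a
    BERT-style model would apply.
    """
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type not in skip]


def group_spans(tags: List[str]) -> List[Span]:
    """Merge per-token BIO tags into labelled spans."""
    spans: List[Span] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        prefix, _, name = tag.partition("-")
        continues = prefix == "I" and name == label
        if label is not None and not continues:
            spans.append((label, start, i - 1))
            label = None
        if prefix in ("B", "I") and not continues:
            # A stray I- tag is treated as the start of a new span.
            label, start = name, i
    return spans


# Illustrative input: string concatenation inside a SQL query.
source = 'cursor.execute("SELECT * FROM users WHERE id = " + user_id)\n'
tokens = lex(source)

# Hand-written tags standing in for model predictions (an assumption).
tags = ["O", "O"] + ["B-CWE-89"] + ["I-CWE-89"] * (len(tokens) - 3)

for label, first, last in group_spans(tags):
    print(label, " ".join(tokens[first:last + 1]))
```

Grouping tags into spans is what lets a single pass both locate a suspicious region and name its weakness type, which is the property the abstract attributes to named entity recognition.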
Pages: 15
Related papers
50 in total
  • [1] Machine Learning Techniques For Python Source Code Vulnerability Detection
    Farasat, Talaya
    Posegga, Joachim
    [J]. PROCEEDINGS OF THE FOURTEENTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY, CODASPY 2024, 2024, : 151 - 153
  • [2] pyMeSHSim: an integrative python package for biomedical named entity recognition, normalization, and comparison of MeSH terms
    Luo, Zhi-Hui
    Shi, Meng-Wei
    Yang, Zhuang
    Zhang, Hong-Yu
    Chen, Zhen-Xia
    [J]. BMC BIOINFORMATICS, 2020, 21 (01)
  • [3] An extensive study of the effects of different deep learning models on code vulnerability detection in Python code
    Wang, Rongcun
    Xu, Senlei
    Ji, Xingyu
    Tian, Yuan
    Gong, Lina
    Wang, Ke
    [J]. AUTOMATED SOFTWARE ENGINEERING, 2024, 31 (01)
  • [4] CluEval: A Python tool for evaluating clustering performance in named entity disambiguation
    Kim, Jinseok
    Kim, Jenna
    [J]. SOFTWARE IMPACTS, 2023, 16
  • [5] T-NER: An All-Round Python Library for Transformer-based Named Entity Recognition
    Ushio, Asahi
    Camacho-Collados, Jose
    [J]. EACL 2021: THE 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: PROCEEDINGS OF THE SYSTEM DEMONSTRATIONS, 2021, : 53 - 62
  • [6] Web Vulnerability Detection Analyzer Based on Python
    Xu, Dawei
    Chen, Tianxin
    Tan, Zhonghua
    Wu, Fudong
    Gao, Jiaqi
    Yang, Yunfan
    [J]. INTERNATIONAL JOURNAL OF DIGITAL CRIME AND FORENSICS, 2022, 14 (02)
  • [7] Emotion recognition and drowsiness detection using Python
    Uppal, Anmol
    Tyagi, Shweta
    Kumar, Rishi
    Sharma, Seema
    [J]. 2019 9TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2019), 2019, : 464 - 469
  • [8] Chaos to Clarity with Semantic Inferencing for Python Source Code Snippets
    Stein, Aviel
    Mancoridis, Spiros
    [J]. 2023 IEEE 17TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, ICSC, 2023, : 161 - 166
  • [9] Validation of an open source Natural Language Processing (NLP) and an in-house developed python script for named entity recognition from radiology reports of lung carcinoma cases
    Mithun, S.
    Jha, A. K.
    Sherkhane, U. B.
    Jaiswar, V.
    Prasad, R. V.
    Ortiz, C. M.
    Puts, S.
    Rangarajan, V.
    Dekker, A.
    Wee, L.
    [J]. EUROPEAN JOURNAL OF NUCLEAR MEDICINE AND MOLECULAR IMAGING, 2019, 46 (SUPPL 1) : S761 - S762