Exploring Security Commits in Python

Cited by: 2
Authors
Sun, Shiyu [1]
Wang, Shu [1]
Wang, Xinda [1]
Xing, Yunlong [1]
Zhang, Elisa [2]
Sun, Kun [1]
Affiliations
[1] George Mason Univ, Fairfax, VA 22030 USA
[2] Dougherty Valley High Sch, San Ramon, CA 94582 USA
Keywords
Security Commit; Python; Dataset Construction; Code Property Graph; Graph Learning; Vulnerability Fixes; Vulnerabilities
DOI
10.1109/ICSME58846.2023.00027
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Python has become the most popular programming language as it is friendly to work with for beginners. However, a recent study has found that most security issues in Python have not been indexed by CVE and may only be fixed by "silent" security commits, which pose a threat to software security and hinder the security fixes to downstream software. It is critical to identify the hidden security commits; however, the existing datasets and methods are insufficient for security commit detection in Python, due to the limited data variety, non-comprehensive code semantics, and uninterpretable learned features. In this paper, we construct the first security commit dataset in Python, namely PySecDB, which consists of three subsets including a base dataset, a pilot dataset, and an augmented dataset. The base dataset contains the security commits associated with CVE records provided by MITRE. To increase the variety of security commits, we build the pilot dataset from GitHub by filtering keywords within the commit messages. Since not all commits provide commit messages, we further construct the augmented dataset by understanding the semantics of code changes. To build the augmented dataset, we propose a new graph representation named CommitCPG and a multi-attributed graph learning model named SCOPY to identify the security commit candidates through both sequential and structural code semantics. The evaluation shows our proposed algorithms can improve the data collection efficiency by up to 40 percentage points. After manual verification by three security experts, PySecDB consists of 1,258 security commits and 2,791 non-security commits. Furthermore, we conduct an extensive case study on PySecDB and discover four common security fix patterns that cover over 85% of security commits in Python, providing insight into secure software maintenance, vulnerability detection, and automated program repair.
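
The keyword-filtering step used for the pilot dataset can be illustrated with a minimal sketch. The keyword list and the helper names `is_candidate` and `filter_candidates` are hypothetical; the abstract does not state which keywords or tooling the authors used. The sketch also shows why commits without messages are missed by this step, which is the gap the augmented dataset is meant to cover.

```python
import re

# Hypothetical keyword list; the abstract does not enumerate the actual keywords.
SECURITY_KEYWORDS = ("security", "vulnerab", "cve", "xss", "injection", "overflow")

# One case-insensitive pattern that matches any of the keywords as a substring.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in SECURITY_KEYWORDS), re.IGNORECASE
)


def is_candidate(message):
    """Return True if the commit message mentions any security keyword."""
    return bool(message) and _PATTERN.search(message) is not None


def filter_candidates(commits):
    """Keep the (sha, message) pairs whose message matches a keyword."""
    return [(sha, msg) for sha, msg in commits if is_candidate(msg)]


# Illustrative input only; these are not commits from PySecDB.
commits = [
    ("a1", "Fix SQL injection in login form"),
    ("b2", "Update README"),
    ("c3", ""),  # empty message: cannot be found by keyword filtering
]
candidates = filter_candidates(commits)
```

A filter like this only produces candidates; per the abstract, the commits in PySecDB were manually verified by three security experts.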
Pages: 171-181
Number of pages: 11