BRAN: Reduce Vulnerability Search Space in Large Open Source Repositories by Learning Bug Symptoms

Cited by: 3
Authors
Meng, Dongyu [1]
Guerriero, Michele [2]
Machiry, Aravind [3]
Aghakhani, Hojjat [1]
Bose, Priyanka [1]
Continella, Andrea [4]
Kruegel, Christopher [1]
Vigna, Giovanni [1]
Affiliations
[1] UC Santa Barbara, Santa Barbara, CA 93106 USA
[2] Politecnico di Milano, Milan, Italy
[3] Purdue University, West Lafayette, IN 47907 USA
[4] University of Twente, Enschede, Netherlands
Keywords
Static Analysis; Vulnerabilities; Machine Learning; Code Churn; Software; Metrics; Complexity; Accurate
DOI
10.1145/3433210.3453115
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Software continues to grow in size and complexity, so vulnerability discovery would benefit from techniques that identify potentially vulnerable regions within large code bases: doing so eases vulnerability detection by reducing the search space. Previous work has explored the use of conventional code-quality and complexity metrics to highlight suspicious sections of (source) code. More recently, researchers have also proposed reducing the vulnerability search space by studying code properties with neural networks. However, previous work has generally failed to leverage the rich metadata that is available for long-running, large code repositories. In this paper, we present an approach, named Bran, that reduces the vulnerability search space by combining conventional code metrics with fine-grained repository metadata. Bran locates code sections that are more likely to contain vulnerabilities in large code bases, potentially improving the efficiency of both manual and automated code audits. In our experiments on four large code bases, Bran successfully highlights potentially vulnerable functions, outperforming several baselines, including state-of-the-art vulnerability prediction tools. We also assess Bran's effectiveness in assisting automated testing tools. We use Bran to guide syzkaller, a well-known kernel fuzzer, in fuzzing a recent version of the Linux kernel. The guided fuzzer identifies 26 bugs (10 of them zero-day flaws), including arbitrary writes and reads.
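The abstract's core idea, scoring functions by combining conventional code metrics with repository metadata so that an audit can start from the most suspicious ones, can be illustrated with a small ranking sketch. This record does not state Bran's actual feature set or model, so everything below is an assumption: the feature names (`complexity`, `loc`, `churn`, `authors`), the weights, and the helpers `rank_functions` and `audit_fraction` are hypothetical, and a hand-weighted linear score stands in for whatever learned model Bran uses.

```python
from dataclasses import dataclass


@dataclass
class FunctionRecord:
    """Per-function features. All field names are hypothetical."""
    name: str           # assumed unique per function
    complexity: float   # conventional code metric, e.g. cyclomatic complexity
    loc: float          # conventional code metric: lines of code
    churn: float        # repository metadata: lines changed across commits
    authors: float      # repository metadata: number of distinct authors
    vulnerable: bool    # ground-truth label, used only for evaluation


# Hypothetical hand-picked weights standing in for a trained model.
WEIGHTS = {"complexity": 0.3, "loc": 0.1, "churn": 0.4, "authors": 0.2}


def normalize(records, field):
    """Min-max scale one feature to [0, 1] across all functions."""
    values = [getattr(r, field) for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
    return {r.name: (getattr(r, field) - lo) / span for r in records}


def rank_functions(records):
    """Return function names sorted from most to least suspicious."""
    scaled = {field: normalize(records, field) for field in WEIGHTS}
    score = {
        r.name: sum(w * scaled[field][r.name] for field, w in WEIGHTS.items())
        for r in records
    }
    return sorted(score, key=score.get, reverse=True)


def audit_fraction(records, ranking, target_recall=1.0):
    """Fraction of functions an auditor must inspect, in ranked order,
    to find `target_recall` of the known-vulnerable functions."""
    vulnerable = {r.name for r in records if r.vulnerable}
    if not vulnerable:
        return 0.0
    needed = target_recall * len(vulnerable)
    found = 0
    for i, name in enumerate(ranking, start=1):
        if name in vulnerable:
            found += 1
        if found >= needed:
            return i / len(ranking)
    return 1.0


# Toy, made-up data for illustration only.
funcs = [
    FunctionRecord("parse_header", 25, 300, 900, 12, True),
    FunctionRecord("copy_user_buf", 18, 120, 650, 9, True),
    FunctionRecord("init_table", 3, 40, 20, 1, False),
    FunctionRecord("log_event", 2, 15, 35, 2, False),
    FunctionRecord("free_list", 5, 60, 80, 3, False),
]
ranking = rank_functions(funcs)
print(ranking[:2])                    # ['parse_header', 'copy_user_buf']
print(audit_fraction(funcs, ranking)) # 0.4
```

Ranking by such a score shrinks the search space in the sense the abstract describes: in this toy data, both labeled-vulnerable functions fall within the top 40% of the ranking. The numbers are invented for illustration and say nothing about Bran's reported results.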
Pages: 731–743
Page count: 13