D2A: A Dataset Built for AI-Based Vulnerability Detection Methods Using Differential Analysis

Cited by: 79
Authors
Zheng, Yunhui [1]
Pujar, Saurabh [1]
Lewis, Burn [1]
Buratti, Luca [1]
Epstein, Edward [1]
Yang, Bo [1]
Laredo, Jim [1]
Morari, Alessandro [1]
Su, Zhong [1]
Affiliations
[1] IBM Res, Armonk, NY 10504 USA
Keywords
dataset; vulnerability detection; auto-labeler; static analysis
DOI
10.1109/ICSE-SEIP52600.2021.00020
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Static analysis tools are widely used for vulnerability detection as they understand programs with complex behavior and millions of lines of code. Despite their popularity, static analysis tools are known to generate an excess of false positives. The recent ability of Machine Learning models to understand programming languages opens new possibilities when applied to static analysis. However, existing datasets to train models for vulnerability identification suffer from multiple limitations such as limited bug context, limited size, and synthetic and unrealistic source code. We propose D2A, a differential-analysis-based approach to label issues reported by static analysis tools. The D2A dataset is built by analyzing version pairs from multiple open source projects. From each project, we select bug-fixing commits and we run static analysis on the versions before and after such commits. If some issues detected in a before-commit version disappear in the corresponding after-commit version, they are very likely to be real bugs that got fixed by the commit. We use D2A to generate a large labeled dataset to train models for vulnerability identification. We show that the dataset can be used to build a classifier to identify possible false alarms among the issues reported by static analysis, hence helping developers prioritize and investigate potential true positives first.
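The labeling rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Issue` fields, the `label_issues` helper, the label strings, and the sample reports are all hypothetical, and matching issues by bug type, file, and function is an assumption made here so that a report is still recognized after line numbers shift. The abstract only states that disappearing issues are very likely real bugs; how persisting issues are treated is left open in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """One static-analysis report (hypothetical fields, assumed stable across versions)."""
    bug_type: str
    file: str
    function: str


def label_issues(before, after):
    """Label each before-commit issue by whether it is still reported after the fix."""
    still_reported = set(after)
    labels = {}
    for issue in before:
        if issue in still_reported:
            # Still reported after the bug-fixing commit: not addressed by it.
            labels[issue] = "persists"
        else:
            # Gone after the commit: very likely a real bug that the commit fixed.
            labels[issue] = "disappeared"
    return labels


# Invented example reports for one before/after version pair.
before = [
    Issue("BUFFER_OVERRUN", "src/parse.c", "read_header"),
    Issue("NULL_DEREFERENCE", "src/util.c", "copy_name"),
]
after = [Issue("NULL_DEREFERENCE", "src/util.c", "copy_name")]

labels = label_issues(before, after)
for issue, label in labels.items():
    print(issue.bug_type, label)
```

Issues labeled "disappeared" would serve as likely true positives in a training set; a real pipeline would also need a more robust way to match reports across versions than the exact-field comparison assumed here.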
Pages: 111-120
Page count: 10