Toward Large-Scale Vulnerability Discovery using Machine Learning

Cited by: 146
Authors
Grieco, Gustavo [1]
Grinblat, Guillermo Luis [1]
Uzal, Lucas [1]
Rawat, Sanjay [2, 4]
Feist, Josselin [3]
Mounier, Laurent [3]
Affiliations
[1] CIFASIS CONICET, Rosario, Santa Fe, Argentina
[2] Vrije Univ Amsterdam, Syst Secur Grp, Amsterdam, Netherlands
[3] Univ Grenoble Alps, VERIMAG, Grenoble, France
[4] IIIT Hyderabad, Hyderabad, Telangana, India
DOI
10.1145/2857705.2857720
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the sustained growth of software complexity, finding security vulnerabilities in operating systems has become a necessity. Operating systems are now shipped with thousands of binary executables. Unfortunately, methodologies and tools for testing programs at OS scale within a limited time budget are still missing. In this paper we present an approach that uses lightweight static and dynamic features, together with machine learning techniques, to predict whether a test case is likely to contain a software vulnerability. To show the effectiveness of our approach, we set up a large experiment to detect easily exploitable memory corruptions using 1039 Debian programs obtained from its bug tracker, collected 138,308 unique execution traces, and statically explored 76,083 different subsequences of function calls. We managed to predict with reasonable accuracy which programs contained dangerous memory corruptions. We also developed and implemented VDiscover, a tool that uses state-of-the-art machine learning techniques to predict vulnerabilities in test cases. The tool will be released as open source to encourage research on vulnerability discovery at a large scale, together with VDiscovery, a public dataset that collects raw analyzed data.
Pages: 85-96
Page count: 12