Toward Large-Scale Vulnerability Discovery using Machine Learning

Cited by: 146

Authors:

Grieco, Gustavo [1]

Grinblat, Guillermo Luis [1]

Uzal, Lucas [1]

Rawat, Sanjay [2,4]

Feist, Josselin [3]

Mounier, Laurent [3]

Affiliations:

[1] CIFASIS CONICET, Rosario, Santa Fe, Argentina

[2] Vrije Univ Amsterdam, Syst Secur Grp, Amsterdam, Netherlands

[3] Univ Grenoble Alps, VERIMAG, Grenoble, France

[4] IIIT Hyderabad, Hyderabad, Telangana, India

Source:

CODASPY'16: PROCEEDINGS OF THE SIXTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY | 2016

DOI:

10.1145/2857705.2857720

Chinese Library Classification (CLC) code:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

With the sustained growth of software complexity, finding security vulnerabilities in operating systems has become a necessity. Operating systems now ship with thousands of binary executables. Unfortunately, methodologies and tools for OS-scale program testing within a limited time budget are still missing. In this paper we present an approach that uses lightweight static and dynamic features, together with machine learning techniques, to predict whether a test case is likely to contain a software vulnerability. To show the effectiveness of our approach, we set up a large experiment to detect easily exploitable memory corruptions using 1039 Debian programs obtained from the Debian bug tracker, collected 138,308 unique execution traces, and statically explored 76,083 different subsequences of function calls. We managed to predict with reasonable accuracy which programs contained dangerous memory corruptions. We also developed and implemented VDISCOVER, a tool that uses state-of-the-art machine learning techniques to predict vulnerabilities in test cases. The tool will be released as open source to encourage research on vulnerability discovery at a large scale, together with VDISCOVERY, a public dataset that collects raw analyzed data.

Pages: 85-96

Page count: 12
