Software Vulnerability Discovery via Learning Multi-Domain Knowledge Bases

Cited by: 77
Authors
Lin, Guanjun [1]
Zhang, Jun [1]
Luo, Wei [2]
Pan, Lei [2]
De Vel, Olivier [3]
Montague, Paul [3]
Xiang, Yang [1]
Affiliations
[1] Swinburne Univ Technol, Sch Software & Elect Engn, Melbourne, Vic 3122, Australia
[2] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
[3] Def Sci & Technol Grp DSTG, Dept Def, Canberra, ACT 2610, Australia
Keywords
Software; Feature extraction; Deep learning; Feeds; Task analysis; Neural networks; Data mining; Vulnerability discovery; representation learning; deep learning
DOI
10.1109/TDSC.2019.2954088
CLC classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Machine learning (ML) has great potential in automated code vulnerability discovery. However, automated discovery applications driven by off-the-shelf ML tools often perform poorly because high-quality training data are scarce. This scarcity of vulnerability data is almost always a problem for a software project in its early stages of development, a situation referred to as the cold-start problem. This article proposes a framework that uses transferable knowledge from pre-existing data sources. To improve detection performance, multiple vulnerability-relevant data sources were selected to form a broader base for learning transferable knowledge. The selected data sources are cross-domain: they include historical vulnerability data from different software projects and data from the Software Assurance Reference Database (SARD), which consists of synthetic vulnerability examples and proof-of-concept test cases. To extract information applicable to vulnerability detection from these cross-domain data sets, we designed a deep-learning-based framework with Long Short-Term Memory (LSTM) cells. The framework combines the heterogeneous data sources to learn unified representations of the patterns of vulnerable source code. Empirical studies showed that the unified representations generated by the proposed deep learning networks are feasible, effective, and transferable to real-world vulnerability detection. Our experiments demonstrated that, by leveraging two heterogeneous data sources, our vulnerability detection outperformed the static vulnerability discovery tool Flawfinder. The findings of this article may stimulate further research on ML-based vulnerability detection using heterogeneous data sources.
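The abstract describes LSTM cells learning a unified representation of vulnerable source code from two heterogeneous sources. The following is a minimal, untrained sketch of that general idea, not the authors' implementation: a single-layer LSTM written in plain Python encodes tokenized functions from both sources, using one shared vocabulary and one shared set of weights, into fixed-length vectors. The names `TinyLSTM` and `build_vocab`, the example token sequences, the dimensions, the random weights, and the choice of the final hidden state as the representation are all hypothetical assumptions; training and the downstream classifier are omitted.

```python
import math
import random
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def build_vocab(corpora: Sequence[Sequence[Sequence[str]]]) -> Dict[str, int]:
    """Hypothetical helper: one vocabulary shared by all data sources."""
    vocab = {"<unk>": 0}
    for corpus in corpora:
        for tokens in corpus:
            for tok in tokens:
                vocab.setdefault(tok, len(vocab))
    return vocab


class TinyLSTM:
    """Hypothetical single-layer LSTM encoder (random, untrained weights)."""

    def __init__(self, vocab: Dict[str, int], embed_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)  # seeded so results are reproducible
        self.vocab = vocab
        self.hidden_dim = hidden_dim
        # Embedding table: one vector per token id.
        self.embed = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)] for _ in range(len(vocab))]
        in_dim = embed_dim + hidden_dim
        # Gates: i = input, f = forget, g = cell candidate, o = output.
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
                  for gate in "ifgo"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifgo"}

    def _affine(self, gate: str, x: List[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(self.W[gate], self.b[gate])]

    def encode(self, tokens: Sequence[str]) -> List[float]:
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        unk = self.vocab["<unk>"]
        for tok in tokens:
            # Concatenate the token embedding with the previous hidden state.
            x = self.embed[self.vocab.get(tok, unk)] + h
            i = [sigmoid(z) for z in self._affine("i", x)]
            f = [sigmoid(z) for z in self._affine("f", x)]
            g = [math.tanh(z) for z in self._affine("g", x)]
            o = [sigmoid(z) for z in self._affine("o", x)]
            c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
            h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h  # final hidden state used as the fixed-length representation


# Hypothetical tokenized functions from the two kinds of source.
historical = [["char", "buf", "[", "10", "]", ";", "strcpy", "(", "buf", ",", "src", ")", ";"]]
sard = [["memcpy", "(", "dst", ",", "src", ",", "len", ")", ";"]]

vocab = build_vocab([historical, sard])
encoder = TinyLSTM(vocab, embed_dim=8, hidden_dim=16)
reps = [encoder.encode(seq) for seq in historical + sard]
```

Because both corpora share the vocabulary and the weights, functions from project history and from SARD are mapped into the same vector space. In a real setting the weights would first be learned from labeled examples before these vectors were used for detection.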
Pages: 2469-2485
Page count: 17