Software Vulnerability Discovery via Learning Multi-Domain Knowledge Bases

Cited by: 77
Authors
Lin, Guanjun [1]
Zhang, Jun [1]
Luo, Wei [2]
Pan, Lei [2]
De Vel, Olivier [3]
Montague, Paul [3]
Xiang, Yang [1]
Affiliations
[1] Swinburne Univ Technol, Sch Software & Elect Engn, Melbourne, Vic 3122, Australia
[2] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
[3] Def Sci & Technol Grp DSTG, Dept Def, Canberra, ACT 2610, Australia
Keywords
Software; Feature extraction; Deep learning; Feeds; Task analysis; Neural networks; Data mining; Vulnerability discovery; Representation learning
DOI
10.1109/TDSC.2019.2954088
CLC number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Machine learning (ML) has great potential in automated code vulnerability discovery. However, automated discovery applications driven by off-the-shelf machine learning tools often perform poorly due to the shortage of high-quality training data. The scarcity of vulnerability data is almost always a problem for a developing software project during its early stages, which is referred to as the cold-start problem. This article proposes a framework that utilizes transferable knowledge from pre-existing data sources. To improve detection performance, multiple vulnerability-relevant data sources were selected to form a broader base for learning transferable knowledge. The selected data sources are cross-domain, including historical vulnerability data from different software projects and data from the Software Assurance Reference Database (SARD), which consists of synthetic vulnerability examples and proof-of-concept test cases. To extract the information applicable to vulnerability detection from the cross-domain data sets, we designed a deep-learning-based framework with Long Short-Term Memory (LSTM) cells. Our framework combines the heterogeneous data sources to learn unified representations of the patterns of vulnerable source code. Empirical studies showed that the unified representations generated by the proposed deep learning networks are feasible and effective, and are transferable to real-world vulnerability detection. Our experiments demonstrated that, by leveraging two heterogeneous data sources, our vulnerability detection outperformed the static vulnerability discovery tool Flawfinder. The findings of this article may stimulate further research in ML-based vulnerability detection using heterogeneous data sources.
Pages: 2469-2485
Page count: 17