A Lightweight Cross-Version Binary Code Similarity Detection Based on Similarity and Correlation Coefficient Features

Cited by: 8
Authors
Guo, Hui [1]
Huang, Shuguang [1]
Huang, Cheng [2]
Zhang, Min [1]
Pan, Zulie [1]
Shi, Fan [1]
Huang, Hui [1]
Hu, Donghui [3]
Wang, Xiaoping [1]
Affiliations
[1] Natl Univ Def Technol, Coll Elect Engn, Hefei 230011, Peoples R China
[2] Sichuan Univ, Coll Cybersecur, Chengdu 610065, Peoples R China
[3] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Binary code similarity detection; cross-version binary; malware detection; similarity coefficient; correlation coefficient
DOI
10.1109/ACCESS.2020.3004813
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Binary code similarity detection (BCSD) has been applied in many fields, such as malware detection, plagiarism detection, and vulnerability search. Existing solutions to the BCSD problem usually either compare specific features between binaries based on the control flow graphs (CFGs) of their functions, or compute embedding vectors of binary functions and solve the problem with deep learning algorithms. In this paper, we take a different research perspective and propose a new, lightweight method for the cross-version BCSD problem based on multiple features. The method transforms binary functions into vectors and signals, then computes a similarity coefficient and a correlation coefficient between them. Because it does not rely on function CFGs, deep learning algorithms, or other related attributes, the method works directly on the raw bytes of each binary and can serve as an alternative for coping with the various complex situations found in real-world environments. We implement the method and evaluate it on a custom dataset of about 423,282 samples. The results show that the method performs well on cross-version BCSD, reaching a recall of 96.63%, which is almost the same as that of the state-of-the-art static solution.
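The general idea of scoring two versions of a function from their raw bytes can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which similarity or correlation coefficients are used, so the sketch assumes a Jaccard coefficient over byte 2-grams and a Pearson correlation over byte-frequency histograms. The function names and sample byte strings are hypothetical.

```python
import math


def byte_histogram(code: bytes) -> list[int]:
    """Count how often each of the 256 byte values occurs (assumed 'vector' view)."""
    hist = [0] * 256
    for b in code:
        hist[b] += 1
    return hist


def jaccard_similarity(a: bytes, b: bytes, n: int = 2) -> float:
    """Jaccard coefficient over sets of byte n-grams (assumed similarity coefficient)."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0  # two inputs with no n-grams are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def pearson_correlation(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two equal-length sequences (assumed correlation coefficient)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


# Hypothetical raw bytes of the same function in two versions of a binary.
func_v1 = bytes([0x55, 0x48, 0x89, 0xE5, 0x89, 0x7D, 0xFC, 0x5D, 0xC3])
func_v2 = bytes([0x55, 0x48, 0x89, 0xE5, 0x89, 0x7D, 0xF8, 0x5D, 0xC3])

similarity = jaccard_similarity(func_v1, func_v2)
correlation = pearson_correlation(byte_histogram(func_v1), byte_histogram(func_v2))
print(f"similarity={similarity:.3f} correlation={correlation:.3f}")
```

Both scores need only the function bytes, which matches the abstract's point that no CFG recovery or trained model is required. How the paper combines the two values into a match decision is not stated in the abstract and is left open here.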
Pages: 120501 - 120512
Page count: 12
Related papers
50 in total
  • [31] A Cross-Project Defect Prediction Approach Based on Code Semantics and Cross-Version Structural Information
    Zou, Yifan
    Wang, Huiqiang
    Lv, Hongwu
    Zhao, Shuai
    Tian, Haoye
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2024, 34 (07) : 1135 - 1171
  • [32] Multi-Level Cross-Architecture Binary Code Similarity Metric
    Meng Qiao
    Xiaochuan Zhang
    Huihui Sun
    Zheng Shan
    Fudong Liu
    Wenjie Sun
    Xingwei Li
    Arabian Journal for Science and Engineering, 2021, 46 : 8603 - 8615
  • [33] Cross-Modality Binary Code Learning via Fusion Similarity Hashing
    Liu, Hong
    Ji, Rongrong
    Wu, Yongjian
    Huang, Feiyue
    Zhang, Baochang
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017 : 6345 - 6353
  • [34] Cross-architecture Binary Function Similarity Detection based on Composite Feature Model
    Li, Xiaonan
    Zhang, Guimin
    Li, Qingbao
    Zhang, Ping
    Chen, Zhifeng
    Liu, Jinjin
    Yue, Shudan
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2023, 17 (08) : 2101 - 2123
  • [35] Self-similarity based lightweight intrusion detection method
    Kwon, Hyukmin
    Kim, Eunjin
    Yu, Song Jin
    Kim, Huy Kang
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2011, 14 (11) : 3683 - 3690
  • [36] GraphBinMatch: Graph-based Similarity Learning for Cross-Language Binary and Source Code Matching
    TehraniJamsaz, Ali
    Chen, Hanze
    Jannesari, Ali
    2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024, 2024 : 506 - 515
  • [37] A Review of Deep Learning-Based Binary Code Similarity Analysis
    Du, Jiang
    Wei, Qiang
    Wang, Yisen
    Sun, Xiangjie
    ELECTRONICS, 2023, 12 (22)
  • [38] Codeformer: A GNN-Nested Transformer Model for Binary Code Similarity Detection
    Liu, Guangming
    Zhou, Xin
    Pang, Jianmin
    Yue, Feng
    Liu, Wenfu
    Wang, Junchao
    ELECTRONICS, 2023, 12 (07)
  • [39] A Semantics-Based Hybrid Approach on Binary Code Similarity Comparison
    Hu, Yikun
    Wang, Hui
    Zhang, Yuanyuan
    Li, Bodong
    Gu, Dawu
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2021, 47 (06) : 1241 - 1258
  • [40] BinCola: Diversity-Sensitive Contrastive Learning for Binary Code Similarity Detection
    Jiang, Shuai
    Fu, Cai
    He, Shuai
    Lv, Jianqiang
    Han, Lansheng
    Hu, Hong
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2024, 50 (10) : 2485 - 2497