A Lightweight Cross-Version Binary Code Similarity Detection Based on Similarity and Correlation Coefficient Features

Cited by: 8
Authors
Guo, Hui [1]
Huang, Shuguang [1]
Huang, Cheng [2]
Zhang, Min [1]
Pan, Zulie [1]
Shi, Fan [1]
Huang, Hui [1]
Hu, Donghui [3]
Wang, Xiaoping [1]
Affiliations
[1] Natl Univ Def Technol, Coll Elect Engn, Hefei 230011, Peoples R China
[2] Sichuan Univ, Coll Cybersecur, Chengdu 610065, Peoples R China
[3] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Binary code similarity detection; cross-version binary; malware detection; similarity coefficient; correlation coefficient
DOI
10.1109/ACCESS.2020.3004813
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Binary code similarity detection (BCSD) has been applied in many fields, such as malware detection, plagiarism detection, and vulnerability search. Existing solutions to the BCSD problem usually either compare specific features between binaries based on the control flow graphs (CFGs) of their functions, or compute embedding vectors of binary functions and solve the problem with deep learning algorithms. In this paper, from another research perspective, we propose a new, lightweight method that addresses the cross-version BCSD problem using multiple features. It transforms binary functions into vectors and signals, then computes a similarity coefficient value and a correlation coefficient value for each pair of functions being compared. Without relying on the CFGs of functions, deep learning algorithms, or other related attributes, our method works directly on the raw bytes of each binary and can serve as an alternative for coping with the various complex situations that arise in real-world environments. We implement the method and evaluate it on a custom dataset of about 423,282 samples. The results show that the method performs well on cross-version BCSD: its recall can reach 96.63%, which is almost the same as that of the state-of-the-art static solution.
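The abstract does not say which similarity and correlation coefficients the authors use, nor how functions are turned into vectors and signals. As a rough illustration only, the sketch below assumes a Jaccard coefficient over byte 2-grams and a Pearson correlation over 256-bin byte histograms; the function names, the n-gram size, and the sample byte strings are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def byte_histogram(code: bytes) -> list[int]:
    """Fixed-length 'signal': occurrence count of each byte value 0..255."""
    counts = Counter(code)
    return [counts.get(value, 0) for value in range(256)]


def jaccard_similarity(a: bytes, b: bytes, n: int = 2) -> float:
    """Jaccard coefficient over the sets of byte n-grams (assumed feature)."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0  # two empty inputs are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def pearson_correlation(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two equal-length signals; 0.0 if one is constant."""
    if len(x) != len(y) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    var_x = sum((p - mean_x) ** 2 for p in x)
    var_y = sum((q - mean_y) ** 2 for q in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def function_features(a: bytes, b: bytes) -> tuple[float, float]:
    """Return (similarity coefficient, correlation coefficient) for two functions."""
    similarity = jaccard_similarity(a, b)
    correlation = pearson_correlation(byte_histogram(a), byte_histogram(b))
    return similarity, correlation


# Hypothetical raw bytes of one small x86-64 function in two versions;
# the only difference is a single immediate operand.
OLD = bytes.fromhex("554889e5897dfc8b45fc83c0015dc3")
NEW = bytes.fromhex("554889e5897dfc8b45fc83c0025dc3")

if __name__ == "__main__":
    print(function_features(OLD, NEW))
```

A pair of coefficient values like this could then be thresholded or passed to a simple classifier to decide whether two functions match; how the paper combines its features is not stated in the abstract.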
Pages: 120501-120512
Page count: 12
Related papers
50 in total
  • [21] Code similarity detection through control statement and program features
    Sudhamani, M.
    Rangarajan, Lalitha
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 132 : 63 - 75
  • [22] Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection
    Yang, Shouguo
    Cheng, Long
    Zeng, Yicheng
    Lang, Zhe
    Zhu, Hongsong
    Shi, Zhiqiang
    51ST ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN 2021), 2021, : 224 - 236
  • [23] New Similarity Correlation Functions for Sets and Binary Data based on Jaccard Similarity Measure
    Batyrshin, Ildar
    Rudas, Imre
    18TH INTERNATIONAL SYMPOSIUM ON APPLIED COMPUTATIONAL INTELLIGENCE AND INFORMATICS, SACI 2024, 2024, : 145 - 149
  • [24] Similarity-Based Correlation Functions for Binary Data
    Batyrshin, Ildar Z.
    Ramirez-Mejia, Ivan
    Batyrshin, Ilnur I.
    Solovyev, Valery
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, MICAI 2020, PT II, 2020, 12469 : 224 - 233
  • [25] UniASM: Binary code similarity detection without fine-tuning
    Gu, Yeming
    Shu, Hui
    Kang, Fei
    Hu, Fan
    NEUROCOMPUTING, 2025, 630
  • [26] jTrans: Jump-Aware Transformer for Binary Code Similarity Detection
    Wang, Hao
    Qu, Wenjie
    Katz, Gilad
    Zhu, Wenyu
    Gao, Zeyu
    Qiu, Han
    Zhuge, Jianwei
    Zhang, Chao
    PROCEEDINGS OF THE 31ST ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2022, 2022, : 1 - 13
  • [27] Cross-Language Code Similarity and Applications in Clone Detection and Code Search
    Mathew, George Varghese
    ProQuest Dissertations and Theses Global, 2022,
  • [28] Fast Cross-Platform Binary Code Similarity Detection Framework Based on CFGs Taking Advantage of NLP and Inductive GNN
    Jinxue PENG
    Yong WANG
    Jingfeng XUE
    Zhenyan LIU
    Chinese Journal of Electronics, 2024, 33 (01) : 128 - 138
  • [29] Fast Cross-Platform Binary Code Similarity Detection Framework Based on CFGs Taking Advantage of NLP and Inductive GNN
    Peng, Jinxue
    Wang, Yong
    Xue, Jingfeng
    Liu, Zhenyan
    CHINESE JOURNAL OF ELECTRONICS, 2024, 33 (01) : 128 - 138
  • [30] Multi-Level Cross-Architecture Binary Code Similarity Metric
    Qiao, Meng
    Zhang, Xiaochuan
    Sun, Huihui
    Shan, Zheng
    Liu, Fudong
    Sun, Wenjie
    Li, Xingwei
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2021, 46 (09) : 8603 - 8615