A Lightweight Cross-Version Binary Code Similarity Detection Based on Similarity and Correlation Coefficient Features

Cited by: 8
Authors
Guo, Hui [1]
Huang, Shuguang [1]
Huang, Cheng [2]
Zhang, Min [1]
Pan, Zulie [1]
Shi, Fan [1]
Huang, Hui [1]
Hu, Donghui [3]
Wang, Xiaoping [1]
Affiliations
[1] Natl Univ Def Technol, Coll Elect Engn, Hefei 230011, Peoples R China
[2] Sichuan Univ, Coll Cybersecur, Chengdu 610065, Peoples R China
[3] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8
Keywords
Binary code similarity detection; cross-version binary; malware detection; similarity coefficient; correlation coefficient
DOI
10.1109/ACCESS.2020.3004813
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Binary code similarity detection (BCSD) has been applied in many fields, such as malware detection, plagiarism detection, and vulnerability search. Existing solutions to the BCSD problem usually either compare specific features between binaries based on the control flow graphs (CFGs) of their functions, or compute embedding vectors of binary functions and solve the problem with deep learning algorithms. In this paper, we take a different research perspective and propose a new, lightweight method for the cross-version BCSD problem based on multiple features. The method transforms binary functions into vectors and signals and computes their similarity coefficient and correlation coefficient values. Without relying on function CFGs, deep learning algorithms, or other related attributes, the method works directly on the raw bytes of each binary and can serve as an alternative for coping with the various complex situations that arise in real-world environments. We implement the method and evaluate it on a custom dataset of about 423,282 samples. The results show that the method performs well on cross-version BCSD, reaching a recall of 96.63%, which is almost the same as that of the state-of-the-art static solution.
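The abstract does not say which vector representation or which coefficients the authors use, so the following is only a minimal sketch of the general idea of scoring two functions from their raw bytes. It assumes a Jaccard similarity coefficient over byte 2-grams and a Pearson correlation coefficient over normalized byte-value histograms; the function names and the sample byte strings are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from typing import List


def byte_histogram(data: bytes) -> List[float]:
    """Turn raw bytes into a 256-dimensional vector of byte-value frequencies."""
    counts = Counter(data)
    total = len(data) or 1  # avoid division by zero for empty input
    return [counts.get(value, 0) / total for value in range(256)]


def jaccard_coefficient(a: bytes, b: bytes, n: int = 2) -> float:
    """Jaccard similarity coefficient over the sets of byte n-grams (assumed feature)."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def pearson_correlation(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


# Hypothetical raw bytes of one function in two versions of a binary.
func_v1 = bytes.fromhex("554889e54883ec10897dfc8b45fc83c001c9c3")
func_v2 = bytes.fromhex("554889e54883ec20897dec8b45ec83c002c9c3")

similarity = jaccard_coefficient(func_v1, func_v2)
correlation = pearson_correlation(byte_histogram(func_v1), byte_histogram(func_v2))
print(f"similarity coefficient:  {similarity:.3f}")
print(f"correlation coefficient: {correlation:.3f}")
```

Both scores use only the bytes themselves, so no disassembly or CFG recovery is needed. How the two values would be combined into a match decision is not described in the abstract and is left open here.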
Pages: 120501 - 120512
Page count: 12
Related papers
50 items in total
  • [41] Similarity-based Android malware detection using Hamming distance of static binary features
    Taheri, Rahim
    Ghahramani, Meysam
    Javidan, Reza
    Shojafar, Mohammad
    Pooranian, Zahra
    Conti, Mauro
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 105 : 230 - 247
  • [42] Similarity-based face image retrieval using sparsely embedded deep features and binary code learning
    Elboushaki, Abdessamad
    Hannane, Rachida
    Afdel, Karim
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2024, 13 (03)
  • [43] VDSimilar: Vulnerability detection based on code similarity of vulnerabilities and patches
    Sun, Hao
    Cui, Lei
    Li, Lun
    Ding, Zhenquan
    Hao, Zhiyu
    Cui, Jiancong
    Liu, Peng
    COMPUTERS & SECURITY, 2021, 110
  • [44] Similarity Code File Detection Model Based on Frequent Itemsets
    Jiang, Jian-hong
    Wang, Ke
    2018 INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION AND NETWORK TECHNOLOGY (CCNT 2018), 2018, 291 : 254 - 262
  • [45] A method for efficient malicious code detection based on conceptual similarity
    Kim, Sungsuk
    Choi, Chang
    Choi, Junho
    Kim, Pankoo
    Kim, Hanil
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2006, PT 4, 2006, 3983 : 567 - 576
  • [46] A Semantics-Based Approach on Binary Function Similarity Detection
    Zhang, Yuntao
    Fang, Binxing
    Xiong, Zehui
    Wang, Yanhao
    Liu, Yuwei
    Zheng, Chao
    Zhang, Qinnan
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (15) : 25910 - 25924
  • [47] Detection of Correlated Alarms Based on Similarity Coefficients of Binary Data
    Yang, Zijiang
    Wang, Jiandong
    Chen, Tongwen
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2013, 10 (04) : 1014 - 1025
  • [48] Binary Vulnerability Similarity Detection Based on Function Parameter Dependency
    Xia, Bing
    Liu, Wenbo
    INTERNATIONAL JOURNAL ON SEMANTIC WEB AND INFORMATION SYSTEMS, 2023, 19 (01)
  • [49] Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection
    Yu, Zeping
    Cao, Rui
    Tang, Qiyi
    Nie, Sen
    Huang, Junzhou
    Wu, Shi
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 1145 - 1152
  • [50] Multi-semantic feature fusion attention network for binary code similarity detection
    Bangling Li
    Yuting Zhang
    Huaxi Peng
    Qiguang Fan
    Shen He
    Yan Zhang
    Songquan Shi
    Yang Zhang
    Ailiang Ma
    Scientific Reports, 13