BinCola: Diversity-Sensitive Contrastive Learning for Binary Code Similarity Detection

Cited by: 1
Authors
Jiang, Shuai [1]
Fu, Cai [1]
He, Shuai [1]
Lv, Jianqiang [1]
Han, Lansheng [1]
Hu, Hong [2]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China
[2] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Keywords
Feature extraction; Contrastive learning; Vectors; Source coding; Software; Semantics; Training; Diversity sensitive; binary analysis; similarity detection; attention mechanism; NETWORKS
DOI
10.1109/TSE.2024.3411072
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Binary Code Similarity Detection (BCSD) is a fundamental binary analysis technique in software security. Recently, advanced deep learning algorithms have been integrated into BCSD platforms to achieve superior performance on well-known benchmarks. However, large real-world programs exhibit more complex diversity, arising from different compilers, optimization levels, architectures, and even obfuscation. Existing BCSD solutions suffer from low accuracy in such complicated real-world scenarios. In this paper, we propose BinCola, a novel Transformer-based dual diversity-sensitive contrastive learning framework. It accounts for the diversity of both compiler options and candidate functions in real-world scenarios, and it uses the attention mechanism to fuse multi-granularity function features to improve generality and scalability. BinCola simultaneously compares multiple candidate functions across various compilation option scenarios to learn the differences caused by distinct compiler options and different candidate functions. We evaluate BinCola in several ways, including binary similarity detection and real-world vulnerability search in multiple application scenarios. The results show that BinCola outperforms state-of-the-art (SOTA) methods, with improvements of 2.80%, 33.62%, 22.41%, and 34.25% in cross-architecture, cross-optimization-level, cross-compiler, and cross-obfuscation scenarios, respectively.
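The abstract describes comparing several candidate functions across several compilation settings at once. As a rough illustration only, the sketch below shows a generic multi-positive, InfoNCE-style contrastive loss. It is not the loss used in BinCola, which this record does not specify. The function names, the toy 2-D embeddings, and the temperature value are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_positive_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Generic multi-positive InfoNCE-style loss (illustrative, hypothetical).

    anchor:    embedding of one function under one compilation setting
    positives: embeddings of the SAME function under other settings
    negatives: embeddings of DIFFERENT candidate functions
    """
    pos_logits = [cosine(anchor, p) / temperature for p in positives]
    neg_logits = [cosine(anchor, n) / temperature for n in negatives]
    all_logits = pos_logits + neg_logits

    # Numerically stable log-sum-exp over every candidate.
    m = max(all_logits)
    log_denom = m + math.log(sum(math.exp(x - m) for x in all_logits))

    # Average negative log-probability assigned to the positives.
    return -sum(x - log_denom for x in pos_logits) / len(pos_logits)


# Toy embeddings (assumed values, not real data).
anchor = [1.0, 0.0]
same_function = [[0.9, 0.1], [0.8, 0.2]]    # e.g. other optimization levels
other_functions = [[0.0, 1.0], [-1.0, 0.2]]

aligned = multi_positive_contrastive_loss(anchor, same_function, other_functions)
swapped = multi_positive_contrastive_loss(anchor, other_functions, same_function)
print(f"aligned={aligned:.4f} swapped={swapped:.4f}")
```

With these toy vectors the loss is lower when the same-function variants sit close to the anchor than when the roles are swapped, which is the behavior a contrastive objective of this kind rewards.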
Pages: 2485-2497
Number of pages: 13