BinCola: Diversity-Sensitive Contrastive Learning for Binary Code Similarity Detection

Cited by: 1
Authors
Jiang, Shuai [1]
Fu, Cai [1]
He, Shuai [1]
Lv, Jianqiang [1]
Han, Lansheng [1]
Hu, Hong [2]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Cyber Sci & Engn, Wuhan 430074, Peoples R China
[2] Penn State Univ, Coll Informat Sci & Technol, University Pk, PA 16802 USA
Keywords
Feature extraction; Contrastive learning; Vectors; Source coding; Software; Semantics; Training; Diversity sensitive; binary analysis; similarity detection; attention mechanism; NETWORKS;
DOI
10.1109/TSE.2024.3411072
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202; 0835;
Abstract
Binary Code Similarity Detection (BCSD) is a fundamental binary analysis technique in software security. Recently, advanced deep learning algorithms have been integrated into BCSD platforms to achieve superior performance on well-known benchmarks. However, large real-world programs exhibit more complex diversity due to different compilers, various optimization levels, multiple architectures, and even obfuscation. Existing BCSD solutions suffer from low accuracy in such complicated real-world application scenarios. In this paper, we propose BinCola, a novel Transformer-based dual diversity-sensitive contrastive learning framework that comprehensively considers the diversity of compiler options and candidate functions in real-world application scenarios and employs the attention mechanism to fuse multi-granularity function features, enhancing generality and scalability. BinCola simultaneously compares multiple candidate functions across various compilation option scenarios to learn the differences caused by distinct compiler options and different candidate functions. We evaluate BinCola's performance in a variety of ways, including binary similarity detection and real-world vulnerability search in multiple application scenarios. The results demonstrate that BinCola achieves superior performance compared to state-of-the-art (SOTA) methods, with improvements of 2.80%, 33.62%, 22.41%, and 34.25% in cross-architecture, cross-optimization level, cross-compiler, and cross-obfuscation scenarios, respectively.
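The abstract's idea of comparing one function against several compilation variants and several candidate functions at once can be illustrated with a generic multi-positive contrastive loss. The sketch below is not BinCola's actual objective, which the abstract does not specify; it is a standard InfoNCE-style loss written with NumPy. The function name `multi_positive_contrastive_loss`, the `temperature` value, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def multi_positive_contrastive_loss(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss with several positives (hypothetical helper).

    anchor:    (D,)   embedding of one function
    positives: (P, D) embeddings of the same function under other compiler options
    negatives: (N, D) embeddings of different candidate functions
    """
    a = l2_normalize(np.asarray(anchor, dtype=float))
    p = l2_normalize(np.asarray(positives, dtype=float))
    n = l2_normalize(np.asarray(negatives, dtype=float))

    # Cosine similarities scaled by the temperature.
    pos_logits = p @ a / temperature
    neg_logits = n @ a / temperature

    # Numerically stable log-sum-exp over all candidates.
    all_logits = np.concatenate([pos_logits, neg_logits])
    m = all_logits.max()
    log_denominator = m + np.log(np.exp(all_logits - m).sum())

    # Average negative log-probability assigned to each positive.
    return float(np.mean(log_denominator - pos_logits))


# Toy usage: variants close to the anchor should give a lower loss
# than unrelated candidates placed in the positive slot.
anchor = np.array([1.0, 0.0])
variants = np.array([[1.0, 0.0], [0.9, 0.1]])
others = np.array([[0.0, 1.0], [-1.0, 0.0]])
good = multi_positive_contrastive_loss(anchor, variants, others)
bad = multi_positive_contrastive_loss(anchor, others, variants)
```

Minimizing such a loss pulls embeddings of the same function under different compilation settings together while pushing other candidate functions away, which is the general behavior the abstract attributes to diversity-sensitive contrastive learning.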
Pages: 2485-2497
Page count: 13