BlockMatch: A Fine-Grained Binary Code Similarity Detection Approach Using Contrastive Learning for Basic Block Matching

Cited by: 1
Authors
Luo, Zhenhao [1]
Wang, Pengfei [1]
Xie, Wei [1]
Zhou, Xu [1]
Wang, Baosheng [1]
Affiliation
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023, Vol. 13, Issue 23
Keywords
binary code similarity detection; basic block matching; contrastive learning
DOI
10.3390/app132312751
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Binary code similarity detection (BCSD) plays a vital role in computer security and software engineering. Traditional BCSD methods rely heavily on specific features and require substantial expert knowledge, which makes them sensitive to code alterations. To improve robustness against minor code alterations, recent research has shifted towards machine learning-based approaches. However, existing BCSD approaches mainly focus on function-level matching and face challenges related to large-batch optimization and high-quality sample selection at the basic block level. To overcome these challenges, we propose BlockMatch, a novel fine-grained BCSD approach that leverages natural language processing (NLP) techniques and contrastive learning for basic block matching. We treat the instructions of basic blocks as a language and use a DeBERTa model to capture relative position relations and contextual semantics when encoding instruction sequences. To handle the variety of operands in binary code, we propose a root operand model pre-training task that mitigates the loss of semantics for unseen operands. We then employ a mean pooling layer to generate basic block embeddings for detecting binary code similarity. Additionally, we propose a contrastive training framework, including a block augmentation model that generates high-quality training samples, to improve the effectiveness of model training. Inspired by contrastive learning, we adopt the NT-Xent loss as our objective function, which allows larger sample sizes for model training and mitigates the convergence issues caused by limited local positive/negative samples. Through extensive experiments, we evaluate BlockMatch and compare it against state-of-the-art approaches such as PalmTree and SAFE. The results demonstrate that BlockMatch achieves a recall@1 of 0.912 at the basic block level under the cross-compiler scenario (pool size = 10), outperforming PalmTree (0.810) and SAFE (0.798). Furthermore, our ablation study shows that the proposed contrastive training framework and the root operand model pre-training task help our model achieve superior performance.
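The abstract names three building blocks that can be illustrated in a few lines: mean pooling of token vectors into a block embedding, the NT-Xent contrastive loss, and the recall@1 metric over a candidate pool. The following is a minimal sketch in plain Python, not the authors' implementation. The function names, the cosine similarity measure, and the temperature of 0.1 are assumptions made for illustration, and all vectors are assumed to be non-zero.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into a single block embedding."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nt_xent(anchors, positives, temperature=0.1):
    """NT-Xent loss for a batch where anchors[i] and positives[i] are a
    positive pair and every other embedding in the batch is a negative.
    The temperature value is a hypothetical default."""
    z = anchors + positives
    n = len(anchors)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of i
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos = cosine(z[i], z[j]) / temperature
        # Numerically stable log-sum-exp over all non-self similarities.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos
    return total / (2 * n)

def recall_at_1(queries, pool, truth):
    """Fraction of queries whose most similar pool entry is the true match.
    truth[i] is the pool index of the correct match for queries[i]."""
    hits = 0
    for q, t in zip(queries, truth):
        best = max(range(len(pool)), key=lambda k: cosine(q, pool[k]))
        hits += int(best == t)
    return hits / len(queries)
```

In this sketch a larger batch adds more negatives to each denominator, which is the property the abstract refers to when it says the loss allows larger sample sizes. With a pool size of 10, `recall_at_1` would rank each query block against 10 candidates, exactly one of which is the true match.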
Pages: 22