Efficiently Identifying Binary Similarity Based on Deep Hashing and Contrastive Learning

Cited by: 0

Authors:

Xiong, Jiaqi ^{[1]}

Cheng, Shaoyin ^{[2]}

Gao, Han ^{[1]}

Zhang, Weiming ^{[1]}

Affiliations:

[1] Univ Sci & Technol China, CAS Key Lab Elect Space Informat, Hefei, Peoples R China

[2] Anhui Prov Key Lab Cyberspace Secur Situat Awaren, Hefei, Peoples R China

Source:

2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA | 2023

Keywords:

Big Data; Reverse Engineering; Deep Learning; Deep Hashing; Binary Diffing;

DOI:

10.1109/ICCCBDA56900.2023.10154664

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Binary similarity detection aims to identify semantic similarities between two or more binary code snippets. In recent years, deep learning-based methods have shown promising results. They formalize code similarity as a nearest neighbor retrieval task, and the overall workflow can be divided into two stages: 1) feeding the code snippets into an embedding model to obtain the corresponding high-dimensional vectors as fingerprints (i.e., constructing the codebase); and 2) using the codebase for nearest neighbor retrieval to obtain the top-k results. Most existing studies focus only on the first stage (more specifically, the embedding model) while ignoring the overhead of the retrieval stage. In real-world scenarios, the codebase can be very large and contain a massive number of embeddings, which makes exact nearest neighbor retrieval prohibitively expensive. To mitigate this issue, this paper proposes a novel approach, dubbed BinCH, which can efficiently perform code search without sacrificing accuracy.
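The two-stage workflow described in the abstract can be illustrated with a minimal sketch. This is not BinCH itself: the abstract does not describe the model, so the sketch assumes the embeddings are already computed and swaps in random-hyperplane hashing as a stand-in for a learned deep hashing model. All function names (`exact_top_k`, `make_hyperplanes`, `to_hash`, `hamming_top_k`) are hypothetical and chosen only for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def exact_top_k(query, codebase, k):
    """Exact retrieval: score the query against every stored embedding."""
    ranked = sorted(range(len(codebase)),
                    key=lambda i: cosine(query, codebase[i]),
                    reverse=True)
    return ranked[:k]


def make_hyperplanes(dim, n_bits, seed=0):
    """Random hyperplanes (assumption: a stand-in for a learned hash layer)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def to_hash(vec, planes):
    """Pack the sign of each projection into one integer hash code."""
    code = 0
    for plane in planes:
        bit = sum(a * b for a, b in zip(vec, plane)) >= 0
        code = (code << 1) | int(bit)
    return code


def hamming_top_k(query_code, codes, k):
    """Approximate retrieval: rank stored codes by Hamming distance."""
    ranked = sorted(range(len(codes)),
                    key=lambda i: bin(query_code ^ codes[i]).count("1"))
    return ranked[:k]


if __name__ == "__main__":
    # Toy embeddings standing in for the output of an embedding model.
    codebase = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
    query = [1.0, 0.0, 0.0]

    print(exact_top_k(query, codebase, 2))  # [0, 2]

    planes = make_hyperplanes(dim=3, n_bits=16)
    codes = [to_hash(v, planes) for v in codebase]
    print(hamming_top_k(to_hash(query, planes), codes, 2))
```

The exact variant computes one floating-point similarity per stored embedding, whereas the hashed variant compares compact integer codes with an XOR and a bit count. That cost difference is the retrieval overhead the abstract refers to; how much accuracy a given hashing scheme retains depends on the scheme and is not shown by this sketch.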


Pages: 128-133

Page count: 6