Efficiently Identifying Binary Similarity Based on Deep Hashing and Contrastive Learning

Cited by: 0
Authors
Xiong, Jiaqi [1]
Cheng, Shaoyin [2]
Gao, Han [1]
Zhang, Weiming [1]
Affiliations
[1] Univ Sci & Technol China, CAS Key Lab Elect Space Informat, Hefei, Peoples R China
[2] Anhui Prov Key Lab Cyberspace Secur Situat Awaren, Hefei, Peoples R China
Source
2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA | 2023
Keywords
Big Data; Reverse Engineering; Deep Learning; Deep Hashing; Binary Diffing
DOI
10.1109/ICCCBDA56900.2023.10154664
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Binary similarity analysis identifies semantic similarities between two or more binary code snippets. In recent years, deep learning-based methods have shown promising results. They formalize code similarity as a nearest neighbor retrieval task, and the overall workflow can be divided into two stages: 1) feeding the code snippets into an embedding model to obtain the corresponding high-dimensional vectors as fingerprints (i.e., constructing the codebase); and 2) using the codebase for nearest neighbor retrieval to obtain the top-k results. Most existing studies focus only on the first stage (more specifically, the embedding model) while ignoring the overhead of the retrieval stage. In real-world scenarios, the codebase can be quite large and contain a massive number of embeddings, which makes exact nearest neighbor retrieval prohibitively expensive. To mitigate this issue, this paper proposes a novel approach, dubbed BinCH, which can efficiently perform code search without sacrificing accuracy.
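The two-stage workflow described in the abstract can be illustrated with a minimal Python sketch. It is not BinCH itself: the abstract does not specify the embedding model or the hashing scheme, so the sketch assumes a simple sign-based binarization of embeddings and a linear Hamming-distance scan. All function names and the random toy data are hypothetical.

```python
import heapq
import random


def binarize(vector):
    """Pack the sign pattern of a real-valued embedding into an integer bit code."""
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value > 0 else 0)
    return code


def hamming(a, b):
    """Count the bits that differ between two integer codes."""
    return bin(a ^ b).count("1")


def build_codebase(embeddings):
    """Stage 1: map each snippet name to a compact binary fingerprint."""
    return {name: binarize(vec) for name, vec in embeddings.items()}


def top_k(codebase, query_vector, k=3):
    """Stage 2: return the k (distance, name) pairs closest in Hamming distance."""
    query_code = binarize(query_vector)
    return heapq.nsmallest(
        k,
        ((hamming(query_code, code), name) for name, code in codebase.items()),
    )


# Toy data: random vectors stand in for the output of an embedding model.
rng = random.Random(0)
embeddings = {
    f"func_{i}": [rng.uniform(-1, 1) for _ in range(64)] for i in range(1000)
}
codebase = build_codebase(embeddings)

# A slightly perturbed copy of func_42 plays the role of a query snippet.
query = [v + rng.uniform(-0.05, 0.05) for v in embeddings["func_42"]]
results = top_k(codebase, query, k=3)
```

Comparing short binary codes with XOR and a bit count is much cheaper per candidate than a floating-point distance over the full vector, which is the general motivation for hashing-based retrieval. Whether accuracy is preserved depends on how the codes are learned, which this sketch does not attempt to model.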
Pages: 128-133
Page count: 6