GraphBinMatch: Graph-based Similarity Learning for Cross-Language Binary and Source Code Matching

Authors
TehraniJamsaz, Ali [1]
Chen, Hanze [1]
Jannesari, Ali [1]
Affiliations
[1] Iowa State Univ, Ames, IA 50011 USA
Funding
National Science Foundation (USA);
Keywords
cross-language; code similarity; binary-source matching;
DOI
10.1109/IPDPSW63119.2024.00103
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Matching binary to source code and vice versa has various applications in different fields, such as computer security, software engineering, and reverse engineering. Even though there exist methods that try to match source code with binary code to accelerate the reverse engineering process, most of them are designed to focus on one programming language. However, in real life, programs are developed using different programming languages depending on their requirements. Thus, cross-language binary-to-source code matching has recently gained more attention. Nonetheless, the existing approaches still struggle to make precise predictions due to the inherent difficulties that arise when binary code and source code must be matched across programming languages. In this paper, we address the problem of cross-language binary source code matching. We propose GraphBinMatch, an approach based on a graph neural network that learns the similarity between binary and source codes. We evaluate GraphBinMatch on several tasks, such as cross-language binary-to-source code matching and cross-language source-to-source matching. We also evaluate the performance of our approach on single-language binary-to-source code matching. Experimental results show that GraphBinMatch significantly outperforms the state-of-the-art, with improvements as high as 15% over the F1 score.
Pages: 506-515
Page count: 10
Related papers
50 in total
  • [1] Cross-Language Binary-Source Code Matching with Intermediate Representations
    Gui, Yi
    Wan, Yao
    Zhang, Hongyu
    Huang, Huifang
    Sui, Yulei
    Xu, Guandong
    Shao, Zhiyuan
    Jin, Hai
    2022 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2022), 2022, : 601 - 612
  • [2] Flowchart-Based Cross-Language Source Code Similarity Detection
    Zhang, Feng
    Li, Guofan
    Liu, Cong
    Song, Qian
    SCIENTIFIC PROGRAMMING, 2020, 2020
  • [3] Graph-Based Similarity Analysis: A New Approach to Cross-Language Plagiarism Detection
    Franco-Salvador, Marc
    Gupta, Parth
    Rosso, Paolo
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2013, (50): : 21 - 28
  • [4] Learning Graph-based Code Representations for Source-level Functional Similarity Detection
    Liu, Jiahao
    Zeng, Jun
    Wang, Xiang
    Liang, Zhenkai
    2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023, : 345 - 357
  • [5] Cross-Language Learning for Product Matching
    Peeters, Ralph
    Bizer, Christian
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022, : 236 - 238
  • [6] Modeling Functional Similarity in Source Code With Graph-Based Siamese Networks
    Mehrotra, Nikita
    Agarwal, Navdha
    Gupta, Piyush
    Anand, Saket
    Lo, David
    Purandare, Rahul
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2022, 48 (10) : 3771 - 3789
  • [7] Cross-Language Code Similarity and Applications in Clone Detection and Code Search
    Mathew, George Varghese
    ProQuest Dissertations and Theses Global, 2022
  • [8] Improving Cross-Language Code Clone Detection via Code Representation Learning and Graph Neural Networks
    Mehrotra, Nikita
    Sharma, Akash
    Jindal, Anmol
    Purandare, Rahul
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (11) : 4846 - 4868
  • [9] Towards the Detection of Cross-Language Source Code Reuse
    Flores, Enrique
    Barron-Cedeno, Alberto
    Rosso, Paolo
    Moreno, Lidia
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2011, 6716 : 250 - 253
  • [10] Cross-language plagiarism detection over continuous-space- and knowledge graph-based representations of language
    Franco-Salvador, Marc
    Gupta, Parth
    Rosso, Paolo
    Banchs, Rafael E.
    KNOWLEDGE-BASED SYSTEMS, 2016, 111 : 87 - 99