Scalable Program Clone Search through Spectral Analysis

Times cited: 0
Authors
Benoit, Tristan [1]
Marion, Jean-Yves [1]
Bardin, Sebastien [2]
Affiliations
[1] Univ Lorraine, CNRS, LORIA, Nancy, France
[2] Univ Paris Saclay, CEA LIST, Saclay, France
Funding
European Union Horizon 2020
Keywords
binary code analysis; clone search; spectral analysis; CODE; GRAPHS;
DOI
10.1145/3611643.3616279
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We consider the problem of program clone search: given a target program and a repository of known programs (all in executable format), find the program in the repository that is most similar to the target. Potential applications include reverse engineering, program clustering, malware lineage and software theft detection. Recent years have seen a boom in code similarity techniques, yet most of them focus on function-level similarity and function clone search, whereas we are interested in program-level similarity and program clone search. Our study shows that prior similarity approaches are either too slow to handle large program repositories, not precise enough, or not robust against slight variations introduced by compilers, source code versions or light obfuscation. We propose a novel spectral analysis method for program-level similarity and program clone search called Programs Spectral Similarity (PSS). In a nutshell, PSS performs a one-time spectral feature extraction tailored for large repositories, which makes it a perfect fit for program clone search. We compared the different approaches on extensive benchmarks, showing that PSS reaches a sweet spot in terms of precision, speed and robustness.
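The abstract does not say which spectral features PSS extracts, so the following is only a generic sketch of the search pattern it describes, not the authors' method: features are extracted once per repository program, and each query is then answered by a nearest-neighbour lookup. It assumes, hypothetically, that each program has already been reduced to an undirected call graph, that its feature vector is the k largest eigenvalues of the graph Laplacian L = D - A (zero-padded), and that similarity is Euclidean distance. Eigenvalues come from a plain cyclic Jacobi solver to keep the example dependency-free. The names `spectral_features`, `build_index` and `clone_search` are illustrative only.

```python
import math


def laplacian(n, edges):
    """Laplacian L = D - A of an undirected simple graph on nodes 0..n-1."""
    lap = [[0.0] * n for _ in range(n)]
    seen = set()
    for u, v in edges:
        key = (min(u, v), max(u, v))
        if u == v or key in seen:  # ignore self-loops and duplicate edges
            continue
        seen.add(key)
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
        lap[u][u] += 1.0
        lap[v][v] += 1.0
    return lap


def jacobi_eigenvalues(m, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, ascending."""
    a = [row[:] for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal mass small enough: diagonal holds eigenvalues
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Choose the rotation that zeroes the (p, q) entry.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_features(n, edges, k=8):
    """Hypothetical feature vector: k largest Laplacian eigenvalues, zero-padded."""
    top = sorted(jacobi_eigenvalues(laplacian(n, edges)), reverse=True)[:k]
    return top + [0.0] * (k - len(top))


def build_index(repo, k=8):
    """One-time extraction: map each repository program to its feature vector."""
    return {name: spectral_features(n, edges, k) for name, (n, edges) in repo.items()}


def clone_search(index, n, edges, k=8):
    """Return the repository entry whose features are closest to the query's."""
    query = spectral_features(n, edges, k)
    return min(index, key=lambda name: math.dist(index[name], query))


if __name__ == "__main__":
    repo = {
        "path4": (4, [(0, 1), (1, 2), (2, 3)]),
        "star4": (4, [(0, 1), (0, 2), (0, 3)]),
        "cycle4": (4, [(0, 1), (1, 2), (2, 3), (3, 0)]),
    }
    index = build_index(repo)  # paid once for the whole repository
    print(clone_search(index, 4, [(2, 0), (0, 3), (3, 1)]))  # relabeled path
```

Here the query is the 4-node path with its nodes renamed, and the search prints `path4`: isomorphic graphs have identical Laplacian spectra, so a relabeling does not change the feature vector. The converse does not hold (non-isomorphic cospectral graphs exist), so in this sketch a spectral match is a similarity signal, not a proof that two programs are the same.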
Pages: 808-820
Page count: 13
Related papers (50 in total)
  • [1] Scalable code clone search for malware analysis
    Farhadi, Mohammad Reza
    Fung, Benjamin C. M.
    Fung, Yin Bun
    Charland, Philippe
    Preda, Stere
    Debbabi, Mourad
    [J]. DIGITAL INVESTIGATION, 2015, 15 : 46 - 60
  • [2] SeByte: Scalable clone and similarity search for bytecode
    Keivanloo, Iman
    Roy, Chanchal K.
    Rilling, Juergen
    [J]. SCIENCE OF COMPUTER PROGRAMMING, 2014, 95 : 426 - 444
  • [3] Scalable code clone detection and search based on adaptive prefix filtering
    Nishi, Manziba Akanda
    Damevski, Kostadin
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2018, 137 : 130 - 142
  • [4] Siamese: scalable and incremental code clone search via multiple code representations
    Ragkhitwetsagul, Chaiyong
    Krinke, Jens
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2019, 24 (04) : 2236 - 2284
  • [5] Scalable and Approximate Program Dependence Analysis
    Lee, Seongmin
    [J]. 2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: COMPANION PROCEEDINGS (ICSE-COMPANION 2020), 2020 : 162 - 165
  • [6] Clone detection through srcClone: A program slicing based approach
    Alomari, Hakam W.
    Stephan, Matthew
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2022, 184
  • [7] An efficient search program for ASTM infrared spectral data
    Tanabe, K
    Tamura, T
    Hiraishi, J
    Saeki, S
    [J]. BUNSEKI KAGAKU, 1982, 31 (01) : E27 - E32
  • [8] GRASS: Graph Spectral Sparsification Leveraging Scalable Spectral Perturbation Analysis
    Feng, Zhuo
    [J]. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2020, 39 (12) : 4944 - 4957