Scalable code clone search for malware analysis

Cited by: 15
Authors
Farhadi, Mohammad Reza [1, 3]
Fung, Benjamin C. M. [1]
Fung, Yin Bun [3]
Charland, Philippe [2]
Preda, Stere [3]
Debbabi, Mourad [3]
Affiliations
[1] McGill Univ, Sch Informat Studies, Montreal, PQ, Canada
[2] DRDC Valcartier Res Ctr, Mission Crit Cyber Secur Sect, Quebec City, PQ, Canada
[3] Concordia Univ, Concordia Inst Informat Syst Engn, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Assembly code clone detection; Malware analysis; Reverse engineering; Software fingerprinting; Software security
DOI
10.1016/j.diin.2015.06.001
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Reverse engineering is the primary step in analyzing a piece of malware. After disassembling a malware binary, a reverse engineer needs to spend extensive effort analyzing the resulting assembly code and then documenting it through comments in the assembly code for future reference. In this paper, we have developed an assembly code clone search system called ScalClone, based on our previous work on assembly code clone detection systems. The objective of the system is to identify the code clones of a target malware from a collection of previously analyzed malware binaries. Our new contributions are summarized as follows: First, we introduce two assembly code clone search methods for malware analysis with a high recall rate. Second, our methods allow malware analysts to discover both exact and inexact clones at different token normalization levels. Third, we present a scalable system with a database model to support large-scale assembly code search. Finally, experimental results on real-life malware binaries suggest that our proposed methods can effectively identify assembly code clones under different code mutation scenarios. (C) 2015 Elsevier Ltd. All rights reserved.
Pages: 46 - 60
Page count: 15
Related papers
50 in total
  • [1] Siamese: scalable and incremental code clone search via multiple code representations
    Ragkhitwetsagul, Chaiyong
    Krinke, Jens
    [J]. EMPIRICAL SOFTWARE ENGINEERING, 2019, 24 (04) : 2236 - 2284
  • [2] Siamese: scalable and incremental code clone search via multiple code representations
    Chaiyong Ragkhitwetsagul
    Jens Krinke
    [J]. Empirical Software Engineering, 2019, 24 : 2236 - 2284
  • [3] Scalable code clone detection and search based on adaptive prefix filtering
    Nishi, Manziba Akanda
    Damevski, Kostadin
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2018, 137 : 130 - 142
  • [4] Scalable Program Clone Search through Spectral Analysis
    Benoit, Tristan
    Marion, Jean-Yves
    Bardin, Sebastien
    [J]. PROCEEDINGS OF THE 31ST ACM JOINT MEETING EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING, ESEC/FSE 2023, 2023 : 808 - 820
  • [5] A Survey of Approaches for Code Clone Search
    Choi E.
    Mizuno O.
    Fujiwara Y.
    Yoshida N.
    [J]. Computer Software, 2022, 39 (03) : 47 - 59
  • [6] SeByte: Scalable clone and similarity search for bytecode
    Keivanloo, Iman
    Roy, Chanchal K.
    Rilling, Juergen
    [J]. SCIENCE OF COMPUTER PROGRAMMING, 2014, 95 : 426 - 444
  • [7] VUDDY: A Scalable Approach for Vulnerable Code Clone Discovery
    Kim, Seulbae
    Woo, Seunghoon
    Lee, Heejo
    Oh, Hakjoo
    [J]. 2017 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2017 : 595 - 614
  • [8] Clone-Seeker: Effective Code Clone Search Using Annotations
    Hammad, Muhammad
    Babur, Onder
    Basit, Hamid Abdul
    Van den Brand, Mark
    [J]. IEEE ACCESS, 2022, 10 : 11696 - 11713
  • [9] Scalable and Multifaceted Search and Its Application for Binary Malware Files
    Kim, Donghoon
    Hur, Junnyung
    Yoon, Myungkeun
    [J]. IEEE ACCESS, 2021, 9 (09) : 112770 - 112779
  • [10] Surfacing code in the dark: an instant clone search approach
    Jin-woo Park
    Mu-Woong Lee
    Jong-Won Roh
    Seung-won Hwang
    Sunghun Kim
    [J]. Knowledge and Information Systems, 2014, 41 : 727 - 759