Recovery of high-level intermediate representations of algorithms from binary code

Authors
Bugerya, Alexander Borisovich [1]
Kulagin, Ivan Ivanovich [2]
Padaryan, Vartan Andronikovich [2, 3]
Solovev, Mikhail Aleksandrovich [2, 3]
Tikhonov, Andrei Yur'evich [2]
Affiliations
[1] Russian Acad Sci, Keldysh Inst Appl Math, Moscow, Russia
[2] Russian Acad Sci, Ivannikov Inst Syst Programming, Moscow, Russia
[3] Lomonosov Moscow State Univ, Moscow, Russia
Keywords
flowcharts; intermediate representation; binary code analysis; data flow analysis
DOI
10.1109/IVMEM.2019.00015
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
One of the tasks of binary code security analysis is the detection of undocumented features in software. This task is hard to automate and requires the participation of a cybersecurity expert. How the algorithm under analysis is represented strongly affects both the analysis effort and the quality of the results. Existing intermediate representations and languages are intended for software that either performs optimizing transformations or analyzes binary code, so they are unsuitable for manual data flow analysis. This paper proposes a high-level, hierarchical, flowchart-based representation of a program's algorithm, together with an algorithm for constructing it. The proposed representation is based on a hypergraph and supports both automatic and manual data flow analysis at different levels of detail. The hypergraph nodes represent functions. Every node contains a set of other nodes called fragments; a fragment is a linear sequence of instructions that contains no call or ret instructions. Edges represent data flows between nodes and correspond to memory buffers and registers. In the future, this representation can be used to implement automatic analysis algorithms. The paper also proposes an approach to improving the quality of the constructed representation by grouping single data flows into one flow that connects logical algorithm modules.
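The structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class and function names (`Fragment`, `FunctionNode`, `DataFlow`, `split_into_fragments`, `group_flows`), the rule of cutting fragments at every call or ret, the sample instructions, and the `module_of` mapping are all assumptions made for illustration. The sketch also simplifies the hypergraph to ordinary pairwise edges.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Assumption: these two mnemonics are the only fragment boundaries.
BOUNDARY = {"call", "ret"}


@dataclass
class Fragment:
    """Linear instruction sequence that contains no call or ret."""
    name: str
    instructions: list[str]


@dataclass
class FunctionNode:
    """Top-level node: a function holding its fragments."""
    name: str
    fragments: list[Fragment] = field(default_factory=list)


@dataclass(frozen=True)
class DataFlow:
    """Edge between two nodes, labeled by the register or buffer it uses."""
    src: str
    dst: str
    carrier: str


def split_into_fragments(func_name: str, instructions: list[str]) -> FunctionNode:
    """Cut a function body into fragments at call/ret instructions."""
    node = FunctionNode(func_name)
    current: list[str] = []

    def flush() -> None:
        # Close the fragment being built, if it holds any instructions.
        if current:
            name = f"{func_name}.f{len(node.fragments)}"
            node.fragments.append(Fragment(name, list(current)))
            current.clear()

    for ins in instructions:
        parts = ins.split()
        if not parts:
            continue  # skip blank lines
        if parts[0].lower() in BOUNDARY:
            flush()
        else:
            current.append(ins)
    flush()
    return node


def group_flows(flows, module_of):
    """Merge single flows into one flow per ordered pair of modules."""
    grouped = defaultdict(set)
    for flow in flows:
        src_mod, dst_mod = module_of[flow.src], module_of[flow.dst]
        if src_mod != dst_mod:  # flows inside one module are hidden
            grouped[(src_mod, dst_mod)].add(flow.carrier)
    return dict(grouped)


# Hypothetical function body and data flows.
parse = split_into_fragments(
    "parse",
    ["mov eax, [esi]", "add eax, 4", "call check", "mov [edi], eax", "ret"],
)
flows = [
    DataFlow("parse.f0", "check.f0", "eax"),
    DataFlow("parse.f0", "check.f0", "buf@0x1000"),
    DataFlow("parse.f0", "parse.f1", "eax"),
]
module_of = {"parse.f0": "parser", "parse.f1": "parser", "check.f0": "validator"}
grouped = group_flows(flows, module_of)
```

Here `parse` is split into two fragments around the call, and the two single flows from the parser module to the validator module collapse into one grouped flow that carries both the register and the buffer. This is the kind of coarser view the abstract describes for analysis at a lower level of detail.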
Pages: 57-63
Number of pages: 7