Automatic Binary Lifting Based on Neural Machine Translation and Assembly Language Model

Vol. 35, No. 2, pp. 241-251, Apr. 2025
DOI: 10.13089/JKIISC.2025.35.2.241
Keywords: Binary Lifting, Neural Machine Translation, Compiler
Abstract

When binaries are translated into an intermediate representation, they can be analyzed without depending on a specific instruction set architecture. However, existing heuristic-based binary lifting techniques produce imprecise results, making them difficult to use for static analysis, and improving them requires substantial development resources, which has effectively halted their advancement. To address this issue, this paper proposes an automated binary lifting technique based on neural machine translation. The proposed model uses a Transformer architecture and, notably, employs a pre-trained assembly language model as the encoder, enabling accurate translation of assembly sequences into LLVM IR sequences. To train it, we used the LLVM framework to extract 50,000 pairs of semantically equivalent sequences from randomly generated C programs, constructing a large-scale training dataset. Evaluated with BLEU scores, the trained model achieved approximately 20% higher accuracy than prior research and about 60% higher accuracy than the GPT-4o model.
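The abstract describes the pair-extraction step only at a high level. The following is a minimal sketch, assuming clang is installed on PATH, of how one semantically equivalent assembly/LLVM IR pair could be produced from a single C source file; the function name extract_pair, the -O0 flag choice, and the sample program are illustrative assumptions, not the authors' actual pipeline.

    import os
    import subprocess
    import tempfile

    # Hypothetical sample input; the paper draws on randomly generated C programs.
    C_SOURCE = "int add(int a, int b) { return a + b; }\n"

    def extract_pair(c_code, workdir):
        """Compile one C program twice with clang: once to target assembly
        (the model's source side) and once to LLVM IR (the target side),
        yielding a semantically equivalent training pair."""
        c_path = os.path.join(workdir, "sample.c")
        asm_path = os.path.join(workdir, "sample.s")
        ir_path = os.path.join(workdir, "sample.ll")
        with open(c_path, "w") as f:
            f.write(c_code)
        # Emit target-specific assembly.
        subprocess.run(["clang", "-S", "-O0", c_path, "-o", asm_path], check=True)
        # Emit textual LLVM IR for the same program.
        subprocess.run(["clang", "-S", "-emit-llvm", "-O0", c_path, "-o", ir_path],
                       check=True)
        with open(asm_path) as f:
            asm = f.read()
        with open(ir_path) as f:
            ir = f.read()
        return asm, ir

    if __name__ == "__main__":
        with tempfile.TemporaryDirectory() as tmp:
            asm, ir = extract_pair(C_SOURCE, tmp)
            print(asm[:200])
            print(ir[:200])

Repeating this over many generated programs would yield aligned (assembly, LLVM IR) sequence pairs of the kind the abstract describes; tokenization and any normalization applied before Transformer training are not specified in this page.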


Cite this article
[IEEE Style]
또올가, 임강빈, 윤요섭, 정민찬, "Automatic Binary Lifting Based on Neural Machine Translation and Assembly Language Model," Journal of The Korea Institute of Information Security and Cryptology, vol. 35, no. 2, pp. 241-251, 2025. DOI: 10.13089/JKIISC.2025.35.2.241.

[ACM Style]
또올가, 임강빈, 윤요섭, and 정민찬. 2025. Automatic Binary Lifting Based on Neural Machine Translation and Assembly Language Model. Journal of The Korea Institute of Information Security and Cryptology 35, 2 (2025), 241-251. DOI: 10.13089/JKIISC.2025.35.2.241.