SK2Decompile

albertan017 2025-10-04 22:22:24 +08:00 committed by GitHub
commit 766f3ec79f


@ -22,6 +22,7 @@ Reverse Engineering: Decompiling Binary Code with Large Language Models
[![GitHub Trend](https://trendshift.io/api/badge/repositories/8664)](https://trendshift.io/repositories/8664)
## Updates
* [2025-05-20]: Release SK²Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin. Phase 1, Structure Recovery (Skeleton), transforms binary/pseudo-code into an obfuscated intermediate representation 🤗 [HF Link](https://huggingface.co/LLM4Binary/sk2decompile-struct-6.7b). Phase 2, Identifier Naming (Skin), generates human-readable source code with meaningful identifiers 🤗 [HF Link](https://huggingface.co/LLM4Binary/sk2decompile-ident-6.7).
* [2025-05-20]: Release [decompile-bench](https://huggingface.co/collections/LLM4Binary/decompile-bench-68259091c8d49d0ebd5efda9), which contains two million binary-source function pairs for training and 70K function pairs for evaluation. Please refer to the [decompile-bench](https://github.com/albertan017/LLM4Decompile/tree/main/decompile-bench) folder for details.
* [2024-10-17]: Release [decompile-ghidra-100k](https://huggingface.co/datasets/LLM4Binary/decompile-ghidra-100k), a subset of 100k training samples (25k per optimization level). We provide a [training script](https://github.com/albertan017/LLM4Decompile/blob/main/train/README.md) that runs in ~3.5 hours on a single A100 40G GPU. The resulting model achieves a re-executability rate of 0.26 at a total cost of under $20, enabling quick replication of LLM4Decompile.
* [2024-09-26]: Update a [Colab notebook](https://colab.research.google.com/drive/1X5TuUKuNuksGJZz6Cc83KKI0ATBP9q7r?usp=sharing) demonstrating how to use the LLM4Decompile models, with examples for the LLM4Decompile-End and LLM4Decompile-Ref models.
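The two-phase SK²Decompile flow in the 2025-05-20 entry is a chain: the Phase 1 (Skeleton) output becomes the Phase 2 (Skin) input. The sketch below is a minimal illustration of that data flow, not the released inference code. It models each phase as a plain function from prompt to generated text. The prompt templates, `two_phase_decompile`, and the stub models are all hypothetical. In practice each callable would wrap one of the Hugging Face checkpoints linked in that entry, using whatever prompt format those models actually expect.

```python
from typing import Callable, Tuple

# Each phase is modeled as a function from a prompt string to generated text.
# In a real setup this would wrap a causal LM (e.g. an SK²Decompile checkpoint);
# that wiring is intentionally left out of this sketch.
Generate = Callable[[str], str]

# HYPOTHETICAL prompt templates: not the official SK²Decompile prompt format.
STRUCTURE_PROMPT = "# Pseudo-code:\n{code}\n# Skeleton (normalized identifiers):\n"
NAMING_PROMPT = "# Skeleton:\n{code}\n# Source with meaningful identifiers:\n"


def two_phase_decompile(
    pseudo_code: str,
    structure_model: Generate,
    naming_model: Generate,
) -> Tuple[str, str]:
    """Hypothetical helper: run Phase 1 (Skeleton), then Phase 2 (Skin).

    Returns (skeleton, source) so the intermediate representation can be
    inspected alongside the final output.
    """
    # Phase 1: recover program structure with placeholder identifiers.
    skeleton = structure_model(STRUCTURE_PROMPT.format(code=pseudo_code))
    # Phase 2: feed the skeleton (not the raw pseudo-code) to the naming model.
    source = naming_model(NAMING_PROMPT.format(code=skeleton))
    return skeleton, source


if __name__ == "__main__":
    # Stub "models" standing in for the real checkpoints.
    def stub_structure(prompt: str) -> str:
        return "int func0(int a1) { return a1 + 1; }"

    def stub_naming(prompt: str) -> str:
        # Rename only if Phase 1's skeleton actually reached this phase.
        if "func0" in prompt:
            return "int increment(int value) { return value + 1; }"
        return prompt

    skeleton, source = two_phase_decompile(
        "undefined4 FUN_00101139(int param_1) { return param_1 + 1; }",
        stub_structure,
        stub_naming,
    )
    print(skeleton)
    print(source)
```

Returning the skeleton together with the final source keeps the intermediate representation visible, which helps when judging whether an error came from structure recovery or from identifier naming.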
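The 0.26 figure in the 2024-10-17 entry is a re-executability rate. In the LLM4Decompile evaluation, a decompiled function counts as a success only if it recompiles and then passes the test assertions for that function. The rate is the fraction of functions that succeed. The minimal sketch below shows only that bookkeeping. `EvalResult` and `re_executability_rate` are hypothetical names, and the actual compile-and-test harness is out of scope.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EvalResult:
    """Hypothetical per-function outcome from a compile-and-test harness."""
    compiled: bool      # the decompiled function recompiled successfully
    tests_passed: bool  # the recompiled program passed its test assertions


def re_executability_rate(results: Iterable[EvalResult]) -> float:
    """Fraction of functions that both recompile and pass their tests."""
    results = list(results)
    if not results:
        return 0.0
    successes = sum(1 for r in results if r.compiled and r.tests_passed)
    return successes / len(results)


if __name__ == "__main__":
    # Illustrative numbers only, not benchmark data: 26 successes out of 100.
    demo = (
        [EvalResult(True, True)] * 26
        + [EvalResult(True, False)] * 50
        + [EvalResult(False, False)] * 24
    )
    print(re_executability_rate(demo))  # 0.26
```

A function that compiles but fails its assertions counts as a failure, so this metric is stricter than a compile-only success rate.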