Mirror of https://github.com/albertan017/LLM4Decompile.git (synced 2026-06-17 01:55:50 +00:00)

Merge pull request #73 from BaiRiDreamer/main
Merge VERL RL training + BringUpBench evaluation pipeline

Commit 85b364bf09: 30 changed files with 7308 additions and 6 deletions
@ -34,8 +34,19 @@ Binary/Pseudo-code → [Phase 1: Skeleton] → Normalized IR → [Phase 2: Skin]
SK2Decompile/
├── Preprocess/                # Data preprocessing and normalization tools
├── LLaMA-Factory/             # Supervised Fine-Tuning (SFT) implementation
├── verl/                      # Reinforcement Learning (RL) with VERL/GRPO
│   └── SK2DECOMPILE/
│       ├── data/              # Example RL training data
│       ├── reward_functions/  # Custom reward functions (4 variants)
│       ├── scripts/           # Training launch scripts
│       └── README.md          # Detailed RL documentation
├── evaluation/                # Comprehensive evaluation suite
│   ├── bringupbench/          # BringUpBench evaluation (Section A.6)
│   │   ├── scripts/           # Pipeline scripts (compile, decompile, evaluate)
│   │   ├── data/              # Pre-built function maps and inference results
│   │   ├── reports/           # Evaluation result summaries
│   │   └── README.md          # Detailed BringUpBench documentation
│   └── ...                    # HumanEval, MBPP evaluation scripts
└── README.md                  # This file
```
@ -107,20 +118,32 @@ llamafactory-cli train LLaMA-Factory/SK2DECOMPILE/train/norm2code-example.yaml
### Phase 2: Reinforcement Learning (RL)

After SFT, we apply GRPO (Group Relative Policy Optimization) to further align each model with its task-specific objective (Section 3.5 of the paper):

- **Structure Recovery**
- **Identifier Naming**

Our RL training is based on [VERL](https://github.com/volcengine/verl) v0.4.1 (Sheng et al., 2024).
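
GRPO replaces a learned value model with a group baseline. For each prompt, several completions are sampled, and each completion's advantage is its reward standardized against the other completions in its group. The snippet below is a minimal sketch of that normalization for orientation only. It is not VERL's implementation, the example rewards are made up, and it uses the population standard deviation, a detail on which implementations differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize rewards within one prompt's group of sampled completions.

    advantage_i = (r_i - mean(r)) / (std(r) + eps); eps avoids division by
    zero when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up rewards for four completions of one prompt
    # (e.g. 1.0 = output compiles, 0.0 = it does not).
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Completions that beat their group's average receive positive advantages and the rest receive negative ones. This pushes the policy toward outputs that are relatively better for the same input.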

#### Setup VERL

```bash
git clone https://github.com/volcengine/verl.git
cd verl && git checkout v0.4.1 && pip install -e .
pip install tree-sitter==0.24.0 tree-sitter-c==0.23.4 openai
```

#### Run RL Training

```bash
# Structure Recovery RL
bash verl/SK2DECOMPILE/scripts/run_struct_rl.sh

# Identifier Naming RL (requires embedding server)
bash verl/SK2DECOMPILE/scripts/run_ident_rl.sh
```

**RL Training Data:** `verl/SK2DECOMPILE/data/sk2decompile-rl-examples.parquet`

See [`verl/SK2DECOMPILE/README.md`](verl/SK2DECOMPILE/README.md) for the full reproduction guide, including how to integrate the reward functions into VERL and prepare training data.

## Evaluation

```
@ -181,6 +204,12 @@ python gpt_judge.py --json_file your_json_file_path
--api_key your_openai_api_key
```

**BringUpBench Evaluation** (Section A.6 of the paper)

We also evaluate on [BringUpBench](https://github.com/toddmaustin/bringup-bench), a suite of 90 self-contained C programs with 505 functions across O0–O3. SK²Decompile achieves a **42.3% compilation rate** and a **27.0% re-executability rate**, compared with IDA Pro's 23.6% and 21.7%.

See [`evaluation/bringupbench/README.md`](evaluation/bringupbench/README.md) for the full reproduction pipeline, pre-built data, and detailed results.

## 📊 Results

Our approach achieves state-of-the-art performance:

249
sk2decompile/evaluation/bringupbench/README.md
Normal file
@ -0,0 +1,249 @@
# SK²Decompile: Evaluation on BringUpBench

This directory contains the evaluation pipeline for SK²Decompile on the [BringUpBench](https://github.com/toddmaustin/bringup-bench) benchmark, as described in **Section A.6** of our paper:

> **SK²Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin**
> [[arXiv:2509.22114]](https://arxiv.org/abs/2509.22114)

## Overview

[BringUpBench](https://github.com/toddmaustin/bringup-bench) (Austin, 2024) is a benchmark suite of **90 self-contained C programs** designed for bringing up newly designed CPUs, accelerators, compilers, and operating systems. It has **zero library dependencies**: all programs rely solely on a built-in `libmin` library and only 4 system calls. This makes it an ideal, standardized test bed for evaluating decompilation on complex, real-world binaries.

We compiled, decompiled, and executed all projects at optimization levels O0–O3, yielding **505 functions** in total. We compared SK²Decompile against the industry-standard rule-based decompiler **IDA Pro** (Hex-Rays).

## Results

### SK²Decompile vs IDA Pro

| Opt Level | Functions | SK²Decompile Compilable | SK²Decompile Executable | IDA Compilable | IDA Executable |
|:---------:|:---------:|:-----------------------:|:-----------------------:|:--------------:|:--------------:|
| O0 | 382 | **50.26%** | **49.48%** | n/a | n/a |
| O1 | 379 | **40.90%** | **39.05%** | n/a | n/a |
| O2 | 368 | **37.77%** | **34.24%** | n/a | n/a |
| O3 | 359 | **31.75%** | **29.53%** | n/a | n/a |
| **Avg (paper)** | **1488** | **42.3%** | **27.0%** | **23.6%** | **21.7%** |

> The last row reports the paper's aggregate numbers (Table 8 in Section A.6). Its function count (1,488) is the total of the four rows above. Its rates are quoted from the paper and are not computed from the per-level rows. Per-opt-level IDA baselines are not reported separately in the paper. Detailed per-benchmark breakdowns are available in `reports/`.
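
As a quick check on how the rows relate, the per-level rates can be pooled by weighting each one by its function count. This is equivalent to dividing total successes by total functions. The sketch below uses only the numbers in the table above. It yields about 40.3% compilable and 38.2% executable over 1,488 functions, which differs from the 42.3% / 27.0% quoted from the paper in the last row.

```python
# Per-level rows from the table above: (functions, compilable %, executable %).
ROWS = {
    "O0": (382, 50.26, 49.48),
    "O1": (379, 40.90, 39.05),
    "O2": (368, 37.77, 34.24),
    "O3": (359, 31.75, 29.53),
}


def pooled_rates(rows: dict[str, tuple[int, float, float]]) -> tuple[int, float, float]:
    """Return (total functions, pooled compilable %, pooled executable %).

    Weighting each level's percentage by its function count is the same as
    total successes divided by total functions.
    """
    total = sum(n for n, _, _ in rows.values())
    build = sum(n * b for n, b, _ in rows.values()) / total
    exe = sum(n * e for n, _, e in rows.values()) / total
    return total, build, exe


if __name__ == "__main__":
    total, build, exe = pooled_rates(ROWS)
    print(f"{total} functions: {build:.2f}% compilable, {exe:.2f}% executable")
```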

## Directory Structure

```
bringupbench/
├── README.md                            # This file
├── config.env                           # Environment configuration (paths)
├── scripts/
│   ├── build-host-opt-levels.sh         # Step 1: Compile benchmarks at O0-O3
│   ├── decompile-all-pseudo.sh          # Step 2: IDA Pro batch decompilation
│   ├── dump_pseudo.py                   # IDA headless decompilation helper
│   ├── disasm-all-objdump.sh            # Step 3: objdump batch disassembly
│   ├── build-func-maps.py               # Step 3: Build function-level mappings
│   ├── clean-all-benchmarks.sh          # Utility: clean all build artifacts
│   └── eval_infer_out.py                # Step 5: Automated evaluation
├── data/
│   ├── func_maps/                       # Pre-built function mappings (JSONL)
│   │   ├── merged.O0.func_map.jsonl     # O0: 493 functions
│   │   ├── merged.O1.func_map.jsonl     # O1: 449 functions
│   │   ├── merged.O2.func_map.jsonl     # O2: 441 functions
│   │   └── merged.O3.func_map.jsonl     # O3: 439 functions
│   └── infer_results/                   # SK²Decompile inference results
│       ├── merged.O0.func_map.infer.jsonl   # O0: 382 evaluated functions
│       ├── merged.O1.func_map.infer.jsonl   # O1: 379 evaluated functions
│       ├── merged.O2.func_map.infer.jsonl   # O2: 368 evaluated functions
│       └── merged.O3.func_map.infer.jsonl   # O3: 359 evaluated functions
└── reports/                             # Evaluation result summaries
    ├── O0_results.md
    ├── O1_results.md
    ├── O2_results.md
    └── O3_results.md
```

## Reproduction Pipeline

Our evaluation pipeline consists of five steps, as described in the paper:

```
Source (.c)
│
▼ Step 1: Compilation
Binary (.host.O0 ~ .host.O3)
│
├──▶ Step 2: Baseline Extraction (IDA Pro) ──▶ Pseudocode (.pseudo)
│
├──▶ Step 3: Ground Truth Mapping ──▶ Function Maps (.func_map.jsonl)
│
▼ Step 4: Decompilation (SK²Decompile)
Inferred C code (.func_map.infer.jsonl)
│
▼ Step 5: Validation
Evaluation Reports (reports/)
```

### Prerequisites

| Dependency | Purpose | Installation |
|------------|---------|-------------|
| [Bringup-Bench](https://github.com/toddmaustin/bringup-bench) | Upstream benchmark suite (90 C programs) | `git clone https://github.com/toddmaustin/bringup-bench.git` |
| GCC | Compile benchmarks | `apt install gcc` |
| IDA Pro + Hex-Rays | Decompile binaries to pseudocode | Commercial software |
| objdump (binutils) | Disassemble binaries | `apt install binutils` |
| clang-format | Pseudocode normalization | `apt install clang-format` |
| Python >= 3.10 | Run evaluation scripts | `apt install python3` |
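
Before running anything, you can confirm that the command-line prerequisites are on `PATH`. The sketch below does this with `shutil.which`. The helper name `missing_tools` is hypothetical, and IDA Pro is skipped because it is located through `IDA_BIN` in `config.env` rather than `PATH`.

```python
import shutil
from collections.abc import Iterable

# Command-line tools from the table above. IDA Pro is located through IDA_BIN
# in config.env rather than PATH, so it is not checked here.
REQUIRED_TOOLS = ("gcc", "objdump", "clang-format", "python3")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All command-line tools found.")
```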

### Quick Start (Evaluation Only)

If you only want to reproduce the evaluation step (Step 5), the pre-built data is already included in `data/`. You only need the Bringup-Bench source repository:

```bash
# 1. Clone Bringup-Bench
git clone https://github.com/toddmaustin/bringup-bench.git

# 2. Configure paths
cd bringupbench    # this directory (sk2decompile/evaluation/bringupbench), not the clone
vim config.env     # Set BENCH_REPO_ROOT to the absolute path of your bringup-bench clone

# 3. Run evaluation (e.g., O0)
python3 scripts/eval_infer_out.py data/infer_results/merged.O0.func_map.infer.jsonl

# 4. Check results
cat reports/O0_results.md
```

### Full Pipeline (From Scratch)

To reproduce the entire pipeline from compilation to evaluation:

```bash
cd bringupbench
vim config.env   # Set BENCH_REPO_ROOT and IDA_BIN
```

**Step 1: Compile benchmarks at O0–O3**

Build all 90 Bringup-Bench programs at four optimization levels, producing `<name>.host.O{0,1,2,3}` binaries.

```bash
scripts/build-host-opt-levels.sh
```
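
After Step 1, it can help to confirm that every expected binary was produced. The following is a minimal sketch that assumes the `<name>/<name>.host.O{0,1,2,3}` layout seen in the data format below (e.g. `ackermann/ackermann.host.O0`). It also assumes that a benchmark is any directory `<name>/` containing `<name>.c`. The helper `missing_binaries` is hypothetical and not part of the shipped scripts.

```python
import os
from pathlib import Path

OPT_LEVELS = ("O0", "O1", "O2", "O3")


def missing_binaries(bench_root: Path, target: str = "host") -> list[Path]:
    """List expected <name>/<name>.<target>.<OPT> binaries that are absent.

    Assumption: a benchmark is any subdirectory <name>/ that contains <name>.c;
    other directories (shared libraries, scripts) are skipped.
    """
    missing = []
    for bench_dir in sorted(p for p in bench_root.iterdir() if p.is_dir()):
        name = bench_dir.name
        if not (bench_dir / f"{name}.c").is_file():
            continue
        for opt in OPT_LEVELS:
            binary = bench_dir / f"{name}.{target}.{opt}"
            if not binary.is_file():
                missing.append(binary)
    return missing


if __name__ == "__main__":
    root = Path(os.environ.get("BENCH_REPO_ROOT", "."))
    for path in missing_binaries(root):
        print("missing:", path)
```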

**Step 2: Baseline Extraction (IDA Pro)**

Use IDA Pro in headless mode to decompile all binaries, producing `.pseudo` files that contain Hex-Rays pseudocode.

```bash
scripts/decompile-all-pseudo.sh
```

In the output, each function is delimited by a `/* function_name @ 0xADDRESS */` comment.
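
The sketch below splits a `.pseudo` file back into per-function records. It assumes that each `/* name @ 0xADDR */` comment sits on its own line directly before the function it labels, with single spaces as shown. The actual `dump_pseudo.py` output may differ in such details, and `split_pseudo` is a hypothetical helper.

```python
import re

# One delimiter comment per function, on its own line, e.g. "/* ackermann @ 0x11e9 */".
DELIM = re.compile(
    r"^/\* (?P<name>\S+) @ (?P<addr>0x[0-9A-Fa-f]+) \*/[ \t]*$", re.MULTILINE
)


def split_pseudo(text: str) -> list[dict]:
    """Split .pseudo text into {'function_name', 'address', 'content'} records.

    Each function's content runs from the end of its delimiter line to the
    next delimiter (or the end of the file).
    """
    matches = list(DELIM.finditer(text))
    records = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip("\n")
        records.append({
            "function_name": m.group("name"),
            "address": m.group("addr"),
            "content": body + "\n",
        })
    return records
```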

**Step 3: Ground Truth Mapping**

Parse the source code, pseudocode, and assembly; match functions by name across all three representations; and normalize the pseudocode (remove IDA-specific types, convert hexadecimal constants to decimal, and apply clang-format).
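
To make the normalization concrete, here is a minimal sketch of the three operations. The type table is a hypothetical, partial mapping chosen for illustration, and `build-func-maps.py` may use different rules. The clang-format pass runs only when the tool is installed: it reads stdin when no file is given, and `--assume-filename` tells it to treat the input as C.

```python
import re
import shutil
import subprocess

# Hypothetical, partial mapping from Hex-Rays types/keywords to plain C.
# The real build-func-maps.py may use a different table.
IDA_TYPE_MAP = {
    "__int64": "long",
    "__int32": "int",
    "__int16": "short",
    "__int8": "char",
    "_QWORD": "unsigned long",
    "_DWORD": "unsigned int",
    "_WORD": "unsigned short",
    "_BYTE": "unsigned char",
    "__fastcall": "",
    "__cdecl": "",
}
# Hex integer literal with an optional integer suffix (u, l, ul, ll, ...).
HEX_LITERAL = re.compile(r"\b0[xX]([0-9A-Fa-f]+)([uUlL]*)\b")


def normalize_pseudo(code: str, run_clang_format: bool = True) -> str:
    """Rewrite IDA-specific types, turn hex literals into decimal, then format."""
    for ida_name, c_name in IDA_TYPE_MAP.items():
        code = re.sub(rf"\b{re.escape(ida_name)}\b", c_name, code)
    code = HEX_LITERAL.sub(lambda m: f"{int(m.group(1), 16)}{m.group(2)}", code)
    # Collapse double spaces left by removed keywords. Leading indentation is
    # kept; string literals are not special-cased in this sketch.
    code = re.sub(r"(?<=\S)[ \t]{2,}", " ", code)
    if run_clang_format and shutil.which("clang-format"):
        result = subprocess.run(
            ["clang-format", "--assume-filename=func.c"],
            input=code, capture_output=True, text=True, check=True,
        )
        code = result.stdout
    return code
```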

```bash
# Disassemble (optional, for assembly mapping)
scripts/disasm-all-objdump.sh

# Build function-level mappings
python3 scripts/build-func-maps.py
```
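
The name-based matching in Step 3 amounts to an inner join on function name, sketched below. Input shapes are assumed: one dict per representation, keyed by function name. Assembly is optional because the objdump pass is optional. A function that is absent from any provided representation is dropped; for example, a function inlined at higher optimization levels may have no standalone pseudocode. `match_by_name` is a hypothetical helper.

```python
from __future__ import annotations


def match_by_name(source: dict[str, str], pseudo: dict[str, dict],
                  assembly: dict[str, str] | None = None) -> list[dict]:
    """Join per-function data from the three representations on function name.

    source:   name -> original C definition
    pseudo:   name -> pseudocode record (e.g. address, content)
    assembly: name -> disassembly text, or None if the objdump pass was skipped
    Functions missing from any provided representation are dropped.
    """
    names = set(source) & set(pseudo)
    if assembly is not None:
        names &= set(assembly)
    records = []
    for name in sorted(names):
        record = {
            "source": {"function_name": name, "content": source[name]},
            "pseudo": {"function_name": name, **pseudo[name]},
        }
        if assembly is not None:
            record["assembly"] = assembly[name]
        records.append(record)
    return records
```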

Output: one `.func_map.jsonl` file per binary. Merge them per optimization level:

```bash
# BENCH_REPO_ROOT must be set in this shell, e.g.: set -a; . ./config.env; set +a
cat "$BENCH_REPO_ROOT"/*/*.host.O0.func_map.jsonl > data/func_maps/merged.O0.func_map.jsonl
cat "$BENCH_REPO_ROOT"/*/*.host.O1.func_map.jsonl > data/func_maps/merged.O1.func_map.jsonl
cat "$BENCH_REPO_ROOT"/*/*.host.O2.func_map.jsonl > data/func_maps/merged.O2.func_map.jsonl
cat "$BENCH_REPO_ROOT"/*/*.host.O3.func_map.jsonl > data/func_maps/merged.O3.func_map.jsonl
```

**Step 4: Decompilation (SK²Decompile Inference)**

Feed the `pseudo_normalize` field from the function maps to SK²Decompile. The two-phase inference pipeline (see `../sk2decompile_inf.py`) produces C code for each function. Write the results into the JSONL, with the `pseudo.content-fix` field containing the final decompiled function.

```bash
# Example: use the main SK²Decompile inference pipeline
cd ../   # back to sk2decompile/evaluation/
python3 sk2decompile_inf.py \
    --dataset_path bringupbench/data/func_maps/merged.O0.func_map.jsonl \
    --model_path LLM4Binary/sk2decompile-struct-6.7b \
    --recover_model_path LLM4Binary/sk2decompile-ident-6.7b
```
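
The record shape that Step 5 expects can be illustrated with a small JSONL rewriter. It reads each function map record, passes `pseudo_normalize` through a decompiler callable, and stores the result in `pseudo.content-fix`. This is a sketch only: `run_inference` is a hypothetical name, and `decompile` is a placeholder standing in for the two-phase model calls that `sk2decompile_inf.py` performs.

```python
import json
from typing import Callable


def run_inference(in_path: str, out_path: str, decompile: Callable[[str], str]) -> int:
    """Copy func_map records to out_path, adding pseudo.content-fix.

    `decompile` is a placeholder for the two-phase SK²Decompile pipeline
    (structure recovery, then identifier naming). Returns records written.
    """
    count = 0
    with open(in_path, encoding="utf-8") as fin, \
            open(out_path, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # tolerate blank lines in the JSONL
            record = json.loads(line)
            fixed = decompile(record["pseudo_normalize"])
            record.setdefault("pseudo", {})["content-fix"] = fixed
            fout.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```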

**Step 5: Validation**

For each function, replace its original definition in the source file with the decompiled output, rebuild the project in an isolated workspace, and run the project's test suite.

```bash
python3 scripts/eval_infer_out.py data/infer_results/merged.O0.func_map.infer.jsonl \
    --jobs 16 \
    --command-timeout 20
```

Common options:

```bash
--jobs N              # Parallel workers (default: 96)
--command-timeout S   # Timeout per make command in seconds (default: 20)
--limit N             # Process only the first N cases (for debugging)
--keep-workspaces     # Keep temporary build directories
```
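
The core of the replacement step can be sketched as a plain text substitution. This assumes that the original definition (`source.content`) appears verbatim, exactly once, in its source file. It is an illustration only: `eval_infer_out.py` may locate functions differently, and the workspace copy and the `make build` / `make test` runs are left out. `replace_function` is a hypothetical helper.

```python
from __future__ import annotations


def replace_function(source_text: str, original_func: str, decompiled_func: str) -> str | None:
    """Substitute one function definition with its decompiled version.

    original_func is the function's original text (the `source.content` field)
    and decompiled_func is `pseudo.content-fix`. Returns None when the original
    text is not found exactly once, i.e. the case would fail replacement.
    """
    if source_text.count(original_func) != 1:
        return None  # missing or ambiguous match: do not guess
    return source_text.replace(original_func, decompiled_func, 1)
```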

## Data Format

### func_map.jsonl (Function Mappings)

Each line is a JSON object containing the source, pseudocode, and assembly for one function:

```jsonc
{
  "source": {
    "path": "ackermann/ackermann.c",                      // Source file (relative to BENCH_REPO_ROOT)
    "function_name": "ackermann",                         // Function name
    "content": "int ackermann(int m, ...) { ... }\n"      // Complete function definition
  },
  "pseudo": {
    "path": "ackermann/ackermann.host.O0.pseudo",
    "function_name": "ackermann",
    "address": "0x11e9",                                  // Function address in binary
    "label": "ackermann",
    "content": "__int64 __fastcall ackermann(...) { ... }\n"  // Raw IDA pseudocode
  },
  "pseudo_normalize": "int ackermann(...) { ... }",       // Normalized pseudocode
  "binary": "ackermann/ackermann.host.O0",                // Binary file path
  "assembly": "<ackermann>:\npush %rbp\n..."              // Cleaned objdump output
}
```
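
Because each line is an independent JSON object, the maps can be loaded with the standard `json` module alone. The sketch below also counts functions per benchmark, taking the benchmark name from the first component of `source.path`; this matches the benchmark names used in `reports/`. Both helper names are hypothetical. Typical use is `load_func_map("data/func_maps/merged.O0.func_map.jsonl")`.

```python
import json
from collections import Counter


def load_func_map(path: str) -> list[dict]:
    """Load a func_map JSONL file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def functions_per_benchmark(records: list[dict]) -> Counter:
    """Count functions per benchmark, from the first component of source.path."""
    return Counter(r["source"]["path"].split("/", 1)[0] for r in records)
```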

### func_map.infer.jsonl (Inference Results)

Extends `func_map.jsonl` with SK²Decompile inference outputs:

```jsonc
{
  // ... all fields from func_map.jsonl ...
  "pseudo": {
    // ... all fields above, plus:
    "content-fix": "..."          // Final decompiled function (used for source replacement)
  },
  "infer-out-model1": "...",      // Phase 1 (Structure Recovery) raw output
  "infer-out-model2": "...",      // Phase 2 (Identifier Naming) raw output
  "pseudo_normalize-fix": "..."   // Corrected normalized pseudocode
}
```
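
Before running Step 5 on self-produced inference output, you may want to check each record for the fields that source replacement depends on. The sketch below assumes the minimum is `source.path`, `source.content`, and `pseudo.content-fix`. That list is an assumption rather than the evaluator's documented contract, and the helper names are hypothetical.

```python
# Assumed minimum for source replacement: where the function lives, its
# original text, and the decompiled replacement.
REQUIRED_FIELDS = ("source.path", "source.content", "pseudo.content-fix")


def get_field(record: dict, dotted: str):
    """Follow a dotted path such as 'pseudo.content-fix'; None if absent."""
    value = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def missing_fields(record: dict) -> list[str]:
    """Return the required dotted fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not get_field(record, field)]
```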

## Evaluation Metrics

| Metric | Definition |
|--------|-----------|
| **Replacement Rate** | Fraction of functions where the decompiled output can be located and substituted into the original source file |
| **Compilable Rate** | Fraction of functions where the modified source compiles successfully (`make build`) |
| **Executable Rate** | Fraction of functions where the compiled program passes its test suite (`make test`, output matches reference) |

The evaluation uses BringUpBench's own build infrastructure (`Makefile`, `libmin`, `libtarg`) to compile and validate. Each function is tested in an isolated workspace to prevent cross-contamination.
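
The three rates can be computed from per-case outcomes as sketched below. The sketch assumes the metrics are nested: a case counts as compilable only if it was also replaced, and as executable only if it also compiled. The `CaseResult` type is hypothetical and not the evaluator's own data structure.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    replaced: bool  # decompiled function located and substituted into the source
    built: bool     # `make build` succeeded
    passed: bool    # `make test` output matched the reference


def summarize(results: list[CaseResult]) -> dict[str, float]:
    """Replacement, compilable, and executable rates in percent over all cases.

    A case counts as compilable only if it was also replaced, and as
    executable only if it also compiled.
    """
    total = len(results)

    def pct(count: int) -> float:
        return 100.0 * count / total if total else 0.0

    replaced = sum(r.replaced for r in results)
    built = sum(r.replaced and r.built for r in results)
    passed = sum(r.replaced and r.built and r.passed for r in results)
    return {"replacement": pct(replaced), "compilable": pct(built), "executable": pct(passed)}
```

For instance, the O0 report's 192 compilable and 189 executable cases out of 382 correspond to 50.26% and 49.48%.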

## Notes

- BringUpBench programs are self-contained with zero external dependencies, making them ideal for evaluating decompilation without the confounding factor of missing headers or libraries.
- The `func_maps/` data contains more functions than `infer_results/` because some functions are filtered out during inference (e.g., those exceeding token limits).
- All scripts load paths from `config.env`. You can also override them via environment variables or CLI arguments (priority: CLI > env > config.env).
- For the complete SK²Decompile methodology and other benchmark results (HumanEval, MBPP, ExeBench, GitHub2025), see the [main README](../../README.md).
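
The sketch below shows the documented lookup order (CLI > environment > `config.env`). It assumes `config.env` contains only simple `KEY=VALUE` lines and `#` comments, as in the file shipped here. The scripts' actual parser may differ, and `read_config_env` / `resolve` are hypothetical names.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_config_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(key: str, cli_value: str | None, config: dict[str, str]) -> str | None:
    """Apply the documented priority: CLI argument > environment > config.env."""
    if cli_value is not None:
        return cli_value
    if key in os.environ:
        return os.environ[key]
    return config.get(key)
```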

14
sk2decompile/evaluation/bringupbench/config.env
Normal file
@ -0,0 +1,14 @@
# BringUpBench Evaluation: Environment Configuration
# All scripts resolve paths from this file.
# Values can be overridden by same-named environment variables or CLI arguments.
# Priority: CLI args > environment variables > config.env

# Absolute path to the Bringup-Bench repository
# Clone from: https://github.com/toddmaustin/bringup-bench.git
BENCH_REPO_ROOT=/path/to/bringup-bench

# IDA Pro command-line executable (required for Step 2: decompilation)
IDA_BIN=/path/to/idat

# Default build target (host = native x86-64 Linux)
DEFAULT_TARGET=host
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
296
sk2decompile/evaluation/bringupbench/reports/O0_results.md
Normal file
@ -0,0 +1,296 @@
# Infer-Out Model 2 Evaluation (merged.O0.func_map.infer-host)

- Timestamp: 20251119-171008
- Source JSONL: merged.O0.func_map.infer.jsonl
- Target: host
- Total cases: 382
- Replacement success: 382 (100.00%)
- Compilable: 192 (50.26%)
- Executable: 189 (49.48%)

## Benchmark Breakdown

| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 9 | 100.00% | 33.33% | 33.33% |
| anagram | 12 | 100.00% | 58.33% | 58.33% |
| audio-codec | 4 | 100.00% | 50.00% | 50.00% |
| avl-tree | 14 | 100.00% | 35.71% | 35.71% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 5 | 100.00% | 100.00% | 100.00% |
| blake2b | 6 | 100.00% | 16.67% | 16.67% |
| bloom-filter | 3 | 100.00% | 33.33% | 33.33% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 2 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 70.00% | 70.00% |
| ccmac | 2 | 100.00% | 50.00% | 50.00% |
| checkers | 15 | 100.00% | 80.00% | 80.00% |
| cipher | 3 | 100.00% | 33.33% | 33.33% |
| congrad | 6 | 100.00% | 66.67% | 66.67% |
| connect4-minimax | 13 | 100.00% | 61.54% | 61.54% |
| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
| dhrystone | 5 | 100.00% | 60.00% | 60.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 50.00% | 50.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 10 | 100.00% | 60.00% | 60.00% |
| fuzzy-match | 4 | 100.00% | 25.00% | 25.00% |
| fy-shuffle | 4 | 100.00% | 50.00% | 50.00% |
| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
| grad-descent | 4 | 100.00% | 75.00% | 75.00% |
| graph-tests | 19 | 100.00% | 21.05% | 21.05% |
| hanoi | 2 | 100.00% | 50.00% | 50.00% |
| heapsort | 2 | 100.00% | 50.00% | 50.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 12 | 100.00% | 91.67% | 91.67% |
| idct-alg | 4 | 100.00% | 50.00% | 50.00% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 6 | 100.00% | 100.00% | 100.00% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 28.57% | 28.57% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 66.67% | 66.67% |
| life | 14 | 100.00% | 78.57% | 71.43% |
| longdiv | 7 | 100.00% | 71.43% | 71.43% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| matmult | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 3 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 62.50% | 62.50% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 3 | 100.00% | 33.33% | 33.33% |
| parrondo | 3 | 100.00% | 33.33% | 33.33% |
| pascal | 3 | 100.00% | 100.00% | 100.00% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 0.00% | 0.00% |
| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
| qsort-demo | 5 | 100.00% | 0.00% | 0.00% |
| qsort-test | 3 | 100.00% | 66.67% | 66.67% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 50.00% | 50.00% |
| rand-test | 2 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 50.00% | 50.00% |
| regex-parser | 11 | 100.00% | 72.73% | 63.64% |
| rho-factor | 4 | 100.00% | 75.00% | 75.00% |
| rle-compress | 2 | 100.00% | 50.00% | 50.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
| sieve | 2 | 100.00% | 50.00% | 50.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 50.00% |
| sudoku-solver | 4 | 100.00% | 75.00% | 75.00% |
| tetris-sim | 12 | 100.00% | 75.00% | 75.00% |
| tiny-NN | 2 | 100.00% | 50.00% | 50.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 4 | 100.00% | 75.00% | 75.00% |
| transcend | 3 | 100.00% | 66.67% | 66.67% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 12.50% |
| verlet | 4 | 100.00% | 25.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |

## Compilation Failures
- ackermann/ackermann.c::main@0x13b9
- aes/aes.c::aes_decrypt@0x1a65
- aes/aes.c::aes_encrypt@0x1943
- aes/aes.c::inv_shift_rows@0x1396
- aes/aes.c::key_expansion@0x179a
- aes/aes.c::main@0x1b87
- aes/aes.c::shift_rows@0x12e5
- anagram/anagram.c::BuildMask@0x13e7
- anagram/anagram.c::BuildWord@0x17e5
- anagram/anagram.c::FindAnagram@0x1ba6
- anagram/anagram.c::ReadDict@0x121f
- anagram/anagram.c::main@0x1f71
- audio-codec/audio-codec.c::decode@0x12f5
- audio-codec/audio-codec.c::main@0x14b3
- avl-tree/avlcore.c::DeleteByElement@0x240f
- avl-tree/avlcore.c::DeleteByElementRecursive@0x21af
- avl-tree/avlcore.c::DeleteLeftMost@0x2086
- avl-tree/avlcore.c::FindByElement@0x1a46
- avl-tree/avlcore.c::Height@0x2475
- avl-tree/avlcore.c::Insert@0x1fc4
- avl-tree/avlcore.c::SingleLeftRotation@0x1b3a
- avl-tree/avl-tree.c::main@0x1399
- avl-tree/avl-tree.c::printTree@0x11e9
- banner/banner.c::main@0x11e9
- blake2b/blake2b.c::BLAKE2B@0x1a9b
- blake2b/blake2b.c::F@0x1502
- blake2b/blake2b.c::G@0x1258
- blake2b/blake2b.c::blake2b@0x1cd3
- blake2b/blake2b.c::test@0x2071
- bloom-filter/bloom-filter.c::bad_search@0x11e9
- bloom-filter/bloom-filter.c::main@0x123d
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x11e9
- boyer-moore-search/boyer-moore-search.c::main@0x146d
- boyer-moore-search/boyer-moore-search.c::search@0x126d
- c-interp/c-interp.c::eval@0x457c
- c-interp/c-interp.c::main@0x4e03
- c-interp/c-interp.c::next@0x11e9
- ccmac/ccmac.c::main@0x127e
- checkers/functions.c::fill_print_initial@0x1793
- checkers/functions.c::generate_node_children@0x29ff
- checkers/checkers.c::main@0x11e9
- cipher/cipher.c::encipher@0x11e9
- cipher/cipher.c::main@0x13cd
- congrad/congrad.c::cg_solve@0x1643
- congrad/congrad.c::main@0x199b
- connect4-minimax/connect4-minimax.c::init_board@0x11e9
- connect4-minimax/connect4-minimax.c::main@0x2299
- connect4-minimax/connect4-minimax.c::minimax@0x1d07
- connect4-minimax/connect4-minimax.c::play_game@0x20d1
- connect4-minimax/connect4-minimax.c::score_position@0x1a02
- convex-hull/convex-hull.c::main@0x13e7
- dhrystone/dhrystone.c::Proc_1@0x199f
- dhrystone/dhrystone.c::main@0x11e9
- distinctness/distinctness.c::isDistinct@0x11e9
- distinctness/distinctness.c::main@0x15d8
- fft-int/fft-int.c::db_from_ampl@0x1807
- fft-int/fft-int.c::fix_fft@0x11e9
- flood-fill/flood-fill.c::main@0x144d
- frac-calc/frac-calc.c::copyr@0x14d4
- frac-calc/frac-calc.c::divtokens@0x15b8
- frac-calc/frac-calc.c::help@0x13d9
- frac-calc/frac-calc.c::main@0x11e9
- fuzzy-match/fuzzy-match.c::compute_score@0x2379
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x2283
- fuzzy-match/fuzzy-match.c::main@0x24b3
- fy-shuffle/fy-shuffle.c::main@0x1378
- fy-shuffle/fy-shuffle.c::rand_int@0x11e9
- gcd-list/gcd-list.c::gcd@0x11e9
- gcd-list/gcd-list.c::main@0x125e
- grad-descent/grad-descent.c::main@0x1413
- graph-tests/graph-tests.c::addEdge@0x12c9
- graph-tests/graph-tests.c::addVertex@0x19f6
- graph-tests/graph-tests.c::bfs@0x15ce
- graph-tests/graph-tests.c::bfs_test@0x16e9
- graph-tests/graph-tests.c::bubbleSort@0x1829
- graph-tests/graph-tests.c::createGraph@0x1221
- graph-tests/graph-tests.c::createNode@0x11e9
- graph-tests/graph-tests.c::createQueue@0x1372
- graph-tests/graph-tests.c::dequeue@0x145d
- graph-tests/graph-tests.c::enqueue@0x13d7
- graph-tests/graph-tests.c::insertAtTheBegin@0x17b1
- graph-tests/graph-tests.c::link_list@0x18b8
- graph-tests/graph-tests.c::main@0x1d6c
- graph-tests/graph-tests.c::printQueue@0x151b
- graph-tests/graph-tests.c::swap@0x17f8
- hanoi/hanoi.c::main@0x12d4
- heapsort/heapsort.c::main@0x155f
- heat-calc/heat-calc.c::main@0x11e9
- huff-encode/huff-encode.c::main@0x192d
- idct-alg/idct-alg.c::C@0x11e9
- idct-alg/idct-alg.c::main@0x1472
- indirect-test/indirect-test.c::main@0x12c9
- kadane/kadane.c::main@0x1276
- kepler/kepler.c::bin_fact@0x1b3e
- kepler/kepler.c::binary@0x12c6
- kepler/kepler.c::e_series@0x1389
- kepler/kepler.c::j_series@0x1501
- kepler/kepler.c::main@0x1608
- knapsack/knapsack.c::main@0x138e
- knapsack/knapsack.c::max@0x11e9
- knights-tour/knights-tour.c::solveKT@0x12d6
- life/life.c::getNumNeigbors@0x156f
- life/life.c::main@0x11e9
- life/life.c::process@0x1426
- longdiv/longdiv.c::main@0x18fd
- longdiv/longdiv.c::sub@0x11e9
- lu-decomp/lu-decomp.c::main@0x1520
- lu-decomp/lu-decomp.c::print_matrix@0x11e9
- mandelbrot/mandelbrot.c::main@0x1220
- matmult/matmult.c::main@0x11e9
- max-subseq/max-subseq.c::lcsAlgo@0x11e9
- max-subseq/max-subseq.c::main@0x171a
- mersenne/mersenne.c::genrand@0x12ee
- mersenne/mersenne.c::main@0x153a
- mersenne/mersenne.c::sgenrand@0x11e9
- minspan/minspan.c::displayPath@0x1af2
- minspan/minspan.c::main@0x1d8f
- minspan/minspan.c::minSpanTree@0x1297
- monte-carlo/monte-carlo.c::main@0x11e9
- murmur-hash/murmur-hash.c::main@0x13a9
- murmur-hash/murmur-hash.c::murmurhash@0x11e9
- n-queens/n-queens.c::main@0x12ec
- natlog/natlog.c::main@0x11e9
- nbody-sim/nbody-sim.c::main@0x11e9
- packet-filter/packet-filter.c::generate_packet@0x11e9
- packet-filter/packet-filter.c::main@0x14c3
- parrondo/parrondo.c::cointoss@0x11e9
- parrondo/parrondo.c::main@0x12cb
- pi-calc/pi-calc.c::main@0x11e9
- primal-test/primal-test.c::main@0x1459
- primal-test/primal-test.c::miller_rabin_int@0x12fd
- primal-test/primal-test.c::powm@0x11e9
- priority-queue/priority-queue.c::main@0x13ee
- qsort-demo/qsort-demo.c::main@0x17bf
- qsort-demo/qsort-demo.c::print_struct_array@0x155e
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x1401
- qsort-demo/qsort-demo.c::sort_integers_example@0x1280
- qsort-demo/qsort-demo.c::sort_structs_example@0x1603
- qsort-test/qsort-test.c::main@0x1415
- quaternions/quaternions.c::euler_from_quat@0x1447
- quaternions/quaternions.c::quat_from_euler@0x11e9
- quaternions/quaternions.c::quaternion_multiply@0x1655
- quaternions/quaternions.c::test@0x18b2
- rabinkarp-search/rabinkarp-search.c::main@0x1341
- rand-test/rand-test.c::main@0x1913
- rand-test/rand-test.c::run_tests@0x1258
- ransac/ransac.c::main@0x1466
- regex-parser/regex-parser.c::main@0x32b9
- regex-parser/regex-parser.c::re_compile@0x22e1
- regex-parser/regex-parser.c::re_print@0x278f
- rho-factor/rho-factor.c::main@0x5c7d
- rle-compress/rle-compress.c::run_length_encode@0x11e9
- rsa-cipher/rsa-cipher.c::main@0x1634
- rsa-cipher/rsa-cipher.c::mod_inverse@0x1363
- rsa-cipher/rsa-cipher.c::mod_pow@0x11e9
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x14ef
- sat-solver/sat-solver.c::main@0x1518
- sat-solver/sat-solver.c::printFormula@0x1391
- shortest-path/shortest-path.c::main@0x1469
- sieve/sieve.c::main@0x1300
- simple-grep/simple-grep.c::main@0x11e9
- spelt2num/spelt2num.c::main@0x11e9
- spirograph/spirograph.c::spirograph@0x11e9
- sudoku-solver/sudoku-solver.c::main@0x1532
- tetris-sim/tetris-sim.c::best_move@0x1810
- tetris-sim/tetris-sim.c::evaluate_board@0x1686
- tetris-sim/tetris-sim.c::main@0x1ba5
- tiny-NN/tiny-NN.c::train@0x1485
- topo-sort/topo-sort.c::addEdge@0x12cf
- topo-sort/topo-sort.c::createGraph@0x1259
- topo-sort/topo-sort.c::createListNode@0x1221
- topo-sort/topo-sort.c::createStackNode@0x11e9
- topo-sort/topo-sort.c::main@0x153d
- topo-sort/topo-sort.c::topologicalSort@0x13fd
- topo-sort/topo-sort.c::topologicalSortUtil@0x1332
- totient/totient.c::my_gcd@0x11e9
- transcend/transcend.c::init_inputs_f64@0x1235
- uniquify/uniquify.c::main@0x1228
- vectors-3d/vectors-3d.c::get_cross_matrix@0x1601
- vectors-3d/vectors-3d.c::print_vector@0x144f
- vectors-3d/vectors-3d.c::test@0x17fb
- vectors-3d/vectors-3d.c::unit_vec@0x1510
- vectors-3d/vectors-3d.c::vector_add@0x126d
- vectors-3d/vectors-3d.c::vector_prod@0x1373
- vectors-3d/vectors-3d.c::vector_sub@0x11e9
- verlet/verlet.c::main@0x170b
- verlet/verlet.c::vb_init@0x1271
- verlet/verlet.c::vb_step_avg@0x13aa
- weekday/weekday.c::dayOfWeek@0x11e9
- weekday/weekday.c::main@0x130d

## Execution Failures
- life/life.c::init@0x1237
- regex-parser/regex-parser.c::matchpattern@0x313f
- verlet/verlet.c::vb_checksum@0x160b
334
sk2decompile/evaluation/bringupbench/reports/O1_results.md
Normal file
334
sk2decompile/evaluation/bringupbench/reports/O1_results.md
Normal file
|
|
@ -0,0 +1,334 @@
|
|||
# Infer-Out Model 2 Evaluation (merged.O1.func_map.infer-host)

- Timestamp: 20251119-171212
- Source JSONL: merged.O1.func_map.infer.jsonl
- Target: host
- Total cases: 379
- Replacement success: 379 (100.00%)
- Compilable: 155 (40.90%)
- Executable: 148 (39.05%)

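The percentages in this summary are consistent with each count being divided by Total cases and printed to two decimals (155 / 379 = 40.90%, 148 / 379 = 39.05%). The following minimal Python sketch regenerates the count lines under that assumption. `pct` and `summary_lines` are hypothetical helpers, not functions from the repository's evaluation scripts.

```python
def pct(count: int, total: int) -> str:
    """Percentage of `total`, formatted with two decimals as in the report."""
    return f"{count / total * 100:.2f}%"


def summary_lines(total: int, replaced: int, compilable: int, executable: int) -> list[str]:
    """Rebuild the count lines of a report summary.

    Assumes each stage is a subset of the previous one (executable <= compilable
    <= replaced <= total), which the reported counts are consistent with.
    """
    if not 0 <= executable <= compilable <= replaced <= total:
        raise ValueError("stage counts must be non-increasing")
    return [
        f"- Total cases: {total}",
        f"- Replacement success: {replaced} ({pct(replaced, total)})",
        f"- Compilable: {compilable} ({pct(compilable, total)})",
        f"- Executable: {executable} ({pct(executable, total)})",
    ]


if __name__ == "__main__":
    # O1 counts taken from the summary above.
    print("\n".join(summary_lines(379, 379, 155, 148)))
```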
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 9 | 100.00% | 33.33% | 33.33% |
| anagram | 13 | 100.00% | 53.85% | 53.85% |
| audio-codec | 3 | 100.00% | 0.00% | 0.00% |
| avl-tree | 17 | 100.00% | 29.41% | 29.41% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 5 | 100.00% | 80.00% | 80.00% |
| blake2b | 5 | 100.00% | 20.00% | 20.00% |
| bloom-filter | 4 | 100.00% | 50.00% | 50.00% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 60.00% | 60.00% |
| ccmac | 1 | 100.00% | 0.00% | 0.00% |
| checkers | 16 | 100.00% | 81.25% | 81.25% |
| cipher | 3 | 100.00% | 33.33% | 0.00% |
| congrad | 2 | 100.00% | 0.00% | 0.00% |
| connect4-minimax | 13 | 100.00% | 61.54% | 61.54% |
| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
| dhrystone | 5 | 100.00% | 40.00% | 40.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 75.00% | 75.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 10 | 100.00% | 40.00% | 40.00% |
| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
| grad-descent | 4 | 100.00% | 0.00% | 0.00% |
| graph-tests | 19 | 100.00% | 21.05% | 21.05% |
| hanoi | 2 | 100.00% | 50.00% | 50.00% |
| heapsort | 2 | 100.00% | 50.00% | 50.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 13 | 100.00% | 92.31% | 92.31% |
| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 6 | 100.00% | 50.00% | 50.00% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 14.29% | 14.29% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 66.67% | 66.67% |
| life | 14 | 100.00% | 21.43% | 14.29% |
| longdiv | 7 | 100.00% | 71.43% | 71.43% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| matmult | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 3 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 37.50% | 25.00% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 4 | 100.00% | 25.00% | 25.00% |
| parrondo | 2 | 100.00% | 0.00% | 0.00% |
| pascal | 3 | 100.00% | 33.33% | 33.33% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 33.33% | 33.33% |
| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
| rand-test | 3 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 0.00% | 0.00% |
| regex-parser | 8 | 100.00% | 25.00% | 12.50% |
| rho-factor | 4 | 100.00% | 75.00% | 75.00% |
| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
| sieve | 1 | 100.00% | 0.00% | 0.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 50.00% |
| sudoku-solver | 4 | 100.00% | 50.00% | 50.00% |
| tetris-sim | 12 | 100.00% | 75.00% | 66.67% |
| tiny-NN | 5 | 100.00% | 40.00% | 40.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 4 | 100.00% | 50.00% | 50.00% |
| transcend | 1 | 100.00% | 0.00% | 0.00% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
| verlet | 1 | 100.00% | 0.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |

## Compilation Failures
- ackermann/ackermann.c::main@0x131c
- aes/aes.c::aes_decrypt@0x161b
- aes/aes.c::aes_encrypt@0x1560
- aes/aes.c::inv_shift_rows@0x12cd
- aes/aes.c::key_expansion@0x14c3
- aes/aes.c::main@0x16d1
- aes/aes.c::shift_rows@0x1248
- anagram/anagram.c::BuildMask@0x1372
- anagram/anagram.c::BuildWord@0x15cd
- anagram/anagram.c::DumpWords@0x17e8
- anagram/anagram.c::FindAnagram@0x1839
- anagram/anagram.c::ReadDict@0x1233
- anagram/anagram.c::main@0x1a93
- audio-codec/audio-codec.c::decode@0x1271
- audio-codec/audio-codec.c::encode@0x11e9
- audio-codec/audio-codec.c::main@0x12d7
- avl-tree/avlcore.c::CheckTreeNodeRotation@0x186a
- avl-tree/element.c::Compare@0x1764
- avl-tree/avlcore.c::DeleteByElement@0x1d2b
- avl-tree/avlcore.c::DeleteByElementRecursive@0x1b8b
- avl-tree/avlcore.c::DoubleLeftRotation@0x1845
- avl-tree/avlcore.c::DoubleRightRotation@0x1821
- avl-tree/avlcore.c::FindByElement@0x1790
- avl-tree/avlcore.c::Height@0x1d6e
- avl-tree/avlcore.c::Insert@0x1a73
- avl-tree/avlcore.c::InsertNode@0x199b
- avl-tree/avl-tree.c::main@0x1380
- avl-tree/avl-tree.c::printTree@0x11e9
- banner/banner.c::main@0x11e9
- bit-kernels/bit-kernels.c::main@0x12e8
- blake2b/blake2b.c::F@0x1258
- blake2b/blake2b.c::G@0x11e9
- blake2b/blake2b.c::blake2b@0x1616
- blake2b/blake2b.c::test@0x1982
- bloom-filter/bloom-filter.c::bad_search@0x11e9
- bloom-filter/bloom-filter.c::main@0x1217
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x11e9
- boyer-moore-search/boyer-moore-search.c::main@0x1329
- boyer-moore-search/boyer-moore-search.c::search@0x1223
- c-interp/c-interp.c::eval@0x35d3
- c-interp/c-interp.c::function_body@0x310b
- c-interp/c-interp.c::main@0x3c45
- c-interp/c-interp.c::next@0x11e9
- ccmac/ccmac.c::main@0x11e9
- checkers/functions.c::fill_print_initial@0x15dd
- checkers/functions.c::link_new_node@0x204d
- checkers/checkers.c::main@0x11e9
- cipher/cipher.c::encipher@0x11e9
- cipher/cipher.c::main@0x12b3
- congrad/congrad.c::cg_spmv@0x11e9
- congrad/congrad.c::main@0x125a
- connect4-minimax/connect4-minimax.c::init_board@0x11e9
- connect4-minimax/connect4-minimax.c::main@0x1c5d
- connect4-minimax/connect4-minimax.c::minimax@0x17ed
- connect4-minimax/connect4-minimax.c::play_game@0x1b13
- connect4-minimax/connect4-minimax.c::score_position@0x158e
- convex-hull/convex-hull.c::main@0x130d
- dhrystone/dhrystone.c::PFunc_1@0x12ab
- dhrystone/dhrystone.c::PFunc_2@0x12c8
- dhrystone/dhrystone.c::main@0x1311
- distinctness/distinctness.c::isDistinct@0x11e9
- distinctness/distinctness.c::main@0x1342
- fft-int/fft-int.c::db_from_ampl@0x1513
- flood-fill/flood-fill.c::main@0x130f
- frac-calc/frac-calc.c::avaliatokens@0x1421
- frac-calc/frac-calc.c::calcula@0x172a
- frac-calc/frac-calc.c::copyr@0x12b5
- frac-calc/frac-calc.c::divtokens@0x1636
- frac-calc/frac-calc.c::help@0x11e9
- frac-calc/frac-calc.c::main@0x18c1
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x21e9
- fuzzy-match/fuzzy-match.c::main@0x2391
- fy-shuffle/fy-shuffle.c::fy_shuffle@0x11e9
- fy-shuffle/fy-shuffle.c::main@0x12de
- gcd-list/gcd-list.c::gcd@0x11e9
- gcd-list/gcd-list.c::main@0x121c
- grad-descent/grad-descent.c::derivateWRTBias@0x1247
- grad-descent/grad-descent.c::derivateWRTWeight@0x11e9
- grad-descent/grad-descent.c::gradientDescent@0x129d
- grad-descent/grad-descent.c::main@0x1312
- graph-tests/graph-tests.c::addEdge@0x127b
- graph-tests/graph-tests.c::addVertex@0x1743
- graph-tests/graph-tests.c::bfs@0x144f
- graph-tests/graph-tests.c::bfs_test@0x150f
- graph-tests/graph-tests.c::bubbleSort@0x15e7
- graph-tests/graph-tests.c::createGraph@0x1206
- graph-tests/graph-tests.c::createNode@0x11e9
- graph-tests/graph-tests.c::createQueue@0x12cd
- graph-tests/graph-tests.c::dequeue@0x1357
- graph-tests/graph-tests.c::enqueue@0x130a
- graph-tests/graph-tests.c::insertAtTheBegin@0x15ae
- graph-tests/graph-tests.c::link_list@0x163c
- graph-tests/graph-tests.c::main@0x1a0e
- graph-tests/graph-tests.c::printQueue@0x13cc
- graph-tests/graph-tests.c::swap@0x15da
- hanoi/hanoi.c::main@0x1261
- heapsort/heapsort.c::main@0x13d4
- heat-calc/heat-calc.c::main@0x11e9
- huff-encode/huff-encode.c::main@0x15ef
- idct-alg/idct-alg.c::main@0x140e
- indirect-test/indirect-test.c::main@0x1257
- k-means/k-means.c::calculateNearst@0x11e9
- k-means/k-means.c::main@0x1922
- k-means/k-means.c::printEPS@0x1546
- kadane/kadane.c::main@0x123b
- kepler/kepler.c::J@0x18c0
- kepler/kepler.c::bin_fact@0x1718
- kepler/kepler.c::binary@0x121d
- kepler/kepler.c::e_series@0x17a2
- kepler/kepler.c::j_series@0x19bb
- kepler/kepler.c::main@0x131f
- knapsack/knapsack.c::main@0x128b
- knapsack/knapsack.c::max@0x11e9
- knights-tour/knights-tour.c::solveKT@0x1341
- life/life.c::getDown@0x1406
- life/life.c::getDownLeft@0x1487
- life/life.c::getDownRight@0x14b4
- life/life.c::getLeft@0x1390
- life/life.c::getNumNeigbors@0x14e2
- life/life.c::getRight@0x13b7
- life/life.c::getUp@0x13df
- life/life.c::getUpLeft@0x142e
- life/life.c::getUpRight@0x145a
- life/life.c::main@0x1664
- life/life.c::process@0x15a3
- longdiv/longdiv.c::main@0x1691
- longdiv/longdiv.c::sub@0x11e9
- lu-decomp/lu-decomp.c::main@0x13ad
- lu-decomp/lu-decomp.c::print_matrix@0x11e9
- mandelbrot/mandelbrot.c::main@0x120d
- matmult/matmult.c::main@0x11e9
- max-subseq/max-subseq.c::lcsAlgo@0x11e9
- max-subseq/max-subseq.c::main@0x14c4
- mersenne/mersenne.c::genrand@0x125b
- mersenne/mersenne.c::main@0x1398
- mersenne/mersenne.c::sgenrand@0x11e9
- minspan/minspan.c::displayGraph@0x13f5
- minspan/minspan.c::displayGraph1@0x14f3
- minspan/minspan.c::displayPath@0x15fa
- minspan/minspan.c::main@0x175b
- minspan/minspan.c::minSpanTree@0x1231
- monte-carlo/monte-carlo.c::main@0x11e9
- murmur-hash/murmur-hash.c::main@0x12a3
- murmur-hash/murmur-hash.c::murmurhash@0x11e9
- n-queens/n-queens.c::main@0x12b1
- natlog/natlog.c::main@0x11e9
- nbody-sim/nbody-sim.c::main@0x11e9
- packet-filter/packet-filter.c::check_packet_filter@0x133d
- packet-filter/packet-filter.c::generate_packet@0x11e9
- packet-filter/packet-filter.c::main@0x145c
- parrondo/parrondo.c::main@0x127d
- parrondo/parrondo.c::play_c@0x1238
- pascal/pascal.c::main@0x12d1
- pascal/pascal.c::print_centered@0x122b
- pi-calc/pi-calc.c::main@0x11e9
- primal-test/primal-test.c::main@0x13ea
- primal-test/primal-test.c::miller_rabin_int@0x1243
- priority-queue/priority-queue.c::main@0x130a
- qsort-demo/qsort-demo.c::main@0x163f
- qsort-demo/qsort-demo.c::print_struct_array@0x1470
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x13b3
- qsort-demo/qsort-demo.c::sort_integers_example@0x1292
- qsort-demo/qsort-demo.c::sort_structs_example@0x14d2
- qsort-test/qsort-test.c::main@0x133f
- quaternions/quaternions.c::euler_from_quat@0x136c
- quaternions/quaternions.c::main@0x15bf
- quaternions/quaternions.c::quat_from_euler@0x11e9
- quaternions/quaternions.c::quaternion_multiply@0x1487
- rabinkarp-search/rabinkarp-search.c::main@0x1366
- rabinkarp-search/rabinkarp-search.c::search@0x11e9
- rand-test/rand-test.c::bad_rand@0x11e9
- rand-test/rand-test.c::main@0x1514
- rand-test/rand-test.c::run_tests@0x1220
- ransac/ransac.c::main@0x13cf
- ransac/ransac.c::ransac_line_fitting@0x1238
- regex-parser/regex-parser.c::main@0x2b4b
- regex-parser/regex-parser.c::matchalphanum@0x21fc
- regex-parser/regex-parser.c::matchcharclass@0x222a
- regex-parser/regex-parser.c::matchone@0x23e1
- regex-parser/regex-parser.c::re_compile@0x270b
- regex-parser/regex-parser.c::re_print@0x2964
- rho-factor/rho-factor.c::main@0x3ef0
- rle-compress/rle-compress.c::main@0x1318
- rle-compress/rle-compress.c::run_length_encode@0x11e9
- rsa-cipher/rsa-cipher.c::main@0x1527
- rsa-cipher/rsa-cipher.c::mod_inverse@0x12f3
- rsa-cipher/rsa-cipher.c::mod_pow@0x11e9
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x1444
- sat-solver/sat-solver.c::main@0x141e
- sat-solver/sat-solver.c::printFormula@0x12ff
- shortest-path/shortest-path.c::main@0x1333
- sieve/sieve.c::main@0x11e9
- simple-grep/simple-grep.c::main@0x11e9
- spelt2num/spelt2num.c::main@0x11e9
- spirograph/spirograph.c::spirograph@0x11e9
- sudoku-solver/sudoku-solver.c::isSafe@0x11e9
- sudoku-solver/sudoku-solver.c::main@0x13e5
- tetris-sim/tetris-sim.c::best_move@0x157c
- tetris-sim/tetris-sim.c::evaluate_board@0x144b
- tetris-sim/tetris-sim.c::main@0x180d
- tiny-NN/tiny-NN.c::main@0x16a4
- tiny-NN/tiny-NN.c::sampleSine@0x1251
- tiny-NN/tiny-NN.c::train@0x133c
- topo-sort/topo-sort.c::addEdge@0x127d
- topo-sort/topo-sort.c::createGraph@0x1223
- topo-sort/topo-sort.c::createListNode@0x1206
- topo-sort/topo-sort.c::createStackNode@0x11e9
- topo-sort/topo-sort.c::main@0x1424
- topo-sort/topo-sort.c::topologicalSort@0x132c
- topo-sort/topo-sort.c::topologicalSortUtil@0x12b7
- totient/totient.c::main@0x12bf
- totient/totient.c::my_gcd@0x11e9
- transcend/transcend.c::main@0x11e9
- uniquify/uniquify.c::main@0x1201
- vectors-3d/vectors-3d.c::get_cross_matrix@0x13c2
- vectors-3d/vectors-3d.c::main@0x14cb
- vectors-3d/vectors-3d.c::print_vector@0x12dc
- vectors-3d/vectors-3d.c::unit_vec@0x1331
- vectors-3d/vectors-3d.c::vector_add@0x121f
- vectors-3d/vectors-3d.c::vector_prod@0x127e
- vectors-3d/vectors-3d.c::vector_sub@0x11e9
- verlet/verlet.c::main@0x11e9
- weekday/weekday.c::dayOfWeek@0x11e9
- weekday/weekday.c::main@0x12ea

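Each entry in the failure lists appears to follow the pattern `<benchmark>/<source file>::<function>@<hex address>`. The source file does not always share the benchmark's name, for example `avl-tree/element.c` or `checkers/functions.c`. The sketch below splits an entry into those parts and counts failures per benchmark, so the counts can be checked against the Build% column of the breakdown table. `CaseId`, `parse_case_id` and `failures_per_benchmark` are hypothetical names. Reading the hex suffix as the function's address in the compiled binary is an assumption; the report does not state it. The sketch needs Python 3.9+ for `str.removeprefix`.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class CaseId(NamedTuple):
    benchmark: str    # benchmark directory, e.g. "avl-tree"
    source_file: str  # C file inside that directory, e.g. "element.c"
    function: str     # name of the function the case refers to
    address: int      # hex suffix; assumed (not stated in the report) to be the function's address


def parse_case_id(entry: str) -> CaseId:
    """Split an entry like '- avl-tree/element.c::Compare@0x1764' into its parts."""
    entry = entry.strip().removeprefix("- ")  # drop the Markdown bullet, if present
    location, _, func_and_addr = entry.partition("::")
    benchmark, _, source_file = location.partition("/")
    function, _, addr = func_and_addr.rpartition("@")
    return CaseId(benchmark, source_file, function, int(addr, 16))


def failures_per_benchmark(entries: Iterable[str]) -> Counter:
    """Count failing cases per benchmark directory."""
    return Counter(parse_case_id(e).benchmark for e in entries)


if __name__ == "__main__":
    sample = [
        "- avl-tree/element.c::Compare@0x1764",
        "- avl-tree/avlcore.c::Height@0x1d6e",
        "- cipher/cipher.c::main@0x12b3",
    ]
    print(parse_case_id(sample[0]))
    print(failures_per_benchmark(sample))
```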
## Execution Failures
- cipher/cipher.c::decipher@0x1251
- idct-alg/idct-alg.c::idct_2d@0x1216
- life/life.c::init@0x11e9
- minspan/minspan.c::displayTree@0x16b7
- regex-parser/regex-parser.c::matchpattern@0x2491
- tetris-sim/tetris-sim.c::clear_lines@0x12b6
- vectors-3d/vectors-3d.c::get_angle@0x1429
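Two patterns in the counts suggest how the report is structured. First, the summary counts match the list lengths: 379 - 155 = 224 compilation failures and 155 - 148 = 7 execution failures. This suggests that a case appears in at most one failure list. Second, the per-benchmark rows suggest that Build% and Exec% are both computed over all of a benchmark's cases. For instance, life has 14 cases, 11 compilation failures and 1 execution failure, giving 3 / 14 = 21.43% and 2 / 14 = 14.29%, which matches its row.

The sketch below rebuilds a breakdown row from those counts. It assumes every case was replaced, as in this report; `breakdown_row` is a hypothetical helper.

```python
def pct(count: int, total: int) -> str:
    """Two-decimal percentage string, matching the report's formatting."""
    return f"{count / total * 100:.2f}%"


def breakdown_row(benchmark: str, cases: int,
                  compile_failures: int, exec_failures: int) -> str:
    """Rebuild one Benchmark Breakdown row from failure counts.

    Assumptions (inferred from the report, not confirmed by its generator):
    every case was replaced (Replacement% = 100%), cases that fail to compile
    are not also listed as execution failures, and Build% and Exec% are both
    relative to the total number of cases.
    """
    compiled = cases - compile_failures
    executed = compiled - exec_failures
    if not 0 <= executed <= compiled <= cases:
        raise ValueError("failure counts exceed the number of cases")
    return (f"| {benchmark} | {cases} | {pct(cases, cases)} | "
            f"{pct(compiled, cases)} | {pct(executed, cases)} |")


if __name__ == "__main__":
    # Counts taken from the O1 report above.
    print(breakdown_row("cipher", 3, 2, 1))
    print(breakdown_row("idct-alg", 3, 1, 1))
    print(breakdown_row("life", 14, 11, 1))
```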

sk2decompile/evaluation/bringupbench/reports/O2_results.md
@@ -0,0 +1,345 @@
# Infer-Out Model 2 Evaluation (merged.O2.func_map.infer-host)

- Timestamp: 20251119-170633
- Source JSONL: merged.O2.func_map.infer.jsonl
- Target: host
- Total cases: 368
- Replacement success: 368 (100.00%)
- Compilable: 139 (37.77%)
- Executable: 126 (34.24%)

## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 10 | 100.00% | 20.00% | 20.00% |
| anagram | 13 | 100.00% | 46.15% | 46.15% |
| audio-codec | 3 | 100.00% | 33.33% | 33.33% |
| avl-tree | 15 | 100.00% | 20.00% | 20.00% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 3 | 100.00% | 66.67% | 66.67% |
| blake2b | 4 | 100.00% | 0.00% | 0.00% |
| bloom-filter | 4 | 100.00% | 50.00% | 50.00% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 50.00% | 50.00% |
| ccmac | 1 | 100.00% | 0.00% | 0.00% |
| checkers | 16 | 100.00% | 68.75% | 62.50% |
| cipher | 3 | 100.00% | 66.67% | 0.00% |
| congrad | 1 | 100.00% | 0.00% | 0.00% |
| connect4-minimax | 13 | 100.00% | 61.54% | 53.85% |
| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
| dhrystone | 5 | 100.00% | 20.00% | 20.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 50.00% | 50.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 10 | 100.00% | 50.00% | 50.00% |
| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
| gcd-list | 2 | 100.00% | 50.00% | 0.00% |
| grad-descent | 4 | 100.00% | 25.00% | 25.00% |
| graph-tests | 20 | 100.00% | 10.00% | 10.00% |
| hanoi | 1 | 100.00% | 0.00% | 0.00% |
| heapsort | 2 | 100.00% | 50.00% | 50.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 13 | 100.00% | 92.31% | 92.31% |
| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 6 | 100.00% | 33.33% | 33.33% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 14.29% | 14.29% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 33.33% | 33.33% |
| life | 14 | 100.00% | 21.43% | 14.29% |
| longdiv | 6 | 100.00% | 50.00% | 50.00% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| matmult | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 3 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 25.00% | 25.00% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 4 | 100.00% | 0.00% | 0.00% |
| parrondo | 2 | 100.00% | 50.00% | 50.00% |
| pascal | 3 | 100.00% | 66.67% | 66.67% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 33.33% | 33.33% |
| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
| rand-test | 3 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 50.00% | 0.00% |
| regex-parser | 7 | 100.00% | 28.57% | 14.29% |
| rho-factor | 3 | 100.00% | 66.67% | 66.67% |
| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
| sieve | 1 | 100.00% | 0.00% | 0.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 0.00% |
| sudoku-solver | 4 | 100.00% | 50.00% | 50.00% |
| tetris-sim | 12 | 100.00% | 75.00% | 58.33% |
| tiny-NN | 4 | 100.00% | 25.00% | 25.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 2 | 100.00% | 50.00% | 50.00% |
| transcend | 1 | 100.00% | 0.00% | 0.00% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
| verlet | 1 | 100.00% | 0.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |

## Compilation Failures
- ackermann/ackermann.c::main@0x1100
- aes/aes.c::aes_decrypt@0x18c0
- aes/aes.c::aes_encrypt@0x1780
- aes/aes.c::inv_mix_columns@0x1640
- aes/aes.c::inv_shift_rows@0x14f0
- aes/aes.c::key_expansion@0x16d0
- aes/aes.c::main@0x1100
- aes/aes.c::mix_columns@0x1580
- aes/aes.c::shift_rows@0x1480
- anagram/anagram.c::BuildMask@0x14c0
- anagram/anagram.c::BuildWord@0x17d0
- anagram/anagram.c::DumpCandidates@0x19a0
- anagram/anagram.c::DumpWords@0x1a30
- anagram/anagram.c::FindAnagram@0x1a90
- anagram/anagram.c::ReadDict@0x1360
- anagram/anagram.c::main@0x1120
- audio-codec/audio-codec.c::decode@0x1440
- audio-codec/audio-codec.c::main@0x1100
- avl-tree/avlcore.c::CheckTreeNodeRotation@0x1c30
- avl-tree/element.c::Compare@0x1ad0
- avl-tree/avlcore.c::DeleteByElement@0x2860
- avl-tree/avlcore.c::DeleteByElementRecursive@0x26d0
- avl-tree/avlcore.c::DeleteLeftMost@0x2610
- avl-tree/avlcore.c::DoubleLeftRotation@0x1c00
- avl-tree/avlcore.c::DoubleRightRotation@0x1bd0
- avl-tree/avlcore.c::FindByElement@0x1b00
- avl-tree/avlcore.c::Insert@0x1f30
- avl-tree/avlcore.c::MakeEmpty@0x1f80
- avl-tree/avl-tree.c::breadth@0x1760
- avl-tree/avl-tree.c::main@0x1120
- banner/banner.c::main@0x1120
- bit-kernels/bit-kernels.c::main@0x1120
- blake2b/blake2b.c::F@0x12a0
- blake2b/blake2b.c::G@0x1230
- blake2b/blake2b.c::blake2b@0x1620
- blake2b/blake2b.c::test@0x19d0
- bloom-filter/bloom-filter.c::bad_search@0x1430
- bloom-filter/bloom-filter.c::main@0x1120
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x15d0
- boyer-moore-search/boyer-moore-search.c::main@0x1140
- boyer-moore-search/boyer-moore-search.c::search@0x1630
- c-interp/c-interp.c::eval@0x3e90
- c-interp/c-interp.c::function_body@0x37f0
- c-interp/c-interp.c::function_declaration@0x3a10
- c-interp/c-interp.c::main@0x1120
- c-interp/c-interp.c::next@0x1580
- ccmac/ccmac.c::main@0x1120
- checkers/functions.c::fill_print_initial@0x1630
- checkers/functions.c::free_tree@0x2460
- checkers/functions.c::generate_node_children@0x21c0
- checkers/functions.c::link_new_node@0x20e0
- checkers/checkers.c::main@0x1150
- cipher/cipher.c::main@0x1100
- congrad/congrad.c::main@0x1100
- connect4-minimax/connect4-minimax.c::init_board@0x1230
- connect4-minimax/connect4-minimax.c::main@0x1100
- connect4-minimax/connect4-minimax.c::minimax@0x1840
- connect4-minimax/connect4-minimax.c::play_game@0x1c90
- connect4-minimax/connect4-minimax.c::score_position@0x1620
- convex-hull/convex-hull.c::main@0x1100
- dhrystone/dhrystone.c::PFunc_1@0x1970
- dhrystone/dhrystone.c::PFunc_2@0x1990
- dhrystone/dhrystone.c::PProc_8@0x1900
- dhrystone/dhrystone.c::main@0x1100
- distinctness/distinctness.c::isDistinct@0x12a0
- distinctness/distinctness.c::main@0x1100
- fft-int/fft-int.c::db_from_ampl@0x1670
- fft-int/fft-int.c::fix_fft@0x1320
- flood-fill/flood-fill.c::main@0x1100
- frac-calc/frac-calc.c::avaliatokens@0x15f0
- frac-calc/frac-calc.c::copyr@0x1460
- frac-calc/frac-calc.c::divtokens@0x1840
- frac-calc/frac-calc.c::help@0x13b0
- frac-calc/frac-calc.c::main@0x1120
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x2360
- fuzzy-match/fuzzy-match.c::main@0x2100
- fy-shuffle/fy-shuffle.c::fy_shuffle@0x1440
- fy-shuffle/fy-shuffle.c::main@0x1100
- gcd-list/gcd-list.c::main@0x1120
- grad-descent/grad-descent.c::derivateWRTBias@0x12d0
- grad-descent/grad-descent.c::derivateWRTWeight@0x1270
- grad-descent/grad-descent.c::main@0x1100
- graph-tests/graph-tests.c::DFS_test@0x1c20
- graph-tests/graph-tests.c::addEdge@0x1320
- graph-tests/graph-tests.c::addVertex@0x1a50
- graph-tests/graph-tests.c::bfs@0x1540
- graph-tests/graph-tests.c::bfs_test@0x1720
- graph-tests/graph-tests.c::bubbleSort@0x1880
- graph-tests/graph-tests.c::createGraph@0x1260
- graph-tests/graph-tests.c::createNode@0x1240
- graph-tests/graph-tests.c::createQueue@0x1390
- graph-tests/graph-tests.c::depthFirstSearch@0x1b20
- graph-tests/graph-tests.c::dequeue@0x1430
- graph-tests/graph-tests.c::enqueue@0x13e0
- graph-tests/graph-tests.c::getAdjUnvisitedVertex@0x1ac0
- graph-tests/graph-tests.c::insertAtTheBegin@0x1840
- graph-tests/graph-tests.c::link_list@0x18e0
- graph-tests/graph-tests.c::main@0x1120
- graph-tests/graph-tests.c::printQueue@0x14c0
- graph-tests/graph-tests.c::swap@0x1870
- hanoi/hanoi.c::main@0x1100
- heapsort/heapsort.c::main@0x1100
- heat-calc/heat-calc.c::main@0x1100
- huff-encode/huff-encode.c::main@0x1120
- idct-alg/idct-alg.c::main@0x1100
- indirect-test/indirect-test.c::main@0x1100
- k-means/k-means.c::calculateNearst@0x1310
- k-means/k-means.c::kMeans@0x1420
- k-means/k-means.c::main@0x1120
- k-means/k-means.c::printEPS@0x16b0
- kadane/kadane.c::main@0x1100
- kepler/kepler.c::J@0x1920
- kepler/kepler.c::bin_fact@0x1740
- kepler/kepler.c::binary@0x16a0
- kepler/kepler.c::e_series@0x17e0
- kepler/kepler.c::j_series@0x1a20
- kepler/kepler.c::main@0x1100
- knapsack/knapsack.c::main@0x1100
- knapsack/knapsack.c::max@0x1310
- knights-tour/knights-tour.c::solveKT@0x1390
- knights-tour/knights-tour.c::solveKTUtil@0x14f0
- life/life.c::getDown@0x16e0
- life/life.c::getDownLeft@0x1770
- life/life.c::getDownRight@0x17a0
- life/life.c::getLeft@0x1650
- life/life.c::getNumNeigbors@0x1390
- life/life.c::getRight@0x1680
- life/life.c::getUp@0x16b0
- life/life.c::getUpLeft@0x1710
- life/life.c::getUpRight@0x1740
- life/life.c::main@0x1100
- life/life.c::process@0x1550
- longdiv/longdiv.c::main@0x1120
- longdiv/longdiv.c::sbc@0x1a20
- longdiv/longdiv.c::sub@0x19c0
- lu-decomp/lu-decomp.c::main@0x1100
- lu-decomp/lu-decomp.c::print_matrix@0x13a0
- mandelbrot/mandelbrot.c::main@0x1100
- matmult/matmult.c::main@0x1100
- max-subseq/max-subseq.c::lcsAlgo@0x1290
- max-subseq/max-subseq.c::main@0x1120
- mersenne/mersenne.c::genrand@0x1310
- mersenne/mersenne.c::main@0x1100
- mersenne/mersenne.c::sgenrand@0x1290
- minspan/minspan.c::displayGraph@0x14f0
- minspan/minspan.c::displayGraph1@0x15f0
- minspan/minspan.c::displayPath@0x1700
- minspan/minspan.c::displayTree@0x17a0
- minspan/minspan.c::main@0x1100
- minspan/minspan.c::minSpanTree@0x12f0
- monte-carlo/monte-carlo.c::main@0x1100
- murmur-hash/murmur-hash.c::main@0x1100
- murmur-hash/murmur-hash.c::murmurhash@0x1290
- n-queens/n-queens.c::main@0x1120
- natlog/natlog.c::main@0x1100
- nbody-sim/nbody-sim.c::main@0x1100
- packet-filter/packet-filter.c::check_packet_filter@0x1430
- packet-filter/packet-filter.c::generate_packet@0x12d0
- packet-filter/packet-filter.c::main@0x1100
- packet-filter/packet-filter.c::print_packet@0x1490
- parrondo/parrondo.c::main@0x1100
- pascal/pascal.c::main@0x1100
- pi-calc/pi-calc.c::main@0x1100
- primal-test/primal-test.c::main@0x1100
- primal-test/primal-test.c::miller_rabin_int@0x1510
- priority-queue/priority-queue.c::main@0x1120
- qsort-demo/qsort-demo.c::main@0x1120
- qsort-demo/qsort-demo.c::print_struct_array@0x15c0
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x14a0
- qsort-demo/qsort-demo.c::sort_integers_example@0x1310
- qsort-demo/qsort-demo.c::sort_structs_example@0x1640
- qsort-test/qsort-test.c::main@0x1120
- quaternions/quaternions.c::euler_from_quat@0x1580
- quaternions/quaternions.c::main@0x1100
- quaternions/quaternions.c::quat_from_euler@0x13f0
- quaternions/quaternions.c::quaternion_multiply@0x16b0
- rabinkarp-search/rabinkarp-search.c::main@0x1120
- rabinkarp-search/rabinkarp-search.c::search@0x13a0
- rand-test/rand-test.c::bad_rand@0x1240
- rand-test/rand-test.c::main@0x1100
- rand-test/rand-test.c::run_tests@0x1280
- ransac/ransac.c::main@0x1100
- regex-parser/regex-parser.c::main@0x2100
- regex-parser/regex-parser.c::matchcharclass@0x23b0
- regex-parser/regex-parser.c::matchone@0x2560
- regex-parser/regex-parser.c::re_compile@0x2930
- regex-parser/regex-parser.c::re_print@0x2bf0
- rho-factor/rho-factor.c::main@0x1120
- rle-compress/rle-compress.c::main@0x1120
- rle-compress/rle-compress.c::run_length_encode@0x1330
- rsa-cipher/rsa-cipher.c::main@0x1100
- rsa-cipher/rsa-cipher.c::mod_inverse@0x1670
- rsa-cipher/rsa-cipher.c::mod_pow@0x1580
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x1790
- sat-solver/sat-solver.c::main@0x1100
- sat-solver/sat-solver.c::printFormula@0x1390
- shortest-path/shortest-path.c::main@0x1100
- sieve/sieve.c::main@0x1100
- simple-grep/simple-grep.c::main@0x1120
- spelt2num/spelt2num.c::main@0x1100
- spirograph/spirograph.c::spirograph@0x1230
- sudoku-solver/sudoku-solver.c::isSafe@0x1250
- sudoku-solver/sudoku-solver.c::main@0x1100
- tetris-sim/tetris-sim.c::best_move@0x1860
- tetris-sim/tetris-sim.c::evaluate_board@0x1640
- tetris-sim/tetris-sim.c::main@0x1120
- tiny-NN/tiny-NN.c::main@0x1120
- tiny-NN/tiny-NN.c::sampleSine@0x12d0
- tiny-NN/tiny-NN.c::train@0x13e0
- topo-sort/topo-sort.c::addEdge@0x1370
- topo-sort/topo-sort.c::createGraph@0x1300
- topo-sort/topo-sort.c::createListNode@0x12e0
- topo-sort/topo-sort.c::createStackNode@0x12c0
- topo-sort/topo-sort.c::main@0x1120
- topo-sort/topo-sort.c::topologicalSort@0x1450
- topo-sort/topo-sort.c::topologicalSortUtil@0x13c0
- totient/totient.c::main@0x1100
- transcend/transcend.c::main@0x1120
- uniquify/uniquify.c::main@0x1120
- vectors-3d/vectors-3d.c::get_cross_matrix@0x1760
- vectors-3d/vectors-3d.c::main@0x1100
- vectors-3d/vectors-3d.c::print_vector@0x1620
- vectors-3d/vectors-3d.c::unit_vec@0x1690
- vectors-3d/vectors-3d.c::vector_add@0x1550
- vectors-3d/vectors-3d.c::vector_prod@0x15c0
- vectors-3d/vectors-3d.c::vector_sub@0x1510
- verlet/verlet.c::main@0x1100
- weekday/weekday.c::dayOfWeek@0x1350
- weekday/weekday.c::main@0x1100

## Execution Failures
|
||||
- checkers/functions.c::all_possible_moves@0x1a60
|
||||
- cipher/cipher.c::decipher@0x1360
|
||||
- cipher/cipher.c::encipher@0x12f0
|
||||
- connect4-minimax/connect4-minimax.c::terminal_score@0x1800
|
||||
- gcd-list/gcd-list.c::gcd@0x1310
|
||||
- idct-alg/idct-alg.c::idct_2d@0x12f0
|
||||
- life/life.c::init@0x1220
|
||||
- ransac/ransac.c::ransac_line_fitting@0x1410
|
||||
- regex-parser/regex-parser.c::matchpattern@0x2670
|
||||
- spirograph/spirograph.c::test@0x1390
|
||||
- tetris-sim/tetris-sim.c::clear_lines@0x1480
|
||||
- tetris-sim/tetris-sim.c::simulate_board@0x17c0
|
||||
- vectors-3d/vectors-3d.c::get_angle@0x17d0
|
||||
355
sk2decompile/evaluation/bringupbench/reports/O3_results.md
Normal file
@@ -0,0 +1,355 @@
# Infer-Out Model 2 Evaluation (merged.O3.func_map.infer-host)

- Timestamp: 20251119-171533
- Source JSONL: merged.O3.func_map.infer.jsonl
- Target: host
- Total cases: 359
- Replacement success: 359 (100.00%)
- Compilable: 114 (31.75%)
- Executable: 106 (29.53%)
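The summary rates are plain ratios over the 359 cases. As a rough cross-check (an illustrative helper, not part of the evaluation scripts), the sketch below recomputes them from the raw counts and derives how many cases should land under each failure heading, assuming executable cases are a subset of compilable ones.

```python
def summarize(total: int, replaced: int, compilable: int, executable: int) -> dict:
    """Recompute the summary rates and implied failure counts (illustrative helper)."""

    def pct(count: int) -> float:
        # The report rounds every rate to two decimals.
        return round(100.0 * count / total, 2)

    return {
        "replacement_pct": pct(replaced),
        "compilable_pct": pct(compilable),
        "executable_pct": pct(executable),
        # Cases that never compiled, i.e. the "Compilation Failures" entries.
        "compile_failures": total - compilable,
        # Cases that compiled but were not executable ("Execution Failures"),
        # assuming executable cases are a subset of compilable ones.
        "exec_failures": compilable - executable,
    }


stats = summarize(total=359, replaced=359, compilable=114, executable=106)
print(stats)
```

With these counts the helper gives 31.75% and 29.53%, plus 245 compilation failures and 8 execution failures, which matches the lengths of the two failure lists in this report.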
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 11 | 100.00% | 27.27% | 27.27% |
| anagram | 13 | 100.00% | 38.46% | 38.46% |
| audio-codec | 3 | 100.00% | 33.33% | 33.33% |
| avl-tree | 15 | 100.00% | 13.33% | 13.33% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 3 | 100.00% | 66.67% | 66.67% |
| blake2b | 3 | 100.00% | 0.00% | 0.00% |
| bloom-filter | 4 | 100.00% | 25.00% | 25.00% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 40.00% | 40.00% |
| ccmac | 1 | 100.00% | 0.00% | 0.00% |
| checkers | 13 | 100.00% | 61.54% | 61.54% |
| cipher | 3 | 100.00% | 33.33% | 0.00% |
| congrad | 1 | 100.00% | 0.00% | 0.00% |
| connect4-minimax | 11 | 100.00% | 45.45% | 45.45% |
| convex-hull | 4 | 100.00% | 50.00% | 50.00% |
| dhrystone | 5 | 100.00% | 40.00% | 40.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 0.00% | 0.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 9 | 100.00% | 22.22% | 22.22% |
| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
| grad-descent | 4 | 100.00% | 0.00% | 0.00% |
| graph-tests | 19 | 100.00% | 5.26% | 5.26% |
| hanoi | 1 | 100.00% | 0.00% | 0.00% |
| heapsort | 2 | 100.00% | 0.00% | 0.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 12 | 100.00% | 83.33% | 83.33% |
| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 5 | 100.00% | 0.00% | 0.00% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 14.29% | 14.29% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 33.33% | 33.33% |
| life | 14 | 100.00% | 21.43% | 14.29% |
| longdiv | 7 | 100.00% | 71.43% | 71.43% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 4 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 25.00% | 25.00% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 4 | 100.00% | 0.00% | 0.00% |
| parrondo | 2 | 100.00% | 50.00% | 50.00% |
| pascal | 3 | 100.00% | 66.67% | 66.67% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 66.67% | 66.67% |
| priority-queue | 5 | 100.00% | 40.00% | 40.00% |
| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
| rand-test | 3 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 50.00% | 0.00% |
| regex-parser | 8 | 100.00% | 25.00% | 25.00% |
| rho-factor | 1 | 100.00% | 100.00% | 100.00% |
| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 40.00% |
| shortest-path | 3 | 100.00% | 33.33% | 33.33% |
| sieve | 1 | 100.00% | 0.00% | 0.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 0.00% |
| sudoku-solver | 4 | 100.00% | 75.00% | 75.00% |
| tetris-sim | 12 | 100.00% | 58.33% | 50.00% |
| tiny-NN | 4 | 100.00% | 25.00% | 25.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 2 | 100.00% | 50.00% | 50.00% |
| transcend | 1 | 100.00% | 0.00% | 0.00% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
| verlet | 1 | 100.00% | 0.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |
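Each row reports rates rather than counts. A minimal sketch, assuming every percentage is `count / cases × 100` rounded to two decimals, that recovers the integer counts from one row; `parse_row` is an illustrative name, not a function in the pipeline.

```python
def parse_row(row: str) -> tuple[str, int, int, int, int]:
    """Convert one breakdown row into (name, cases, replaced, built, executed) counts."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    name, cases = cells[0], int(cells[1])
    # Each rate is assumed to be count / cases * 100 rounded to two decimals,
    # so rounding rate * cases / 100 back to the nearest integer recovers the count.
    replaced, built, executed = (
        round(float(cell.rstrip("%")) * cases / 100) for cell in cells[2:5]
    )
    return name, cases, replaced, built, executed


row = "| tetris-sim | 12 | 100.00% | 58.33% | 50.00% |"
print(parse_row(row))
```

For tetris-sim this yields 7 compiled and 6 executable cases out of 12, consistent with the five tetris-sim entries under Compilation Failures and the single one under Execution Failures.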
## Compilation Failures
- ackermann/ackermann.c::main@0x1100
- aes/aes.c::add_round_key@0x1810
- aes/aes.c::aes_decrypt@0x2760
- aes/aes.c::aes_encrypt@0x2200
- aes/aes.c::inv_shift_rows@0x1af0
- aes/aes.c::key_expansion@0x1ff0
- aes/aes.c::main@0x1100
- aes/aes.c::mix_columns@0x1bd0
- aes/aes.c::shift_rows@0x1a30
- anagram/anagram.c::BuildMask@0x1620
- anagram/anagram.c::BuildWord@0x1940
- anagram/anagram.c::DumpCandidates@0x1c10
- anagram/anagram.c::DumpWords@0x1ca0
- anagram/anagram.c::FindAnagram@0x1d00
- anagram/anagram.c::ReadDict@0x14c0
- anagram/anagram.c::SortCandidates@0x1f10
- anagram/anagram.c::main@0x1120
- audio-codec/audio-codec.c::decode@0x1590
- audio-codec/audio-codec.c::main@0x1100
- avl-tree/avlcore.c::CheckTreeNodeRotation@0x1c50
- avl-tree/element.c::Compare@0x1af0
- avl-tree/avlcore.c::DeleteByElement@0x2e50
- avl-tree/avlcore.c::DeleteByElementRecursive@0x2bf0
- avl-tree/avlcore.c::DeleteLeftMost@0x2720
- avl-tree/avlcore.c::DoubleLeftRotation@0x1c20
- avl-tree/avlcore.c::DoubleRightRotation@0x1bf0
- avl-tree/avlcore.c::FindByElement@0x1b20
- avl-tree/avlcore.c::Insert@0x1f40
- avl-tree/avlcore.c::InsertNode@0x1e10
- avl-tree/avlcore.c::MakeEmpty@0x2090
- avl-tree/avl-tree.c::breadth@0x1780
- avl-tree/avl-tree.c::main@0x1120
- banner/banner.c::main@0x1120
- bit-kernels/bit-kernels.c::main@0x1120
- blake2b/blake2b.c::F@0x12e0
- blake2b/blake2b.c::blake2b@0x17b0
- blake2b/blake2b.c::test@0x1b50
- bloom-filter/bloom-filter.c::bad_search@0x1450
- bloom-filter/tinybloom.c::bfilter_intersect@0x1570
- bloom-filter/bloom-filter.c::main@0x1120
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x15d0
- boyer-moore-search/boyer-moore-search.c::main@0x1140
- boyer-moore-search/boyer-moore-search.c::search@0x1630
- c-interp/c-interp.c::enum_declaration@0x34f0
- c-interp/c-interp.c::eval@0x3ea0
- c-interp/c-interp.c::function_body@0x37f0
- c-interp/c-interp.c::function_declaration@0x3a10
- c-interp/c-interp.c::main@0x1120
- c-interp/c-interp.c::next@0x15a0
- ccmac/ccmac.c::main@0x1120
- checkers/functions.c::fill_print_initial@0x18e0
- checkers/functions.c::free_tree@0x6210
- checkers/functions.c::generate_node_children@0x35d0
- checkers/functions.c::link_new_node@0x34c0
- checkers/checkers.c::main@0x1130
- cipher/cipher.c::encipher@0x12f0
- cipher/cipher.c::main@0x1100
- congrad/congrad.c::main@0x1100
- connect4-minimax/connect4-minimax.c::board_full@0x1500
- connect4-minimax/connect4-minimax.c::evaluate_window@0x2380
- connect4-minimax/connect4-minimax.c::init_board@0x1230
- connect4-minimax/connect4-minimax.c::main@0x1100
- connect4-minimax/connect4-minimax.c::minimax@0x3c30
- connect4-minimax/connect4-minimax.c::play_game@0x4260
- convex-hull/convex-hull.c::main@0x1100
- convex-hull/convex-hull.c::sortPoints@0x1740
- dhrystone/dhrystone.c::PFunc_1@0x1980
- dhrystone/dhrystone.c::PProc_8@0x1910
- dhrystone/dhrystone.c::main@0x1100
- distinctness/distinctness.c::isDistinct@0x12a0
- distinctness/distinctness.c::main@0x1100
- fft-int/fft-int.c::db_from_ampl@0x1c50
- fft-int/fft-int.c::fix_fft@0x1370
- fft-int/fft-int.c::fix_loud@0x1a90
- fft-int/fft-int.c::window@0x1650
- flood-fill/flood-fill.c::main@0x1100
- frac-calc/frac-calc.c::avaliatokens@0x1730
- frac-calc/frac-calc.c::copyr@0x1550
- frac-calc/frac-calc.c::divtokens@0x1980
- frac-calc/frac-calc.c::help@0x14a0
- frac-calc/frac-calc.c::main@0x1120
- frac-calc/frac-calc.c::misto@0x1610
- frac-calc/frac-calc.c::simplifica@0x28f0
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x23e0
- fuzzy-match/fuzzy-match.c::main@0x2100
- fy-shuffle/fy-shuffle.c::fy_shuffle@0x1440
- fy-shuffle/fy-shuffle.c::main@0x1100
- gcd-list/gcd-list.c::gcd@0x1310
- gcd-list/gcd-list.c::main@0x1120
- grad-descent/grad-descent.c::derivateWRTBias@0x12e0
- grad-descent/grad-descent.c::derivateWRTWeight@0x1270
- grad-descent/grad-descent.c::gradientDescent@0x1350
- grad-descent/grad-descent.c::main@0x1100
- graph-tests/graph-tests.c::DFS_test@0x2340
- graph-tests/graph-tests.c::addEdge@0x1610
- graph-tests/graph-tests.c::addVertex@0x1f80
- graph-tests/graph-tests.c::bfs@0x1830
- graph-tests/graph-tests.c::bfs_test@0x1a70
- graph-tests/graph-tests.c::bubbleSort@0x1db0
- graph-tests/graph-tests.c::createGraph@0x1550
- graph-tests/graph-tests.c::createNode@0x1530
- graph-tests/graph-tests.c::createQueue@0x1680
- graph-tests/graph-tests.c::depthFirstSearch@0x2110
- graph-tests/graph-tests.c::dequeue@0x1720
- graph-tests/graph-tests.c::enqueue@0x16d0
- graph-tests/graph-tests.c::insertAtTheBegin@0x1d70
- graph-tests/graph-tests.c::link_list@0x1e20
- graph-tests/graph-tests.c::main@0x1180
- graph-tests/graph-tests.c::printQueue@0x17b0
- graph-tests/graph-tests.c::swap@0x1da0
- graph-tests/graph-tests.c::towers@0x2490
- hanoi/hanoi.c::main@0x1100
- heapsort/heapsort.c::HSORT@0x12f0
- heapsort/heapsort.c::main@0x11a0
- heat-calc/heat-calc.c::main@0x1100
- huff-encode/huff-encode.c::buildHuffmanTree@0x18b0
- huff-encode/huff-encode.c::main@0x1120
- idct-alg/idct-alg.c::main@0x1100
- indirect-test/indirect-test.c::main@0x1100
- k-means/k-means.c::calculateCentroid@0x1390
- k-means/k-means.c::calculateNearst@0x1310
- k-means/k-means.c::kMeans@0x1400
- k-means/k-means.c::main@0x1120
- k-means/k-means.c::printEPS@0x16c0
- kadane/kadane.c::main@0x1100
- kepler/kepler.c::J@0x1b80
- kepler/kepler.c::bin_fact@0x1ad0
- kepler/kepler.c::binary@0x16a0
- kepler/kepler.c::e_series@0x1740
- kepler/kepler.c::j_series@0x1920
- kepler/kepler.c::main@0x1100
- knapsack/knapsack.c::main@0x1100
- knapsack/knapsack.c::max@0x1310
- knights-tour/knights-tour.c::solveKT@0x1830
- knights-tour/knights-tour.c::solveKTUtil@0x1980
- life/life.c::getDown@0x1960
- life/life.c::getDownLeft@0x19f0
- life/life.c::getDownRight@0x1a20
- life/life.c::getLeft@0x18d0
- life/life.c::getNumNeigbors@0x16d0
- life/life.c::getRight@0x1900
- life/life.c::getUp@0x1930
- life/life.c::getUpLeft@0x1990
- life/life.c::getUpRight@0x19c0
- life/life.c::main@0x1100
- life/life.c::process@0x1430
- longdiv/longdiv.c::main@0x1120
- longdiv/longdiv.c::sub@0x1a80
- lu-decomp/lu-decomp.c::main@0x1100
- lu-decomp/lu-decomp.c::print_matrix@0x1320
- mandelbrot/mandelbrot.c::main@0x1100
- max-subseq/max-subseq.c::lcsAlgo@0x1290
- max-subseq/max-subseq.c::main@0x1120
- mersenne/mersenne.c::genrand@0x1380
- mersenne/mersenne.c::lsgenrand@0x1320
- mersenne/mersenne.c::main@0x1100
- mersenne/mersenne.c::sgenrand@0x12d0
- minspan/minspan.c::displayGraph@0x1db0
- minspan/minspan.c::displayGraph1@0x1ee0
- minspan/minspan.c::displayPath@0x2020
- minspan/minspan.c::displayTree@0x20c0
- minspan/minspan.c::main@0x1100
- minspan/minspan.c::minSpanTree@0x1400
- monte-carlo/monte-carlo.c::main@0x1100
- murmur-hash/murmur-hash.c::main@0x1100
- murmur-hash/murmur-hash.c::murmurhash@0x1290
- n-queens/n-queens.c::main@0x1120
- natlog/natlog.c::main@0x1100
- nbody-sim/nbody-sim.c::main@0x1100
- packet-filter/packet-filter.c::check_packet_filter@0x1520
- packet-filter/packet-filter.c::generate_packet@0x13d0
- packet-filter/packet-filter.c::main@0x1100
- packet-filter/packet-filter.c::print_packet@0x1580
- parrondo/parrondo.c::main@0x1100
- pascal/pascal.c::main@0x1100
- pi-calc/pi-calc.c::main@0x1100
- primal-test/primal-test.c::main@0x1100
- priority-queue/priority-queue.c::main@0x1120
- priority-queue/priority-queue.c::newNode@0x13a0
- priority-queue/priority-queue.c::push@0x1420
- qsort-demo/qsort-demo.c::main@0x1120
- qsort-demo/qsort-demo.c::print_struct_array@0x15b0
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x1480
- qsort-demo/qsort-demo.c::sort_integers_example@0x1310
- qsort-demo/qsort-demo.c::sort_structs_example@0x1630
- qsort-test/qsort-test.c::main@0x1120
- quaternions/quaternions.c::euler_from_quat@0x1550
- quaternions/quaternions.c::main@0x1100
- quaternions/quaternions.c::quat_from_euler@0x13e0
- quaternions/quaternions.c::quaternion_multiply@0x1670
- rabinkarp-search/rabinkarp-search.c::main@0x1120
- rabinkarp-search/rabinkarp-search.c::search@0x15a0
- rand-test/rand-test.c::bad_rand@0x1240
- rand-test/rand-test.c::main@0x1100
- rand-test/rand-test.c::run_tests@0x1280
- ransac/ransac.c::main@0x1100
- regex-parser/regex-parser.c::main@0x2100
- regex-parser/regex-parser.c::matchcharclass@0x2420
- regex-parser/regex-parser.c::matchone@0x25c0
- regex-parser/regex-parser.c::matchpattern@0x26d0
- regex-parser/regex-parser.c::re_compile@0x2ac0
- regex-parser/regex-parser.c::re_print@0x2e30
- rle-compress/rle-compress.c::main@0x1120
- rle-compress/rle-compress.c::run_length_encode@0x1330
- rsa-cipher/rsa-cipher.c::main@0x1100
- rsa-cipher/rsa-cipher.c::mod_inverse@0x15a0
- rsa-cipher/rsa-cipher.c::mod_pow@0x14b0
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x16c0
- sat-solver/sat-solver.c::main@0x1100
- sat-solver/sat-solver.c::printFormula@0x1680
- shortest-path/shortest-path.c::floydWarshall@0x1330
- shortest-path/shortest-path.c::main@0x1100
- sieve/sieve.c::main@0x1100
- simple-grep/simple-grep.c::main@0x1120
- spelt2num/spelt2num.c::main@0x1100
- spirograph/spirograph.c::spirograph@0x1230
- sudoku-solver/sudoku-solver.c::main@0x1100
- tetris-sim/tetris-sim.c::aggregate_height@0x1b20
- tetris-sim/tetris-sim.c::best_move@0x21d0
- tetris-sim/tetris-sim.c::count_holes@0x1b70
- tetris-sim/tetris-sim.c::evaluate_board@0x1ca0
- tetris-sim/tetris-sim.c::main@0x1100
- tiny-NN/tiny-NN.c::main@0x1120
- tiny-NN/tiny-NN.c::sampleSine@0x12d0
- tiny-NN/tiny-NN.c::train@0x13e0
- topo-sort/topo-sort.c::addEdge@0x13f0
- topo-sort/topo-sort.c::createGraph@0x1380
- topo-sort/topo-sort.c::createListNode@0x1360
- topo-sort/topo-sort.c::createStackNode@0x1340
- topo-sort/topo-sort.c::main@0x1120
- topo-sort/topo-sort.c::topologicalSort@0x18b0
- topo-sort/topo-sort.c::topologicalSortUtil@0x1440
- totient/totient.c::main@0x1100
- transcend/transcend.c::main@0x1120
- uniquify/uniquify.c::main@0x1120
- vectors-3d/vectors-3d.c::get_cross_matrix@0x1850
- vectors-3d/vectors-3d.c::main@0x1100
- vectors-3d/vectors-3d.c::print_vector@0x1730
- vectors-3d/vectors-3d.c::unit_vec@0x17a0
- vectors-3d/vectors-3d.c::vector_add@0x1650
- vectors-3d/vectors-3d.c::vector_prod@0x16b0
- vectors-3d/vectors-3d.c::vector_sub@0x1620
- verlet/verlet.c::main@0x1100
- weekday/weekday.c::dayOfWeek@0x1290
- weekday/weekday.c::main@0x1100

## Execution Failures
- cipher/cipher.c::decipher@0x1360
- idct-alg/idct-alg.c::idct_2d@0x12f0
- life/life.c::init@0x12c0
- ransac/ransac.c::ransac_line_fitting@0x1410
- sat-solver/sat-solver.c::solveSAT@0x13a0
- spirograph/spirograph.c::test@0x1390
- tetris-sim/tetris-sim.c::clear_lines@0x19a0
- vectors-3d/vectors-3d.c::get_angle@0x18c0
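Every failure entry follows a `<benchmark>/<file>.c::<function>@<address>` pattern, where the address appears to be the function's location in the O3 binary. A small sketch (hypothetical helpers, assuming exactly this line format) that parses entries and tallies them per benchmark, applied to the O3 execution-failure list above:

```python
import re
from collections import Counter

# Assumed entry format: "- <benchmark>/<file>.c::<function>@0x<hex address>"
ENTRY_RE = re.compile(
    r"^-\s*(?P<bench>[^/\s]+)/(?P<file>[^:\s]+)::(?P<func>[A-Za-z_]\w*)@(?P<addr>0x[0-9a-fA-F]+)$"
)


def parse_entry(line: str) -> dict:
    """Split one failure entry into benchmark, file, function and numeric address."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a failure entry: {line!r}")
    entry = match.groupdict()
    entry["addr"] = int(entry["addr"], 16)  # numeric address, handy for sorting
    return entry


def failures_per_benchmark(lines: list[str]) -> Counter:
    """Tally failure entries per benchmark directory, ignoring non-entry lines."""
    return Counter(parse_entry(ln)["bench"] for ln in lines if ln.strip().startswith("- "))


# The O3 "Execution Failures" list from this report.
O3_EXEC_FAILURES = [
    "- cipher/cipher.c::decipher@0x1360",
    "- idct-alg/idct-alg.c::idct_2d@0x12f0",
    "- life/life.c::init@0x12c0",
    "- ransac/ransac.c::ransac_line_fitting@0x1410",
    "- sat-solver/sat-solver.c::solveSAT@0x13a0",
    "- spirograph/spirograph.c::test@0x1390",
    "- tetris-sim/tetris-sim.c::clear_lines@0x19a0",
    "- vectors-3d/vectors-3d.c::get_angle@0x18c0",
]

print(failures_per_benchmark(O3_EXEC_FAILURES))
```

Multi-file benchmarks such as `avl-tree/element.c` parse the same way, since the file part is taken after the first `/`.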
493
sk2decompile/evaluation/bringupbench/scripts/build-func-maps.py
Normal file
@@ -0,0 +1,493 @@
#!/usr/bin/env python3
"""Generate function-level mappings across source, pseudo, and assembly outputs."""

from __future__ import annotations

import argparse
import json
import os
import re
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional

FUNC_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof", "do", "case", "else"}

TYPEDEF_MAP = {
    "cpu_set_t": "int",
    "nl_item": "int",
    "__time_t": "int",
    "__mode_t": "unsigned short",
    "__off64_t": "long long",
    "__blksize_t": "long",
    "__ino_t": "unsigned long",
    "__blkcnt_t": "unsigned long long",
    "__syscall_slong_t": "long",
    "__ssize_t": "long int",
    "wchar_t": "unsigned short int",
    "wctype_t": "unsigned short int",
    "__int64": "long long",
    "__int32": "int",
    "__int16": "short",
    "__int8": "char",
    "_QWORD": "uint64_t",
    "_OWORD": "long double",
    "_DWORD": "uint32_t",
    "size_t": "unsigned int",
    "_BYTE": "uint8_t",
    "_TBYTE": "uint16_t",
    "_BOOL8": "uint8_t",
    "gcc_va_list": "va_list",
    "_WORD": "unsigned short",
    "_BOOL4": "int",
    "__va_list_tag": "va_list",
    "_IO_FILE": "FILE",
    "DIR": "int",
    "__fsword_t": "long",
    "__kernel_ulong_t": "int",
    "cc_t": "int",
    "speed_t": "int",
    "fd_set": "int",
    "__suseconds_t": "int",
    "_UNKNOWN": "void",
    "__sighandler_t": "void (*)(int)",
    "__compar_fn_t": "int (*)(const void *, const void *)",
}


def _load_config_env() -> dict:
    """Load config.env from the eval project root."""
    eval_root = Path(__file__).resolve().parents[1]
    config_path = eval_root / "config.env"
    config = {}
    if config_path.exists():
        for line in config_path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if "=" in line:
                key, _, value = line.partition("=")
                # Tolerate shell-style quoting such as BENCH_REPO_ROOT="/path/to/repo".
                config[key.strip()] = value.strip().strip("\"'")
    return config


def _get_bench_root(cli_value: str | None = None) -> Path:
    """Resolve the benchmark repo root from CLI arg, env var, or config.env."""
    if cli_value:
        return Path(cli_value).resolve()
    env_val = os.environ.get("BENCH_REPO_ROOT")
    if env_val:
        return Path(env_val).resolve()
    config = _load_config_env()
    if "BENCH_REPO_ROOT" in config:
        return Path(config["BENCH_REPO_ROOT"]).resolve()
    sys.exit("error: BENCH_REPO_ROOT not set. Use --bench-root, set the env var, or configure config.env")


def _read_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


def _strip_empty(code: str) -> str:
    return "\n".join(line for line in code.splitlines() if line.strip())


def _good_func(func: str) -> bool:
    # Keep functions whose body has between 4 and 299 non-trivial (>= 3 character) lines.
    body = "{".join(func.split("{", 1)[1:]) if "{" in func else func
    total = 0
    for line in body.splitlines():
        if len(line.strip()) >= 3:
            total += 1
    return 3 < total < 300


def _format_with_clang(func: str, style: str = "Google") -> Optional[str]:
    if not func:
        return None
    cmd = ["clang-format", f"--style={style}"]
    try:
        proc = subprocess.run(
            cmd,
            input=func,
            text=True,
            capture_output=True,
            check=True,
            timeout=15,
        )
        return proc.stdout
    except Exception as e:  # e.g. clang-format missing, non-zero exit, or timeout
        print(e, file=sys.stderr)
        return None


def _hex_to_dec(text: str) -> str:
    pattern = re.compile(r"\b(0x[0-9a-fA-F]+)([uUlL]{1,3})?\b")

    def convert(match: re.Match[str]) -> str:
        hex_part = match.group(1)
        suffix = match.group(2) or ""
        return str(int(hex_part, 16)) + suffix

    return pattern.sub(convert, text)


def _remove_keywords(text: str) -> str:
    patterns = [
        r"\b__fastcall\b",
        r"\b__cdecl\b",
        r"\b__ptr32\b",
        r"\b__noreturn\s+noreturn\b",
    ]
    combined = re.compile("|".join(patterns))
    return combined.sub("", text)


def _replace_typedefs(text: str) -> str:
    for alias, original in TYPEDEF_MAP.items():
        pattern = re.compile(rf"\b{re.escape(alias)}\b")
        text = pattern.sub(original, text)
    return text


def _remove_comments(text: str) -> str:
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    text = re.sub(r"//.*?$", "", text, flags=re.MULTILINE)
    return text


def _process_code(code_str: str) -> str:
    code_str = _remove_comments(code_str)
    code_str = _hex_to_dec(code_str)
    code_str = _remove_keywords(code_str)
    code_str = _replace_typedefs(code_str)
    return code_str


def _normalize_pseudo(text: str) -> str:
    processed = _process_code(text)
    if not processed.strip():
        return ""
    formatted = _format_with_clang(processed)
    if formatted is None:
        return ""
    cleaned = _strip_empty(formatted)
    if not cleaned or not _good_func(cleaned):
        return ""
    return cleaned


def _strip_comments_and_strings(text: str) -> str:
    """Blank out comments and string/char literals while preserving character offsets."""
    result = list(text)
    i = 0
    length = len(text)
    while i < length:
        nxt = text[i : i + 2]
        ch = text[i]
        if nxt == "//":
            end = text.find("\n", i)
            if end == -1:
                end = length
            for j in range(i, end):
                result[j] = " "
            i = end
            continue
        if nxt == "/*":
            end = text.find("*/", i + 2)
            if end == -1:
                end = length - 2
            for j in range(i, end + 2):
                result[j] = " "
            i = end + 2
            continue
        if ch in {'"', "'"}:
            quote = ch
            result[i] = " "
            i += 1
            while i < length:
                c = text[i]
                result[i] = " "
                if c == "\\":
                    # Also blank the escaped character (e.g. \" or \{) so it cannot leak
                    # into the sanitized text; escaped newlines are kept for line anchors.
                    if i + 1 < length and text[i + 1] != "\n":
                        result[i + 1] = " "
                    i += 2
                    continue
                if c == quote:
                    i += 1
                    break
                i += 1
            continue
        i += 1
    return "".join(result)


def _find_matching_brace(text: str, start_idx: int) -> int:
    """Return the index of the '}' closing the '{' at start_idx, skipping comments and literals."""
    depth = 0
    i = start_idx
    length = len(text)
    while i < length:
        nxt = text[i : i + 2]
        ch = text[i]
        if nxt == "//":
            i = text.find("\n", i)
            if i == -1:
                return length - 1
            continue
        if nxt == "/*":
            i = text.find("*/", i + 2)
            if i == -1:
                return length - 1
            i += 2
            continue
        if ch in {'"', "'"}:
            quote = ch
            i += 1
            while i < length:
                c = text[i]
                if c == "\\":
                    i += 2
                    continue
                if c == quote:
                    i += 1
                    break
                i += 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return length - 1


def _extract_source_functions(path: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
    text = _read_text(path)
    # Match on a copy with comments/literals blanked out; offsets still line up with `text`.
    sanitized = _strip_comments_and_strings(text)
    pattern = re.compile(
        r"(?P<prefix>^|[;\n}])(?P<signature>[^{;}]*?)\b(?P<name>[A-Za-z_][\w]*)\s*\([^;{}]*\)\s*\{",
        re.MULTILINE,
    )
    funcs: Dict[str, Dict[str, str]] = {}
    for match in pattern.finditer(sanitized):
        name = match.group("name")
        if name in FUNC_KEYWORDS:
            continue
        brace_idx = sanitized.find("{", match.start("signature"))
        if brace_idx == -1:
            continue
        end_idx = _find_matching_brace(text, brace_idx)
        if end_idx <= brace_idx:
            continue
        start_idx = match.start("signature")
        content = text[start_idx : end_idx + 1].strip("\n") + "\n"
        funcs.setdefault(
            name,
            {
                "path": str(path.relative_to(repo_root)),
                "function_name": name,
                "content": content,
            },
        )
    return funcs


def _parse_makefile(makefile: Path) -> List[Path]:
    """Return the C sources behind the objects listed in a benchmark Makefile."""
    text = _read_text(makefile)
    prog_match = re.search(r"^PROG\s*=\s*(\S+)", text, flags=re.MULTILINE)
    if not prog_match:
        raise RuntimeError(f"PROG not found in {makefile}")
    prog = prog_match.group(1).strip()
    objs_match = re.search(r"^LOCAL_OBJS\s*=\s*(.*)$", text, flags=re.MULTILINE)
    obj_tokens: List[str] = []
    if objs_match:
        obj_tokens = [token for token in objs_match.group(1).split() if token]
    if not obj_tokens:
        obj_tokens = [f"{prog}.o"]
    src_paths: List[Path] = []
    for token in obj_tokens:
        if not token.endswith(".o"):
            continue
        # Swap only the trailing ".o"; str.replace would also rewrite ".o" elsewhere in the path.
        candidate = makefile.parent / (token[: -len(".o")] + ".c")
        if candidate.exists():
            src_paths.append(candidate)
    if not src_paths:
        fallback = makefile.parent / f"{prog}.c"
        if fallback.exists():
            src_paths.append(fallback)
    return src_paths


def _collect_source_functions(bench_dir: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
    makefile = bench_dir / "Makefile"
    srcs = _parse_makefile(makefile)
    func_map: Dict[str, Dict[str, str]] = {}
    for src in srcs:
        func_map.update(_extract_source_functions(src, repo_root))
    return func_map


def _parse_pseudo(pseudo_path: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
    """Split a .pseudo file on `/* name @ 0xADDR */` header lines into per-function entries."""
    text = _read_text(pseudo_path)
    lines = text.splitlines()
    pattern = re.compile(r"^/\*\s*(?P<name>[^@]+?)\s*@\s*(?P<addr>0x[0-9a-fA-F]+)\s*\*/$")
    current: Optional[str] = None
    current_addr: Optional[str] = None
    buffer: List[str] = []
    out: Dict[str, Dict[str, str]] = {}
    for raw_line in lines:
        line = raw_line.strip()
        match = pattern.match(line)
        if match:
            if current and buffer:
                content = "\n".join(buffer).strip("\n") + "\n"
                out.setdefault(
                    current,
                    {
                        "path": str(pseudo_path.relative_to(repo_root)),
                        "function_name": current,
                        "address": current_addr,
                        "label": current,
                        "content": content,
                    },
                )
            current = match.group("name").strip()
            current_addr = match.group("addr")
            buffer = []
        else:
            if current is not None:
                buffer.append(raw_line)
    if current and buffer:
        content = "\n".join(buffer).strip("\n") + "\n"
        out.setdefault(
            current,
            {
                "path": str(pseudo_path.relative_to(repo_root)),
                "function_name": current,
                "address": current_addr,
                "label": current,
                "content": content,
            },
        )
    return out


def _clean_instruction(raw: str) -> Optional[str]:
    """Reduce an objdump line to mnemonic and operands, dropping address, raw bytes and comments."""
    stripped = raw.strip()
    if not stripped:
        return None
    parts = raw.split("\t")
    if len(parts) >= 3:
        relevant = parts[2:]
    elif len(parts) == 2:
        relevant = parts[1:]
    else:
        relevant = [stripped]
    instr = "\t".join(relevant)
    instr = instr.split("#")[0].strip()
    if not instr:
        return None
    # Continuation lines that carry only opcode bytes are skipped.
    if all(c in "0123456789abcdefABCDEF" for c in instr.replace(" ", "")):
        return None
    return instr


def _clean_asm_block(name: str, lines: List[str]) -> str:
    cleaned = [f"<{name}>:"]
    for raw in lines[1:]:
        instr = _clean_instruction(raw)
        if instr:
            cleaned.append(instr)
    return "\n".join(cleaned) + "\n"


def _parse_assembly(asm_path: Path) -> Dict[str, str]:
    lines = _read_text(asm_path).splitlines()
    header = re.compile(r"^\s*([0-9a-fA-F]+)\s+<([^>]+)>:\s*$")
    current: Optional[str] = None
    buffer: List[str] = []
    result: Dict[str, str] = {}
    for line in lines:
        match = header.match(line)
        if match:
            if current and buffer:
                result.setdefault(current, _clean_asm_block(current, buffer))
            current = match.group(2)
            buffer = [line]
        else:
            if current is not None:
                buffer.append(line)
    if current and buffer:
        result.setdefault(current, _clean_asm_block(current, buffer))
    return result


def _discover_binaries(explicit: Optional[List[str]], repo_root: Path) -> List[Path]:
|
||||
if explicit:
|
||||
binaries: List[Path] = []
|
||||
for entry in explicit:
|
||||
candidate = Path(entry)
|
||||
if not candidate.is_absolute():
|
||||
candidate = repo_root / candidate
|
||||
if candidate.exists():
|
||||
binaries.append(candidate)
|
||||
return binaries
|
||||
matches = []
|
||||
for path in repo_root.rglob("*.O*"):
|
||||
suffix = path.suffix.lower()
|
||||
if suffix in {".o0", ".o1", ".o2", ".o3"}:
|
||||
matches.append(path)
|
||||
return sorted(matches)
|
||||
|
def _build_map(binary: Path, repo_root: Path) -> None:
    pseudo_path = Path(str(binary) + ".pseudo")
    asm_path = Path(str(binary) + ".s")
    if not pseudo_path.exists() or not asm_path.exists():
        print(f"[skip] Missing pseudo or assembly for {binary.relative_to(repo_root)}")
        return
    bench_dir = binary.parent
    source_funcs = _collect_source_functions(bench_dir, repo_root)
    pseudo_funcs = _parse_pseudo(pseudo_path, repo_root)
    asm_funcs = _parse_assembly(asm_path)
    common = sorted(set(source_funcs) & set(pseudo_funcs) & set(asm_funcs))
    if not common:
        print(f"[warn] No overlapping functions for {binary.relative_to(repo_root)}")
        return
    output_path = Path(str(binary) + ".func_map.jsonl")
    rel_binary = str(binary.relative_to(repo_root))
    with output_path.open("w", encoding="utf-8") as handle:
        for name in common:
            pseudo_entry = pseudo_funcs[name]
            pseudo_norm = _normalize_pseudo(pseudo_entry.get("content", ""))
            record = {
                "source": source_funcs[name],
                "pseudo": pseudo_entry,
                "pseudo_normalize": pseudo_norm,
                "binary": rel_binary,
                "assembly": asm_funcs[name],
            }
            handle.write(json.dumps(record, ensure_ascii=False))
            handle.write("\n")
    print(f"[ok] {output_path.relative_to(repo_root)} -> {len(common)} functions")


def main(argv: List[str]) -> int:
    parser = argparse.ArgumentParser(description="Map source/pseudo/assembly per function")
    parser.add_argument(
        "--binary",
        action="append",
        help="Specific binary path (relative to repo) to process; can be repeated.",
    )
    parser.add_argument(
        "--bench-root",
        default=None,
        help="Path to the Bringup-Bench repository root (default: from config.env).",
    )
    args = parser.parse_args(argv)
    repo_root = _get_bench_root(args.bench_root)
    binaries = _discover_binaries(args.binary, repo_root)
    if not binaries:
        print("No binaries found", file=sys.stderr)
        return 1
    for binary in binaries:
        _build_map(binary, repo_root)
    return 0


if __name__ == "__main__":
    raise SystemExit(main(sys.argv[1:]))
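To make the per-function split in `_parse_assembly` concrete, here is a minimal, self-contained sketch that groups an `objdump -d` listing by its `<name>:` headers. `split_functions`, `clean_instruction` and `SAMPLE` are hypothetical names. The column-based cleaning (keep only the text after the second tab) is an assumption standing in for `_clean_instruction`, which is only partly visible above.

```python
import re
from typing import Dict, List, Optional

# Function header shape matched by _parse_assembly: "0000000000401126 <add>:"
HEADER = re.compile(r"^\s*([0-9a-fA-F]+)\s+<([^>]+)>:\s*$")


def clean_instruction(line: str) -> Optional[str]:
    """Keep the mnemonic/operand column of an objdump -d line (tab layout assumed)."""
    parts = line.split("\t")
    if len(parts) < 3:
        # Blank lines and raw-byte continuation lines carry no instruction text.
        return None
    return parts[2].strip() or None


def split_functions(listing: str) -> Dict[str, str]:
    """Group an objdump -d listing into {function_name: cleaned block}."""
    result: Dict[str, str] = {}
    current: Optional[str] = None
    body: List[str] = []

    def flush() -> None:
        if current is not None:
            # First block per name wins, mirroring result.setdefault() in the script.
            result.setdefault(current, "\n".join([f"<{current}>:", *body]) + "\n")

    for line in listing.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            current, body = match.group(2), []
        elif current is not None:
            instr = clean_instruction(line)
            if instr:
                body.append(instr)
    flush()
    return result


# Hypothetical two-function listing in objdump's tab-separated layout.
SAMPLE = (
    "0000000000401126 <add>:\n"
    "  401126:\t55                   \tpush   %rbp\n"
    "  401127:\t48 89 e5             \tmov    %rsp,%rbp\n"
    "\n"
    "0000000000401130 <main>:\n"
    "  401130:\tc3                   \tret    \n"
)

if __name__ == "__main__":
    print(split_functions(SAMPLE)["add"])
```

Keying the result by symbol name is what allows the mapping script to intersect the assembly with the source and pseudocode maps by name. Dropping the address and byte columns keeps the stored text close to what a model would be prompted with.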
24
sk2decompile/evaluation/bringupbench/scripts/build-host-opt-levels.sh
Executable file
@ -0,0 +1,24 @@
#!/bin/bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
    set -a
    source "${EVAL_ROOT}/config.env"
    set +a
fi

BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"

cd "${BENCH_REPO_ROOT}"

for opt in 0 1 2 3; do
    echo "==> Building host binaries with -O${opt}"
    make TARGET=host OPT_CFLAGS="-O${opt} -g" run-tests
    find . -maxdepth 2 -type f -name '*.host' -execdir mv {} {}.O${opt} \;
done

echo "All host optimization builds complete."
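The build loop above renames each `*.host` binary to `*.host.O<n>`. The disassembly and decompilation scripts then locate those files with `find -mindepth 2 -maxdepth 2 -iname '*.o0' ...`, excluding `scripts/`, `target/`, `common/` and `.git/`. By contrast, `_discover_binaries` in the mapping script uses `rglob` without those exclusions. A rough Python equivalent of the `find` filter is sketched below, assuming every benchmark lives exactly one directory below the repository root. `opt_level`, `discover` and the demo tree are illustrative names only.

```python
import re
import tempfile
from pathlib import Path
from typing import List, Optional

# Names assumed from the rename step: "<bench>/<bench>.host.O<n>".
OPT_SUFFIX = re.compile(r"\.[oO]([0-3])$")
# Top-level directories the find-based scripts exclude.
EXCLUDED_DIRS = {"scripts", "target", "common", ".git"}


def opt_level(path: Path) -> Optional[int]:
    """Return the optimization level encoded in a .O<n>/.o<n> suffix, else None."""
    match = OPT_SUFFIX.search(path.name)
    return int(match.group(1)) if match else None


def discover(bench_root: Path) -> List[Path]:
    """Files exactly one level below bench_root with an .O0-.O3 suffix (case-insensitive)."""
    found: List[Path] = []
    for bench_dir in bench_root.iterdir():
        if not bench_dir.is_dir() or bench_dir.name in EXCLUDED_DIRS:
            continue
        for path in bench_dir.iterdir():
            if path.is_file() and opt_level(path) is not None:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical tree: one benchmark plus an excluded common/ directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "bench_a").mkdir()
        for name in ("bench_a.c", "bench_a.host.O0", "bench_a.host.O3"):
            (root / "bench_a" / name).touch()
        (root / "common").mkdir()
        (root / "common" / "lib.host.O0").touch()
        print([p.name for p in discover(root)])
```

Recovering the level from the file name, rather than keeping a separate index, is what lets every later stage (`.s`, `.pseudo`, `.func_map.jsonl`) stay keyed to the same binary path.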
21
sk2decompile/evaluation/bringupbench/scripts/clean-all-benchmarks.sh
Executable file
@ -0,0 +1,21 @@
#!/bin/bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
    set -a
    source "${EVAL_ROOT}/config.env"
    set +a
fi

BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"

cd "${BENCH_REPO_ROOT}"

echo "==> Running make all-clean"
make all-clean

echo "All benchmarks cleaned."
50
sk2decompile/evaluation/bringupbench/scripts/decompile-all-pseudo.sh
Executable file
@ -0,0 +1,50 @@
#!/bin/bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
    set -a
    source "${EVAL_ROOT}/config.env"
    set +a
fi

BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"

IDA_BIN="${IDA_BIN:-/home/bairidreamer/software/IDA-Pro/idat}"
DUMP_SCRIPT="${EVAL_ROOT}/scripts/dump_pseudo.py"

if [[ ! -x "${IDA_BIN}" ]]; then
    echo "error: IDA binary not found or not executable at ${IDA_BIN}" >&2
    exit 1
fi

if [[ ! -f "${DUMP_SCRIPT}" ]]; then
    echo "error: dump script not found at ${DUMP_SCRIPT}" >&2
    exit 1
fi

readarray -t BINARIES < <(
    find "${BENCH_REPO_ROOT}" -mindepth 2 -maxdepth 2 -type f \
        \( -iname '*.o0' -o -iname '*.o1' -o -iname '*.o2' -o -iname '*.o3' \) \
        ! -path "${BENCH_REPO_ROOT}/scripts/*" \
        ! -path "${BENCH_REPO_ROOT}/target/*" \
        ! -path "${BENCH_REPO_ROOT}/common/*" \
        ! -path "${BENCH_REPO_ROOT}/.git/*" \
        | sort
)

if [[ ${#BINARIES[@]} -eq 0 ]]; then
    echo "error: no O0/O1/O2/O3 binaries found under ${BENCH_REPO_ROOT}" >&2
    exit 1
fi

for binary_path in "${BINARIES[@]}"; do
    output_path="${binary_path}.pseudo"
    # Quote the prefix so glob characters in BENCH_REPO_ROOT are matched literally.
    echo "==> Decompiling ${binary_path#"${BENCH_REPO_ROOT}/"} -> ${output_path#"${BENCH_REPO_ROOT}/"}"
    "${IDA_BIN}" -A "-S${DUMP_SCRIPT} ${output_path}" "${binary_path}"
done

echo "All pseudocode dumps are located alongside their binaries."
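Each `.pseudo` file produced by this loop is written by `dump_pseudo.py` (listed further down). That script emits a `/* name @ 0xADDR */` header before every function and a `// decompilation failed: ...` line when Hex-Rays cannot decompile it. The mapping script's `_parse_pseudo` is not part of this excerpt, so the following is only a hedged sketch of how such a file could be split back into per-function entries. `split_pseudo` and `SAMPLE_PSEUDO` are hypothetical names, and keeping the first entry for a duplicated name is an assumption.

```python
import re
from typing import Dict, List, Optional

# Header written by dump_pseudo.py before each function: "/* name @ 0x401126 */"
PSEUDO_HEADER = re.compile(r"^/\* (?P<name>.+) @ (?P<addr>0x[0-9a-fA-F]+) \*/$")
FAILED_PREFIX = "// decompilation failed"


def split_pseudo(text: str) -> Dict[str, Dict[str, str]]:
    """Map function name -> {"address": ..., "content": ...}, skipping failed entries."""
    entries: Dict[str, Dict[str, str]] = {}
    name: Optional[str] = None
    addr = ""
    body: List[str] = []

    def flush() -> None:
        if name is None:
            return
        content = "\n".join(body).strip()
        if content and not content.startswith(FAILED_PREFIX):
            # Keep the first entry per name (assumption for duplicate symbols).
            entries.setdefault(name, {"address": addr, "content": content + "\n"})

    for line in text.splitlines():
        match = PSEUDO_HEADER.match(line)
        if match:
            flush()
            name, addr, body = match.group("name"), match.group("addr"), []
        else:
            body.append(line)
    flush()
    return entries


# Hypothetical dump: one decompiled function and one failure marker.
SAMPLE_PSEUDO = (
    "/* add @ 0x401126 */\n"
    "__int64 __fastcall add(int a1, int a2)\n"
    "{\n"
    "  return (unsigned int)(a1 + a2);\n"
    "}\n"
    "\n"
    "/* _start @ 0x401020 */\n"
    "// decompilation failed: example\n"
    "\n"
)

if __name__ == "__main__":
    print(split_pseudo(SAMPLE_PSEUDO))
```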
66
sk2decompile/evaluation/bringupbench/scripts/disasm-all-objdump.sh
Executable file
@ -0,0 +1,66 @@
#!/bin/bash
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"

# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
    set -a
    source "${EVAL_ROOT}/config.env"
    set +a
fi

BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"

OBJDUMP_BIN="${OBJDUMP:-objdump}"
NUM_JOBS="${JOBS:-}"

if ! command -v "${OBJDUMP_BIN}" >/dev/null 2>&1; then
    echo "error: objdump binary '${OBJDUMP_BIN}' not found" >&2
    exit 1
fi

if [[ -z "${NUM_JOBS}" ]]; then
    if command -v nproc >/dev/null 2>&1; then
        NUM_JOBS="$(nproc)"
    elif [[ "$(uname)" == "Darwin" ]]; then
        NUM_JOBS="$(sysctl -n hw.ncpu)"
    else
        NUM_JOBS=4
    fi
fi

if ! [[ "${NUM_JOBS}" =~ ^[0-9]+$ ]] || (( NUM_JOBS <= 0 )); then
    echo "error: invalid JOBS value '${NUM_JOBS}'" >&2
    exit 1
fi

readarray -t BINARIES < <(
    find "${BENCH_REPO_ROOT}" -mindepth 2 -maxdepth 2 -type f \
        \( -iname '*.o0' -o -iname '*.o1' -o -iname '*.o2' -o -iname '*.o3' \) \
        ! -path "${BENCH_REPO_ROOT}/scripts/*" \
        ! -path "${BENCH_REPO_ROOT}/target/*" \
        ! -path "${BENCH_REPO_ROOT}/common/*" \
        ! -path "${BENCH_REPO_ROOT}/.git/*" \
        | sort
)

if [[ ${#BINARIES[@]} -eq 0 ]]; then
    echo "error: no O0/O1/O2/O3 binaries found under ${BENCH_REPO_ROOT}" >&2
    exit 1
fi

export OBJDUMP_BIN BENCH_REPO_ROOT

printf '%s\0' "${BINARIES[@]}" | xargs -0 -n1 -P "${NUM_JOBS}" bash -c '
    binary_path="$1"
    bench_repo_root="${BENCH_REPO_ROOT}"
    output_path="${binary_path}.s"
    rel_in="${binary_path#"${bench_repo_root}/"}"
    rel_out="${output_path#"${bench_repo_root}/"}"
    echo "==> Disassembling ${rel_in} -> ${rel_out}"
    "${OBJDUMP_BIN}" -d "${binary_path}" > "${output_path}"
' _

echo "Assembly listings written alongside each binary (extension .s)."
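The script chooses its worker count in a fixed order: an explicit `JOBS` value, else `nproc` (or `sysctl -n hw.ncpu` on macOS), else 4. It then rejects anything that is not a positive integer. Restated as a small pure function, the fallback order is easy to test. This is an illustrative Python sketch, not part of the repository. `resolve_jobs` is a hypothetical name, and it returns `None` where the shell script prints an error and exits with status 1.

```python
import os
import re
from typing import Mapping, Optional


def resolve_jobs(env: Mapping[str, str], detected_cpus: Optional[int]) -> Optional[int]:
    """Worker count chosen like disasm-all-objdump.sh; None where the script exits 1."""
    raw = env.get("JOBS", "")
    if not raw:
        # nproc / sysctl hw.ncpu in the script; a fixed 4 when no count is available.
        raw = str(detected_cpus) if detected_cpus else "4"
    # Same acceptance rule as [[ $N =~ ^[0-9]+$ ]] combined with (( N > 0 )).
    if not re.fullmatch(r"[0-9]+", raw) or int(raw) <= 0:
        return None
    return int(raw)


if __name__ == "__main__":
    print(resolve_jobs(os.environ, os.cpu_count()))
```

An empty `JOBS` is treated the same as an unset one, matching `${JOBS:-}` in the script.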
62
sk2decompile/evaluation/bringupbench/scripts/dump_pseudo.py
Normal file
@ -0,0 +1,62 @@
"""
Headless IDA/Hex-Rays helper to dump pseudocode for every discovered function.
Usage (from shell):
    idat -A -S"scripts/dump_pseudo.py /path/to/output" /path/to/binary
"""

from __future__ import annotations

import os
import sys

import ida_auto
import ida_funcs
import ida_hexrays
import ida_pro
import idautils
import idc


def _get_output_path() -> str:
    # IDA populates idc.ARGV with the script path at index 0 and the
    # user-provided arguments afterwards.
    if len(idc.ARGV) < 2:
        raise RuntimeError("output path argument missing")
    return os.path.abspath(idc.ARGV[1])


def main() -> None:
    try:
        output_path = _get_output_path()
    except Exception as exc:  # pragma: no cover - defensive
        print(f"[dump_pseudo] {exc}", file=sys.stderr)
        ida_pro.qexit(1)
        return

    ida_auto.auto_wait()

    if not ida_hexrays.init_hexrays_plugin():
        print("[dump_pseudo] Hex-Rays decompiler is unavailable", file=sys.stderr)
        ida_pro.qexit(1)
        return

    os.makedirs(os.path.dirname(output_path), exist_ok=True)

    with open(output_path, "w", encoding="utf-8") as handle:
        for ea in idautils.Functions():
            name = ida_funcs.get_func_name(ea)
            handle.write(f"/* {name} @ 0x{ea:x} */\n")
            try:
                cfunc = ida_hexrays.decompile(ea)
            except ida_hexrays.DecompilationFailure as exc:
                handle.write(f"// decompilation failed: {exc}\n\n")
                continue

            handle.write(str(cfunc))
            handle.write("\n\n")

    ida_pro.qexit(0)


if __name__ == "__main__":
    main()
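`eval_infer_out.py`, listed next, splices each model-generated function into its benchmark source by exact text match (`replace_function_body`). It later reports compilable and executable rates over all cases (`compute_summary`). The sketch below restates both ideas in a few lines. `splice_function` and `headline_rates` are hypothetical names, and the blank-reference guard is an addition of this sketch rather than something the script does.

```python
from typing import Dict, List, Tuple


def splice_function(source: str, reference: str, inferred: str) -> Tuple[str, bool]:
    """Replace the first exact occurrence of `reference` with `inferred` (CRLF-insensitive)."""
    src = source.replace("\r\n", "\n")
    ref = reference.replace("\r\n", "\n")
    new = inferred.replace("\r\n", "\n").rstrip() + "\n"
    if not ref.strip():
        # Guard added in this sketch: a blank reference would otherwise match trivially.
        return source, False
    # Progressively looser forms of the reference, in the same order as the script.
    for snippet in (ref, ref.rstrip() + "\n", ref.strip()):
        idx = src.find(snippet)
        if idx != -1:
            return src[:idx] + new + src[idx + len(snippet):], True
    return source, False


def headline_rates(statuses: List[Tuple[str, str]]) -> Dict[str, float]:
    """(build_status, test_status) pairs -> compilable/executable rates over all cases."""
    total = len(statuses)

    def frac(count: int) -> float:
        return round(count / total, 4) if total else 0.0

    return {
        "compilable_rate": frac(sum(b == "succeeded" for b, _ in statuses)),
        "executable_rate": frac(sum(t == "succeeded" for _, t in statuses)),
    }


if __name__ == "__main__":
    src = "int a(void) {\n  return 1;\n}\n"
    print(splice_function(src, src, "int a(void) { return 42; }"))
```

Both rates use the total case count as the denominator. A case whose function could not be located in the source therefore counts against compilability rather than being dropped from the statistics.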
682
sk2decompile/evaluation/bringupbench/scripts/eval_infer_out.py
Normal file
@ -0,0 +1,682 @@
#!/usr/bin/env python3
"""
Evaluate infer-out-model2 functions by patching benchmark sources inside an
isolated workspace, rebuilding, executing, and collecting structured logs for
every case listed in a JSONL file.
"""
from __future__ import annotations

import argparse
import json
import os
import re
import shutil
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import asdict, dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple


def _load_config_env() -> dict:
    """Load config.env from the eval project root."""
    eval_root = Path(__file__).resolve().parents[1]
    config_path = eval_root / "config.env"
    config = {}
    if config_path.exists():
        for line in config_path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if "=" in line:
                key, _, value = line.partition("=")
                config[key.strip()] = value.strip()
    return config


def _get_bench_root(cli_value: str | None = None) -> Path:
    """Resolve the benchmark repo root from CLI arg, env var, or config.env."""
    if cli_value:
        return Path(cli_value).resolve()
    env_val = os.environ.get("BENCH_REPO_ROOT")
    if env_val:
        return Path(env_val).resolve()
    config = _load_config_env()
    if "BENCH_REPO_ROOT" in config:
        return Path(config["BENCH_REPO_ROOT"]).resolve()
    sys.exit("error: BENCH_REPO_ROOT not set. Use --bench-root, set the env var, or configure config.env")


@dataclass
class CaseResult:
    """Container for the outcome of processing a single case."""

    case_id: str
    source_path: str
    benchmark_dir: str
    output_dir: str
    workspace_dir: str = ""
    artifact_dir: str = ""
    replacement_applied: bool = False
    build_status: str = "skipped"  # succeeded | failed | skipped
    test_status: str = "skipped"
    notes: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    log_files: Dict[str, str] = field(default_factory=dict)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Replace functions with infer-out-model2 bodies, build, "
        "execute, and record results without modifying the original benchmarks."
    )
    parser.add_argument(
        "jsonl",
        help="Path to the merged.*.jsonl file containing cases to evaluate.",
    )
    parser.add_argument(
        "--bench-root",
        default=None,
        help="Path to the Bringup-Bench repository root (default: from config.env).",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        help="Optional limit on the number of cases to process.",
    )
    parser.add_argument(
        "--target",
        default="host",
        help="Benchmark build target passed as TARGET=<target> (default: host).",
    )
    parser.add_argument(
        "--report-dir",
        default="reports/infer_out_eval",
        help="Directory (relative to eval root) where aggregated reports are written.",
    )
    parser.add_argument(
        "--workspace-root",
        default="reports/infer_out_eval/workspaces",
        help="Directory (relative to eval root) to host temporary build workspaces.",
    )
    parser.add_argument(
        "--skip-clean",
        action="store_true",
        help="Skip running 'make clean' inside the workspace (useful when iterating).",
    )
    parser.add_argument(
        "--keep-workspaces",
        action="store_true",
        help="Keep temporary workspaces after each case finishes (default removes them).",
    )
    parser.add_argument(
        "--command-timeout",
        type=int,
        default=20,
        help="Timeout (in seconds) for each make invocation; 0 disables the timeout.",
    )
    parser.add_argument(
        "--jobs",
        type=int,
        default=96,
        help="Number of cases to process in parallel (default: 96).",
    )
    return parser.parse_args()


def canonicalize(text: str) -> str:
    """Normalize newlines for reliable substring matching."""
    return text.replace("\r\n", "\n")


def replace_function_body(
    full_source: str, reference_function: str, inferred_function: str
) -> Tuple[str, bool]:
    """
    Replace the exact reference_function text with inferred_function.

    Returns the updated source and a boolean indicating if replacement happened.
    """
    source_norm = canonicalize(full_source)
    reference_norm = canonicalize(reference_function)
    inferred_norm = canonicalize(inferred_function).rstrip() + "\n"

    candidates = (
        reference_norm,
        reference_norm.rstrip() + "\n",
        reference_norm.strip(),
    )

    for snippet in candidates:
        start_idx = source_norm.find(snippet)
        if start_idx == -1:
            continue
        end_idx = start_idx + len(snippet)
        updated = source_norm[:start_idx] + inferred_norm + source_norm[end_idx:]
        return updated, True
    return full_source, False


def compose_case_id(case: Dict) -> str:
    """Build a stable identifier for a case."""
    return (
        f"{case['source']['path']}::{case['source']['function_name']}"
        f"@{case['pseudo']['address']}"
    )


def ensure_case_output_dir(
    output_root: Path, pseudo_path_str: str, pseudo_address: str, result: CaseResult
) -> Path:
    """Create the per-case output directory, handling file path collisions."""
    pseudo_rel = Path(pseudo_path_str)
    base_dir = output_root / pseudo_rel

    if base_dir.exists() and base_dir.is_file():
        fallback = base_dir.parent / f"{base_dir.name}.infer_eval"
        fallback.mkdir(parents=True, exist_ok=True)
        result.notes.append(
            f"pseudo.path '{pseudo_path_str}' is a file; using '{fallback.relative_to(output_root)}' for logs."
        )
        base_dir = fallback
    else:
        base_dir.mkdir(parents=True, exist_ok=True)

    case_dir = base_dir / pseudo_address
    if case_dir.exists():
        shutil.rmtree(case_dir)
    case_dir.mkdir(parents=True, exist_ok=True)
    return case_dir


def run_command(
    command: List[str],
    cwd: Path,
    log_handle,
    step_name: str,
    timeout: Optional[int],
) -> Optional[int]:
    """Run a command, capture stdout/stderr, and write everything to log_handle."""
    log_handle.write(f"\n[{step_name}] $ {' '.join(command)}\n")
    log_handle.flush()
    try:
        completed = subprocess.run(
            command,
            cwd=str(cwd),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            encoding="utf-8",
            errors="replace",
            timeout=timeout if timeout and timeout > 0 else None,
        )
        log_handle.write(completed.stdout)
        log_handle.write(f"[{step_name}] exit code: {completed.returncode}\n")
        log_handle.flush()
        return completed.returncode
    except subprocess.TimeoutExpired as exc:
        output = exc.output or exc.stdout
        if output:
            if isinstance(output, bytes):
                log_handle.write(output.decode("utf-8", "replace"))
            else:
                log_handle.write(output)
        log_handle.write(
            f"[{step_name}] timed out after {timeout} seconds; terminating process.\n"
        )
        log_handle.flush()
        return None


def write_case_artifacts(
    case_dir: Path,
    case: Dict,
    modified_source: str,
    original_source: str,
) -> None:
    """Persist reusable artifacts for a case."""
    (case_dir / "case.json").write_text(json.dumps(case, indent=2), encoding="utf-8")
    (case_dir / "modified_source.c").write_text(modified_source, encoding="utf-8")
    (case_dir / "original_source.c").write_text(original_source, encoding="utf-8")
    (case_dir / "original_function.c").write_text(
        canonicalize(case["source"]["content"]), encoding="utf-8"
    )
    (case_dir / "infer_function.c").write_text(
        canonicalize(case["pseudo"]["content-fix"]), encoding="utf-8"
    )


def sanitize_case_id(case_id: str) -> str:
    """Generate filesystem-safe case identifier."""
    sanitized = re.sub(r"[^A-Za-z0-9._-]+", "_", case_id)
    return sanitized.strip("_") or "case"


def copy_ignore_eval_dirs(_src: str, names: List[str]) -> List[str]:
    """Ignore helper to skip evaluation artifacts when copying benchmark dirs."""
    ignored: List[str] = []
    for name in names:
        if name.endswith(".infer_eval"):
            ignored.append(name)
    return ignored


def prepare_workspace(
    repo_root: Path,
    benchmark_dir: Path,
    workspace_root: Path,
    case_id: str,
) -> Tuple[Path, Path]:
    """Clone the necessary subset of the repo into a temporary workspace."""
    workspace_case_root = workspace_root / sanitize_case_id(case_id)
    if workspace_case_root.exists():
        shutil.rmtree(workspace_case_root)
    workspace_repo_root = workspace_case_root / "repo"
    workspace_repo_root.mkdir(parents=True, exist_ok=True)

    shutil.copy2(repo_root / "Makefile", workspace_repo_root / "Makefile")
    shutil.copytree(repo_root / "common", workspace_repo_root / "common", dirs_exist_ok=True)
    shutil.copytree(repo_root / "target", workspace_repo_root / "target", dirs_exist_ok=True)
    shutil.copytree(
        benchmark_dir,
        workspace_repo_root / benchmark_dir.name,
        dirs_exist_ok=True,
        ignore=copy_ignore_eval_dirs,
    )
    return workspace_case_root, workspace_repo_root


def relative_to_repo(path: Path, repo_root: Path) -> str:
    """Return a path relative to repo_root when possible."""
    try:
        return str(path.relative_to(repo_root))
    except ValueError:
        return str(path)


def init_case_result(case: Dict, repo_root: Path) -> CaseResult:
    """Create a CaseResult with basic metadata for the given case."""
    source_rel = Path(case["source"]["path"])
    benchmark_dir_path = (repo_root / source_rel).parent
    try:
        benchmark_rel = str(benchmark_dir_path.relative_to(repo_root))
    except ValueError:
        benchmark_rel = str(benchmark_dir_path)
    return CaseResult(
        case_id=compose_case_id(case),
        source_path=str(source_rel),
        benchmark_dir=benchmark_rel,
        output_dir="",
    )


def snapshot_artifacts(
    case_dir: Path,
    workspace_benchmark_dir: Path,
    eval_root: Path,
    result: CaseResult,
) -> None:
    """Copy the workspace benchmark directory into the case directory."""
    artifacts_dir = case_dir / "artifacts"
    if artifacts_dir.exists():
        shutil.rmtree(artifacts_dir)
    try:
        shutil.copytree(workspace_benchmark_dir, artifacts_dir)
        result.artifact_dir = relative_to_repo(artifacts_dir, eval_root)
    except Exception as exc:  # pragma: no cover - defensive
        result.notes.append(f"Failed to copy artifacts: {exc}")


def process_case(
    case: Dict,
    args: argparse.Namespace,
    repo_root: Path,
    eval_root: Path,
) -> CaseResult:
    """Process a single JSONL entry."""
    case_id = compose_case_id(case)
    source_rel = Path(case["source"]["path"])
    source_path = repo_root / source_rel
    benchmark_dir = source_path.parent

    result = init_case_result(case, repo_root)

    if not source_path.exists():
        result.errors.append(f"Source file '{source_rel}' does not exist.")
        return result

    try:
        case_dir = ensure_case_output_dir(
            eval_root, case["pseudo"]["path"], case["pseudo"]["address"], result
        )
    except Exception as exc:  # pragma: no cover - defensive
        result.errors.append(f"Failed to prepare case directory: {exc}")
        return result

    result.output_dir = str(case_dir.relative_to(eval_root))

    full_source_text = source_path.read_text(encoding="utf-8")
    updated_source, replaced = replace_function_body(
        full_source_text,
        case["source"]["content"],
        case["pseudo"]["content-fix"],
    )

    if not replaced:
        result.errors.append(
            "Could not locate the original function snippet in source file."
        )
        return result

    result.replacement_applied = True
    write_case_artifacts(case_dir, case, updated_source, full_source_text)

    workspace_root = Path(args.workspace_root)
    if not workspace_root.is_absolute():
        workspace_root = eval_root / workspace_root
    workspace_root.mkdir(parents=True, exist_ok=True)

    workspace_case_root: Optional[Path] = None
    try:
        workspace_case_root, workspace_repo_root = prepare_workspace(
            repo_root, benchmark_dir, workspace_root, case_id
        )
        workspace_benchmark_dir = workspace_repo_root / benchmark_dir.name
        artifacts_captured = False

        def capture_artifacts() -> None:
            nonlocal artifacts_captured
            if artifacts_captured:
                return
            snapshot_artifacts(case_dir, workspace_benchmark_dir, eval_root, result)
            artifacts_captured = True

        workspace_source_path = workspace_repo_root / source_rel
        workspace_source_path.write_text(updated_source, encoding="utf-8")

        result.workspace_dir = relative_to_repo(workspace_case_root, eval_root)

        log_path = case_dir / "case.log"
        with log_path.open("w", encoding="utf-8") as log_handle:
            log_handle.write(f"Case: {case_id}\n")
            log_handle.write(f"Workspace: {workspace_case_root}\n")
            log_handle.write(f"Benchmark copy: {workspace_benchmark_dir}\n")
            log_handle.write(f"Target: {args.target}\n")
            log_handle.flush()

            if not args.skip_clean:
                clean_rc = run_command(
                    ["make", f"TARGET={args.target}", "clean"],
                    workspace_benchmark_dir,
                    log_handle,
                    "clean",
                    args.command_timeout,
                )
                if clean_rc is None:
                    # Count a clean timeout as a build failure, like a non-zero clean exit.
                    result.build_status = "failed"
                    result.errors.append(
                        f"'make clean' timed out after {args.command_timeout} seconds."
                    )
                    capture_artifacts()
                    result.log_files["case"] = relative_to_repo(log_path, eval_root)
                    return result
                if clean_rc != 0:
                    result.build_status = "failed"
                    result.errors.append("make clean failed.")
                    capture_artifacts()
                    result.log_files["case"] = relative_to_repo(log_path, eval_root)
                    return result
            else:
                log_handle.write("Skipping 'make clean' per --skip-clean flag.\n")

            build_rc = run_command(
                ["make", f"TARGET={args.target}", "build"],
                workspace_benchmark_dir,
                log_handle,
                "build",
                args.command_timeout,
            )

            result.log_files["case"] = relative_to_repo(log_path, eval_root)
            if build_rc is None:
                result.build_status = "failed"
                result.errors.append(
                    f"'make build' timed out after {args.command_timeout} seconds."
                )
                capture_artifacts()
                log_handle.write("Skipping test because build timed out.\n")
                return result
            if build_rc == 0:
                result.build_status = "succeeded"
            else:
                result.build_status = "failed"
                result.errors.append("make build failed.")
                log_handle.write("Skipping test because build failed.\n")
                capture_artifacts()
                return result

            test_rc = run_command(
                ["make", f"TARGET={args.target}", "test"],
                workspace_benchmark_dir,
                log_handle,
                "test",
                args.command_timeout,
            )

            if test_rc is None:
                result.test_status = "failed"
                result.errors.append(
                    f"'make test' timed out after {args.command_timeout} seconds."
                )
            elif test_rc == 0:
                result.test_status = "succeeded"
            else:
                result.test_status = "failed"
                result.errors.append("make test failed.")

            capture_artifacts()

    finally:
        if (
            workspace_case_root
            and workspace_case_root.exists()
            and not args.keep_workspaces
        ):
            shutil.rmtree(workspace_case_root, ignore_errors=True)

    return result


def collect_cases(jsonl_path: Path, limit: Optional[int]) -> Iterable[Dict]:
    """Yield cases from jsonl file respecting the optional limit."""
    processed = 0
    with jsonl_path.open("r", encoding="utf-8") as handle:
        for line in handle:
            # Check before yielding so that --limit 0 yields no cases.
            if limit is not None and processed >= limit:
                break
            stripped = line.strip()
            if not stripped:
                continue
            yield json.loads(stripped)
            processed += 1


def compute_summary(results: List[CaseResult]) -> Dict:
    """Aggregate statistics over all case results."""
    total = len(results)
    replacements = sum(1 for r in results if r.replacement_applied)
    build_success = sum(1 for r in results if r.build_status == "succeeded")
    test_success = sum(1 for r in results if r.test_status == "succeeded")

    def frac(passed: int, denom: int) -> float:
        return round(passed / denom, 4) if denom else 0.0

    per_benchmark: Dict[str, Dict[str, float]] = {}
    for r in results:
        stats = per_benchmark.setdefault(
            r.benchmark_dir,
            {
                "cases": 0,
                "replacements": 0,
                "build_success": 0,
                "test_success": 0,
            },
        )
        stats["cases"] += 1
        if r.replacement_applied:
            stats["replacements"] += 1
        if r.build_status == "succeeded":
            stats["build_success"] += 1
        if r.test_status == "succeeded":
            stats["test_success"] += 1

    for stats in per_benchmark.values():
        stats["replacement_rate"] = frac(stats["replacements"], stats["cases"])
        stats["build_rate"] = frac(stats["build_success"], stats["cases"])
        stats["test_rate"] = frac(stats["test_success"], stats["cases"])

    summary = {
        "total_cases": total,
        "replacement_success_count": replacements,
        "replacement_success_rate": frac(replacements, total),
        "compilable_count": build_success,
        "compilable_rate": frac(build_success, total),
        "executable_count": test_success,
        "executable_rate": frac(test_success, total),
        "compilation_failures": [
            r.case_id for r in results if r.build_status == "failed"
        ],
        "execution_failures": [
            r.case_id
            for r in results
            if r.build_status == "succeeded" and r.test_status == "failed"
        ],
        "cases": [asdict(r) for r in results],
        "by_benchmark": per_benchmark,
    }
    return summary


def write_summary(
    eval_root: Path,
    args: argparse.Namespace,
    jsonl_path: Path,
    summary: Dict,
) -> Tuple[Path, Path]:
    """Write JSON and Markdown summary reports."""
    report_root = eval_root / args.report_dir
    report_root.mkdir(parents=True, exist_ok=True)

    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = f"{jsonl_path.stem}-{args.target}"
    json_report = report_root / f"{base_name}-{timestamp}.json"
    markdown_report = report_root / f"{base_name}-{timestamp}.md"

    json_report.write_text(json.dumps(summary, indent=2), encoding="utf-8")

    benchmark_lines = [
        "| Benchmark | Cases | Replacement% | Build% | Exec% |",
        "| --- | --- | --- | --- | --- |",
    ]
    for bench, stats in sorted(summary["by_benchmark"].items()):
        benchmark_lines.append(
            f"| {bench} | {stats['cases']} | "
            f"{stats['replacement_rate']*100:.2f}% | "
            f"{stats['build_rate']*100:.2f}% | "
            f"{stats['test_rate']*100:.2f}% |"
        )
    if len(benchmark_lines) == 2:
        benchmark_lines.append("| (none) | 0 | 0.00% | 0.00% | 0.00% |")

    compilation_items = summary["compilation_failures"] or ["None"]
    execution_items = summary["execution_failures"] or ["None"]

    relative_jsonl = relative_to_repo(jsonl_path, eval_root)

    lines = [
        f"# Infer-Out Model 2 Evaluation ({base_name})",
        "",
        f"- Timestamp: {timestamp}",
        f"- Source JSONL: {relative_jsonl}",
        f"- Target: {args.target}",
        f"- Total cases: {summary['total_cases']}",
        f"- Replacement success: {summary['replacement_success_count']} "
        f"({summary['replacement_success_rate']*100:.2f}%)",
        f"- Compilable: {summary['compilable_count']} "
        f"({summary['compilable_rate']*100:.2f}%)",
        f"- Executable: {summary['executable_count']} "
        f"({summary['executable_rate']*100:.2f}%)",
        "",
        "## Benchmark Breakdown",
        *benchmark_lines,
        "",
        "## Compilation Failures",
    ]
    lines.extend(f"- {cid}" for cid in compilation_items)
    lines.append("")
    lines.append("## Execution Failures")
    lines.extend(f"- {cid}" for cid in execution_items)

    markdown_report.write_text("\n".join(lines), encoding="utf-8")
    return json_report, markdown_report


||||
def main() -> int:
    args = parse_args()
    eval_root = Path(__file__).resolve().parents[1]
    repo_root = _get_bench_root(args.bench_root)
    jsonl_path = Path(args.jsonl)
    if not jsonl_path.is_absolute():
        jsonl_path = eval_root / jsonl_path

    if not jsonl_path.exists():
        print(f"JSONL file '{jsonl_path}' not found.", file=sys.stderr)
        return 1

    cases = list(collect_cases(jsonl_path, args.limit))
    if not cases:
        print("No cases to process.")
        return 0

    results: List[Optional[CaseResult]] = [None] * len(cases)

    def record_result(idx: int, case_result: CaseResult) -> None:
        results[idx] = case_result
        status = (
            f"build={case_result.build_status}, test={case_result.test_status}"
            if case_result.replacement_applied
            else "replacement_failed"
        )
        print(f"[{idx + 1}] {case_result.case_id}: {status}")

    if args.jobs <= 1:
        for idx, case in enumerate(cases):
            case_result = process_case(case, args, repo_root, eval_root)
            record_result(idx, case_result)
    else:
        with ThreadPoolExecutor(max_workers=args.jobs) as executor:
            future_to_idx = {
                executor.submit(process_case, case, args, repo_root, eval_root): idx
                for idx, case in enumerate(cases)
            }
            for future in as_completed(future_to_idx):
                idx = future_to_idx[future]
                try:
                    case_result = future.result()
                except Exception as exc:  # pragma: no cover - defensive
                    case_result = init_case_result(cases[idx], repo_root)
                    case_result.errors.append(f"Unhandled exception: {exc}")
                record_result(idx, case_result)

    final_results = [res for res in results if res is not None]

    summary = compute_summary(final_results)
    json_report, markdown_report = write_summary(eval_root, args, jsonl_path, summary)
    print(f"Wrote summary reports:\n - {json_report}\n - {markdown_report}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
180
sk2decompile/verl/SK2DECOMPILE/README.md
Normal file
@ -0,0 +1,180 @@
# SK²Decompile — Reinforcement Learning with VERL

This directory contains the RL (Reinforcement Learning) training pipeline for SK²Decompile, built on top of the [VERL](https://github.com/volcengine/verl) framework (Sheng et al., 2024).

For the full methodology and experimental details, please refer to our paper:

> **SK²Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin**
> [[arXiv:2509.22114]](https://arxiv.org/abs/2509.22114)

## Overview

After supervised fine-tuning (SFT), SK²Decompile applies reinforcement learning to further align each phase's model with its task-specific objective. We adopt the **GRPO** (Group Relative Policy Optimization) algorithm (DeepSeek-AI et al., 2025) to train both models, each with its own reward signal:

- **Structure Recovery** (Skeleton): The reward is based on compiler feedback. A positive reward is granted only if the generated IR compiles successfully, and an additional component reflects how accurately the placeholders are recovered (Equation 3 in the paper).
- **Identifier Naming** (Skin): The reward is the cosine similarity between the embeddings of the generated code and the reference source code. This encourages semantically aligned identifier predictions rather than exact lexical matches (Equation 4 in the paper).

The reward functions and training scripts provided here are **reference implementations** for reproducing the RL training pipeline. For the precise reward formulations and design rationale, please refer to Section 3.5 of the paper.

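For readers new to GRPO: each prompt is sampled several times (`rollout.n` in the Configurations table), and each response's reward is normalized against the other responses in its group, so no learned value function is needed. The sketch below shows this outcome-reward normalization in plain Python. The function name `group_relative_advantages` and the `eps` value are our own choices and not VERL's API. VERL's implementation may differ in details, for example by using the sample rather than the population standard deviation.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one GRPO group (all responses to one prompt).

    Each advantage is (r - mean) / (std + eps). If every response gets the same
    reward, all advantages are 0 and the group gives no policy-gradient signal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one Structure Recovery prompt: two fail to compile (0.0),
# two compile with different placeholder recovery (1.0 + Jaccard).
print(group_relative_advantages([0.0, 0.0, 1.5, 2.0]))
print(group_relative_advantages([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
```

Responses that compile and recover more placeholders receive positive advantages. Failing responses are pushed down relative to their own group, not relative to a global baseline.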
## Directory Structure

```
SK2DECOMPILE/
├── README.md                               # This file
├── data/
│   └── sk2decompile-rl-examples.jsonl      # Example RL training data
├── reward_functions/                       # Reference reward implementations
│   ├── __init__.py
│   ├── exe_type.py                         # Example: compilability + placeholder Jaccard
│   ├── sim_exe.py                          # Example: compilability + word-level similarity
│   ├── embedding_gte.py                    # Example: embedding-based identifier similarity (GTE)
│   └── embedding_qwen3.py                  # Example: embedding-based identifier similarity (Qwen3)
└── scripts/
    ├── run_struct_rl.sh                    # Reference script: Structure Recovery RL
    └── run_ident_rl.sh                     # Reference script: Identifier Naming RL
```

## Reward Formulations (from the Paper)

### Structure Recovery Reward (Eq. 3)

The Structure Recovery reward consists of two components:

1. **Compilability**: The generated IR is compiled together with the ground-truth header. A reward of 1.0 is granted only upon successful compilation (the headers are generated with [Psyche-C](https://github.com/ltcmelo/psychec.git)).
2. **Placeholder Recovery**: The Jaccard similarity between the generated placeholder set (I_gen) and the ground-truth set (I_IR).

```
r_placeholder = |I_gen ∩ I_IR| / |I_gen ∪ I_IR|

r_structure = { 0.0,                  if IR cannot be compiled
              { 1.0 + r_placeholder,  if IR can be compiled
```

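As a worked example of Eq. 3, the sketch below computes `r_structure` from a compile flag and the two placeholder sets. It extracts placeholders with the same regular expressions as `reward_functions/exe_type.py` (`func*`, `type*`, `var*`, `field*`). The function names here are illustrative. Compilation is passed in as a boolean instead of invoking gcc, so the snippet runs anywhere.

```python
import re
from typing import Set

# Same placeholder patterns as reward_functions/exe_type.py
PLACEHOLDER_PATTERNS = [r"\bfunc\w*\b", r"\btype\w*\b", r"\bvar\w*\b", r"\bfield\w*\b"]


def extract_placeholders(code: str) -> Set[str]:
    """Collect placeholder identifiers such as func1, var2, type3, field4."""
    found: Set[str] = set()
    for pattern in PLACEHOLDER_PATTERNS:
        found.update(re.findall(pattern, code))
    return found


def structure_reward(compiles: bool, generated_ir: str, reference_ir: str) -> float:
    """Eq. 3: 0 if the IR does not compile, else 1 + Jaccard(I_gen, I_IR)."""
    if not compiles:
        return 0.0
    gen, ref = extract_placeholders(generated_ir), extract_placeholders(reference_ir)
    union = gen | ref
    r_placeholder = len(gen & ref) / len(union) if union else 0.0
    return 1.0 + r_placeholder


generated = "int func1(int var1) { int var2 = var1 + 1; return var2; }"
reference = "int func1(int var1) { int var3 = var1 + 1; return var3; }"
print(structure_reward(True, generated, reference))   # 1 + 2/4 = 1.5
print(structure_reward(False, generated, reference))  # 0.0
```

Because compilability contributes a fixed 1.0, any compiling output outranks every non-compiling one. Placeholder overlap only orders outputs within the compiling set.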
### Identifier Naming Reward (Eq. 4)

The Identifier Naming reward measures the semantic similarity between the generated code and the reference source code as the cosine similarity of their embeddings:

```
r_identifier = cos(e_gen, e_src) = (e_gen · e_src) / (||e_gen|| · ||e_src||)
```

where `e_gen` and `e_src` are the embeddings of the generated and reference code, respectively. In our experiments, we use Qwen3-Embedding-0.6B (Zhang et al., 2025) as the embedding model.

> **Note**: The reward functions in `reward_functions/` are reference implementations that demonstrate the reward design. The embedding-based variants embed a summary of the identifiers extracted with tree-sitter rather than the raw code, and return the squared cosine similarity. Please refer to Section 3.5 of the paper for the complete formulation and design rationale.

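The sketch below evaluates Eq. 4 on toy vectors. It also applies the post-processing used by the reference `embedding_*.py` functions, which square the cosine similarity. The three-dimensional vectors are invented for illustration; in training, the embeddings come from the embedding server. Clamping negative similarities to 0 before squaring is an assumption of this sketch. It prevents a dissimilar pair from earning a positive reward once the score is squared.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. 4: (a . b) / (||a|| * ||b||); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identifier_reward(e_gen: Sequence[float], e_src: Sequence[float]) -> float:
    """Squared cosine similarity, as in the reference implementations.

    Negative similarities are clamped to 0 first (an assumption of this sketch).
    """
    sim = max(0.0, cosine_similarity(e_gen, e_src))
    return sim * sim


e_src = [1.0, 0.0, 0.0]
print(identifier_reward([1.0, 0.0, 0.0], e_src))   # same direction: 1.0
print(identifier_reward([1.0, 1.0, 0.0], e_src))   # cos = 1/sqrt(2), squared: about 0.5
print(identifier_reward([-1.0, 0.0, 0.0], e_src))  # opposite direction: 0.0
```

Squaring keeps high similarities near their original value but shrinks mediocre ones (0.7 becomes 0.49). This widens the reward gap between good and merely plausible names.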
## Reproduction Guide

### Step 1: Install VERL

Our RL training is based on **VERL v0.4.1** ([HybridFlow](https://github.com/volcengine/verl), Sheng et al., 2024). We recommend using the same version for reproducibility.

```bash
git clone https://github.com/volcengine/verl.git
cd verl
git checkout v0.4.1  # or the commit closest to v0.4.1
pip install -e .
```

### Step 2: Integrate Reward Functions

Copy the reward functions into VERL's reward module and register them in the routing dispatcher:

```bash
# Copy reward functions
cp reward_functions/exe_type.py <VERL_DIR>/verl/utils/reward_score/sk2d_exe_type.py
cp reward_functions/sim_exe.py <VERL_DIR>/verl/utils/reward_score/sk2d_sim_exe.py
cp reward_functions/embedding_gte.py <VERL_DIR>/verl/utils/reward_score/sk2d_embedding_gte.py
cp reward_functions/embedding_qwen3.py <VERL_DIR>/verl/utils/reward_score/sk2d_embedding_qwen3.py
```

Then add routing branches to the `default_compute_score()` function in `<VERL_DIR>/verl/utils/reward_score/__init__.py`:

```python
# Add these branches to the existing if/elif chain on data_source.

# Structure Recovery reward (example)
elif data_source == "sk2decompile_structure":
    from . import sk2d_exe_type
    res = sk2d_exe_type.compute_score(solution_str, ground_truth, extra_info)

# Identifier Naming reward (example)
elif data_source == "sk2decompile_identifier":
    from . import sk2d_embedding_qwen3
    res = sk2d_embedding_qwen3.compute_score(solution_str, ground_truth, extra_info)
```

The `data_source` field in your training Parquet files determines which reward function is dispatched for each sample.

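To make the routing concrete, here is a self-contained toy version of the dispatch. A table maps each `data_source` string to a scoring function with the `compute_score(solution_str, ground_truth, extra_info)` signature used by the reference reward functions. Both scorers are hypothetical stand-ins, not the real rewards. The dictionary is only an illustration; inside VERL, the dispatch is the `elif` chain in `default_compute_score()`.

```python
from typing import Callable, Dict, Optional

ScoreFn = Callable[[str, str, Optional[dict]], float]


def fake_structure_score(solution_str: str, ground_truth: str,
                         extra_info: Optional[dict] = None) -> float:
    # Hypothetical stand-in for sk2d_exe_type.compute_score
    return 1.0 if solution_str.strip() else 0.0


def fake_identifier_score(solution_str: str, ground_truth: str,
                          extra_info: Optional[dict] = None) -> float:
    # Hypothetical stand-in for sk2d_embedding_qwen3.compute_score
    return 1.0 if solution_str == ground_truth else 0.0


REWARD_ROUTES: Dict[str, ScoreFn] = {
    "sk2decompile_structure": fake_structure_score,
    "sk2decompile_identifier": fake_identifier_score,
}


def route_score(data_source: str, solution_str: str, ground_truth: str,
                extra_info: Optional[dict] = None) -> float:
    """Pick the reward function by data_source, mirroring the elif branches."""
    try:
        score_fn = REWARD_ROUTES[data_source]
    except KeyError:
        raise NotImplementedError(f"No reward function registered for {data_source!r}") from None
    return score_fn(solution_str, ground_truth, extra_info)


print(route_score("sk2decompile_identifier", "int add(int a);", "int add(int a);"))  # 1.0
```

A typo in `data_source` therefore fails loudly instead of silently scoring with the wrong reward. This is why the key in the Parquet files must match the registered branch exactly.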
### Step 3: Prepare Training Data

Training data should be in Parquet format. Each row contains:

| Field | Description |
|-------|-------------|
| `prompt` | Chat-format messages, e.g., `[{"role": "user", "content": "<pseudocode>... What is the source code?"}]` |
| `data_source` | Reward function routing key (must match the branch registered in Step 2) |
| `reward_model.ground_truth` | Expected output (IR for Structure Recovery, source code for Identifier Naming) |
| `reward_model.style` | `"rule"` (rule-based reward) |
| `extra_info.header` | C header declarations for the compilability check (Structure Recovery only) |

See `data/sk2decompile-rl-examples.jsonl` for an example of the data format. Convert the JSONL file to Parquet before training.

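Here is a minimal sketch of building one training row with the fields listed above. It assumes the nested `reward_model` and `extra_info` fields are stored as dictionaries. The helper name `make_rl_row`, the `task` argument and the exact prompt wording are hypothetical; check `data/sk2decompile-rl-examples.jsonl` for the format actually used. A list of such rows can then be written to Parquet, for example with `pandas.DataFrame(rows).to_parquet(path)`, which requires `pyarrow` or `fastparquet`.

```python
import json
from typing import Any, Dict, Optional

# Routing keys registered in Step 2
ROUTES = {"structure": "sk2decompile_structure", "identifier": "sk2decompile_identifier"}


def make_rl_row(pseudocode: str, target: str, task: str,
                header: Optional[str] = None) -> Dict[str, Any]:
    """Build one RL row in the schema described in the table above.

    task: "structure" (target = normalized IR) or "identifier" (target = source code).
    """
    if task not in ROUTES:
        raise ValueError(f"unknown task {task!r}")
    return {
        "prompt": [{"role": "user",
                    "content": f"{pseudocode}\nWhat is the source code?"}],
        "data_source": ROUTES[task],
        "reward_model": {"style": "rule", "ground_truth": target},
        # The header is only read by the Structure Recovery reward. It is kept on
        # every row so the extra_info column has the same schema for all rows.
        "extra_info": {"header": header or ""},
    }


row = make_rl_row("undefined8 FUN_00101139(int param_1) { ... }",
                  "int func1(int var1) { return var1 + 1; }",
                  task="structure", header="int func1(int var1);")
print(json.dumps(row, indent=2))
```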
### Step 4: Launch Training

The reference training scripts are in `scripts/`. Edit the configuration variables at the top of each script before launching.

**Structure Recovery RL:**
```bash
# Edit scripts/run_struct_rl.sh to set:
#   VERL_DIR, VENV_PATH, MODEL_PATH, TRAIN_DATA, VAL_DATA, WANDB_*
bash scripts/run_struct_rl.sh
```

**Identifier Naming RL** (requires a running embedding server):
```bash
# 1. Start the embedding server
python -m vllm.entrypoints.openai.api_server \
    --model Qwen3-Embedding-0.6B --port 8000 --dtype float16

# 2. Edit scripts/run_ident_rl.sh to set:
#   VERL_DIR, VENV_PATH, MODEL_PATH, TRAIN_DATA, VAL_DATA, WANDB_*
bash scripts/run_ident_rl.sh
```

### Step 5: Install Additional Dependencies

The reward functions depend on the following tools and packages. Install them before launching training in Step 4.

```bash
# For compiler-based rewards (Structure Recovery)
apt install gcc
pip install psychec  # or build from https://github.com/ltcmelo/psychec.git

# For embedding-based rewards (Identifier Naming)
pip install tree-sitter==0.24.0 tree-sitter-c==0.23.4 openai
```

## Configurations

Reference hyperparameters used in the training scripts:

| Parameter | Structure Recovery | Identifier Naming |
|-----------|:-:|:-:|
| `train_batch_size` | 128 | 128 |
| `max_prompt_length` | 1024 | 1024 |
| `max_response_length` | 2048 | 2048 |
| `lr` | 1e-6 | 1e-6 |
| `kl_loss_coef` | 0.01 | 0.02 |
| `kl_loss_type` | low_var_kl | low_var_kl |
| `rollout.n` (GRPO group size: responses sampled per prompt) | 16 | 16 |
| `total_epochs` | 2 | 2 |

## Troubleshooting

**OOM (Out of Memory)**:
- Reduce `ppo_micro_batch_size_per_gpu` (default: 4)
- Enable `actor.fsdp_config.param_offload=True`
- Reduce `rollout.gpu_memory_utilization` (default: 0.80)

**Embedding server connection error** (Identifier Naming only):
- Ensure the vLLM embedding server is running on port 8000
- Check the environment variable `QWEN3_EMBEDDING_API_BASE` (default: `http://127.0.0.1:8000/v1`), and make sure `QWEN3_EMBEDDING_MODEL_PATH` (default: `Qwen3-Embedding-0.6B`) matches the model name passed to `--model` when starting the server

**Compilation timeout in reward** (Structure Recovery only):
- Each `gcc -c` call has a 5-second timeout; a sample that times out is scored as non-compilable
- `gcc -c` only compiles the generated code and never runs it, so infinite loops in that code cannot cause a timeout. Frequent timeouts more likely point to CPU contention from many concurrent reward computations, or to unusually large generated files
23
sk2decompile/verl/SK2DECOMPILE/reward_functions/__init__.py
Normal file
@ -0,0 +1,23 @@
"""
SK2Decompile — Reference Reward Functions for GRPO Training.

This module provides reference implementations of reward functions used in the
SK2Decompile RL training pipeline. These are example implementations that
demonstrate the reward design described in Section 3.5 of the paper:

    SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin
    (arXiv:2509.22114)

Reference implementations:
    - exe_type: Compilability + placeholder identifier Jaccard similarity
    - sim_exe: Compilability + word-level Jaccard similarity
    - embedding_gte: Tree-sitter identifier extraction + GTE embedding cosine similarity
    - embedding_qwen3: Tree-sitter identifier extraction + Qwen3 embedding cosine similarity

To integrate into VERL, copy these files into verl/utils/reward_score/ and
register routing branches in __init__.py. See README.md for details.
"""

from . import exe_type, sim_exe, embedding_gte, embedding_qwen3

__all__ = ["exe_type", "sim_exe", "embedding_gte", "embedding_qwen3"]
189
sk2decompile/verl/SK2DECOMPILE/reward_functions/embedding_gte.py
Normal file
@ -0,0 +1,189 @@
"""
Reference reward function: GTE Embedding-based Identifier Similarity.

This is a reference implementation of the Identifier Naming reward (Eq. 4)
described in the SK2Decompile paper (arXiv:2509.22114, Section 3.5).

Evaluates decompiled C code by:
    1. Using tree-sitter to parse C code and extract identifiers (func/var/type/field)
    2. Building a naming summary string per code sample
    3. Computing cosine similarity between GTE embeddings of the two summaries
    4. Squaring the similarity score to sharpen the reward signal

Final score = cosine_similarity^2

Requires:
    - A running OpenAI-compatible embedding server (e.g., vLLM serving gte-large-en-v1.5)
    - tree-sitter and tree-sitter-c packages

Environment variables:
    - GTE_EMBEDDING_MODEL_PATH: Model name/path (default: "gte-large-en-v1.5")
    - GTE_EMBEDDING_API_KEY or OPENAI_API_KEY: API key (default: "none")
    - GTE_EMBEDDING_API_BASE: API base URL (default: "http://127.0.0.1:8000/v1")
"""

import math
import os
from typing import Dict, List, Optional, Sequence, Tuple

from openai import OpenAI
from tree_sitter import Language, Parser
import tree_sitter_c as tsc

# ---- OpenAI Embedding Client ----

_MODEL_NAME = os.getenv("GTE_EMBEDDING_MODEL_PATH", "gte-large-en-v1.5")
_API_KEY = os.getenv("GTE_EMBEDDING_API_KEY") or os.getenv("OPENAI_API_KEY") or "none"
_API_BASE = os.getenv("GTE_EMBEDDING_API_BASE", "http://127.0.0.1:8000/v1")
_client: Optional[OpenAI] = None


def _get_client() -> OpenAI:
    """Create the OpenAI-compatible client lazily and reuse it across calls."""
    global _client
    if _client is None:
        if _API_BASE:
            _client = OpenAI(api_key=_API_KEY, base_url=_API_BASE)
        elif _API_KEY:
            _client = OpenAI(api_key=_API_KEY)
        else:
            _client = OpenAI()
    return _client


def _embed_two(text_a: str, text_b: str) -> Tuple[List[float], List[float]]:
    """Embed two texts in a single API call, return their embedding vectors."""
    client = _get_client()
    resp = client.embeddings.create(model=_MODEL_NAME, input=[text_a, text_b])
    emb_a = [float(x) for x in resp.data[0].embedding]
    emb_b = [float(x) for x in resp.data[1].embedding]
    return emb_a, emb_b


def _cosine_similarity(vec_a: Sequence[float], vec_b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# ---- Tree-sitter C: Identifier Extraction ----

C_LANG = Language(tsc.language())
_TS_PARSER = Parser(C_LANG)


def _classify_node(node):
    """
    Classify a tree-sitter node into identifier categories:
        - func: function names (definitions + calls)
        - var: variable names (parameters / local / global)
        - type: type names
        - field: struct field names

    Returns (category, name), or (None, None) if the node is not an identifier.
    """
    node_type = node.type
    if node_type not in ("identifier", "type_identifier", "field_identifier"):
        return None, None
    # Decode the source text only for identifier nodes; decoding every node
    # (including the root) would copy large parts of the file repeatedly.
    name = node.text.decode("utf8")

    if node_type == "type_identifier":
        return "type", name
    if node_type == "field_identifier":
        return "field", name

    parent = node.parent
    if parent:
        parent_type = parent.type
        if parent_type == "function_declarator" and parent.child_by_field_name("declarator") == node:
            return "func", name
        if parent_type == "call_expression" and parent.child_by_field_name("function") == node:
            return "func", name

    # Declarations, parameters and plain uses in expressions all count as variables.
    return "var", name


def _extract_identifiers_ts(code: str, max_per_type: int = 64) -> Dict[str, List[str]]:
    """Extract identifiers from C code using tree-sitter, classified by type.

    At most `max_per_type` names are kept per category. The traversal is an
    iterative depth-first walk, so names are not necessarily in source order.
    """
    tree = _TS_PARSER.parse(code.encode("utf8"))
    result: Dict[str, List[str]] = {"func": [], "var": [], "type": [], "field": []}

    stack = [tree.root_node]
    while stack:
        node = stack.pop()
        id_type, name = _classify_node(node)
        if id_type in result and len(result[id_type]) < max_per_type:
            result[id_type].append(name)
        stack.extend(node.children)

    return result


# ---- Summary Construction & Similarity ----


def _build_summary_text(identifiers: Dict[str, List[str]], max_per_type: int = 64) -> str:
    """
    Build a naming summary string from classified identifiers.

    Example: "func: foo bar || type: my_type || field: field1 field2 || var: i j k"
    """
    parts: List[str] = []
    for kind in ("func", "type", "field", "var"):
        names = identifiers.get(kind, [])
        if not names:
            continue
        segment = f"{kind}: " + " ".join(names[:max_per_type])
        parts.append(segment)
    return " || ".join(parts)


def _identifier_similarity_ts(candidate_text: str, reference_text: str) -> float:
    """
    Compute identifier-level similarity using embedding cosine similarity.

    Steps:
        1. Extract identifiers from both texts using tree-sitter
        2. Build naming summary strings
        3. Embed both summaries in a single API call
        4. Return the cosine similarity, clamped at 0, as name_score

    Returns:
        name_score: float in [0, 1]
    """
    cand_ids = _extract_identifiers_ts(candidate_text)
    ref_ids = _extract_identifiers_ts(reference_text)

    cand_summary = _build_summary_text(cand_ids)
    ref_summary = _build_summary_text(ref_ids)

    if not cand_summary or not ref_summary:
        return 0.0

    emb_cand, emb_ref = _embed_two(cand_summary, ref_summary)
    # Cosine similarity lies in [-1, 1]; clamp negatives so that squaring in
    # compute_score() cannot turn a dissimilar pair into a positive reward.
    return max(0.0, _cosine_similarity(emb_cand, emb_ref))


# ---- Main Reward Function ----


def compute_score(solution_str, ground_truth, extra_info=None):
    """
    Compute reward based on identifier naming similarity using GTE embeddings.
    Returns score^2 to sharpen the reward signal.
    """
    if not isinstance(solution_str, str):
        solution_str = "" if solution_str is None else str(solution_str)
    if not isinstance(ground_truth, str):
        ground_truth = "" if ground_truth is None else str(ground_truth)

    candidate_text = solution_str.strip()
    reference_text = ground_truth.strip()

    if not candidate_text or not reference_text:
        return 0.0

    name_score = _identifier_similarity_ts(candidate_text, reference_text)
    return name_score * name_score
@ -0,0 +1,189 @@
"""
Reference reward function: Qwen3 Embedding-based Identifier Similarity.

This is a reference implementation of the Identifier Naming reward (Eq. 4)
described in the SK2Decompile paper (arXiv:2509.22114, Section 3.5).

Evaluates decompiled C code by:
    1. Using tree-sitter to parse C code and extract identifiers (func/var/type/field)
    2. Building a naming summary string per code sample
    3. Computing cosine similarity between Qwen3 embeddings of the two summaries
    4. Squaring the similarity score to sharpen the reward signal

Final score = cosine_similarity^2

Requires:
    - A running OpenAI-compatible embedding server (e.g., vLLM serving Qwen3-Embedding-0.6B)
    - tree-sitter and tree-sitter-c packages

Environment variables:
    - QWEN3_EMBEDDING_MODEL_PATH: Model name/path (default: "Qwen3-Embedding-0.6B")
    - QWEN3_EMBEDDING_API_KEY or OPENAI_API_KEY: API key (default: "none")
    - QWEN3_EMBEDDING_API_BASE: API base URL (default: "http://127.0.0.1:8000/v1")
"""

import math
import os
from typing import Dict, List, Optional, Sequence, Tuple

from openai import OpenAI
from tree_sitter import Language, Parser
import tree_sitter_c as tsc

# ---- OpenAI Embedding Client ----

_MODEL_NAME = os.getenv("QWEN3_EMBEDDING_MODEL_PATH", "Qwen3-Embedding-0.6B")
_API_KEY = os.getenv("QWEN3_EMBEDDING_API_KEY") or os.getenv("OPENAI_API_KEY") or "none"
_API_BASE = os.getenv("QWEN3_EMBEDDING_API_BASE", "http://127.0.0.1:8000/v1")
_client: Optional[OpenAI] = None


def _get_client() -> OpenAI:
    """Create the OpenAI-compatible client lazily and reuse it across calls."""
    global _client
    if _client is None:
        if _API_BASE:
            _client = OpenAI(api_key=_API_KEY, base_url=_API_BASE)
        elif _API_KEY:
            _client = OpenAI(api_key=_API_KEY)
        else:
            _client = OpenAI()
    return _client


def _embed_two(text_a: str, text_b: str) -> Tuple[List[float], List[float]]:
    """Embed two texts in a single API call, return their embedding vectors."""
    client = _get_client()
    resp = client.embeddings.create(model=_MODEL_NAME, input=[text_a, text_b])
    emb_a = [float(x) for x in resp.data[0].embedding]
    emb_b = [float(x) for x in resp.data[1].embedding]
    return emb_a, emb_b


def _cosine_similarity(vec_a: Sequence[float], vec_b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# ---- Tree-sitter C: Identifier Extraction ----

C_LANG = Language(tsc.language())
_TS_PARSER = Parser(C_LANG)


def _classify_node(node):
    """
    Classify a tree-sitter node into identifier categories:
        - func: function names (definitions + calls)
        - var: variable names (parameters / local / global)
        - type: type names
        - field: struct field names

    Returns (category, name), or (None, None) if the node is not an identifier.
    """
    node_type = node.type
    if node_type not in ("identifier", "type_identifier", "field_identifier"):
        return None, None
    # Decode the source text only for identifier nodes; decoding every node
    # (including the root) would copy large parts of the file repeatedly.
    name = node.text.decode("utf8")

    if node_type == "type_identifier":
        return "type", name
    if node_type == "field_identifier":
        return "field", name

    parent = node.parent
    if parent:
        parent_type = parent.type
        if parent_type == "function_declarator" and parent.child_by_field_name("declarator") == node:
            return "func", name
        if parent_type == "call_expression" and parent.child_by_field_name("function") == node:
            return "func", name

    # Declarations, parameters and plain uses in expressions all count as variables.
    return "var", name


def _extract_identifiers_ts(code: str, max_per_type: int = 64) -> Dict[str, List[str]]:
    """Extract identifiers from C code using tree-sitter, classified by type.

    At most `max_per_type` names are kept per category. The traversal is an
    iterative depth-first walk, so names are not necessarily in source order.
    """
    tree = _TS_PARSER.parse(code.encode("utf8"))
    result: Dict[str, List[str]] = {"func": [], "var": [], "type": [], "field": []}

    stack = [tree.root_node]
    while stack:
        node = stack.pop()
        id_type, name = _classify_node(node)
        if id_type in result and len(result[id_type]) < max_per_type:
            result[id_type].append(name)
        stack.extend(node.children)

    return result


# ---- Summary Construction & Similarity ----


def _build_summary_text(identifiers: Dict[str, List[str]], max_per_type: int = 64) -> str:
    """
    Build a naming summary string from classified identifiers.

    Example: "func: foo bar || type: my_type || field: field1 field2 || var: i j k"
    """
    parts: List[str] = []
    for kind in ("func", "type", "field", "var"):
        names = identifiers.get(kind, [])
        if not names:
            continue
        segment = f"{kind}: " + " ".join(names[:max_per_type])
        parts.append(segment)
    return " || ".join(parts)


def _identifier_similarity_ts(candidate_text: str, reference_text: str) -> float:
    """
    Compute identifier-level similarity using embedding cosine similarity.

    Steps:
        1. Extract identifiers from both texts using tree-sitter
        2. Build naming summary strings
        3. Embed both summaries in a single API call
        4. Return the cosine similarity, clamped at 0, as name_score

    Returns:
        name_score: float in [0, 1]
    """
    cand_ids = _extract_identifiers_ts(candidate_text)
    ref_ids = _extract_identifiers_ts(reference_text)

    cand_summary = _build_summary_text(cand_ids)
    ref_summary = _build_summary_text(ref_ids)

    if not cand_summary or not ref_summary:
        return 0.0

    emb_cand, emb_ref = _embed_two(cand_summary, ref_summary)
    # Cosine similarity lies in [-1, 1]; clamp negatives so that squaring in
    # compute_score() cannot turn a dissimilar pair into a positive reward.
    return max(0.0, _cosine_similarity(emb_cand, emb_ref))


# ---- Main Reward Function ----


def compute_score(solution_str, ground_truth, extra_info=None):
    """
    Compute reward based on identifier naming similarity using Qwen3 embeddings.
    Returns score^2 to sharpen the reward signal.
    """
    if not isinstance(solution_str, str):
        solution_str = "" if solution_str is None else str(solution_str)
    if not isinstance(ground_truth, str):
        ground_truth = "" if ground_truth is None else str(ground_truth)

    candidate_text = solution_str.strip()
    reference_text = ground_truth.strip()

    if not candidate_text or not reference_text:
        return 0.0

    name_score = _identifier_similarity_ts(candidate_text, reference_text)
    return name_score * name_score
85
sk2decompile/verl/SK2DECOMPILE/reward_functions/exe_type.py
Normal file
@ -0,0 +1,85 @@
"""
Reference reward function: Compilability + Placeholder Identifier Matching.

This is a reference implementation of the Structure Recovery reward (Eq. 3)
described in the SK2Decompile paper (arXiv:2509.22114, Section 3.5).

Evaluates decompiled C code by:
    1. Checking if the code compiles with gcc (compilability score: 0 or 1)
    2. Extracting placeholder identifier patterns (func*, type*, var*, field*) from
       both candidate and ground truth, computing Jaccard similarity

Final score = type_score + compilability_score if compilable, else 0.
"""

import os
import re
import subprocess
import tempfile


def compute_score(solution_str, ground_truth, extra_info=None):
    type_score_value, _ = type_score(solution_str, ground_truth, extra_info)
    compileable_score_value = compileable_score(solution_str, ground_truth, extra_info)

    # Non-compilable output gets no reward, regardless of placeholder overlap.
    if compileable_score_value == 0.0:
        return 0.0

    return type_score_value + compileable_score_value


def type_score(solution_str, ground_truth, extra_info=None):
    """
    Compute Jaccard similarity over identifier patterns (func*, type*, var*, field*)
    between candidate and ground truth code.

    Note: the patterns also match ordinary words that merely start with these
    prefixes (e.g. "function", "variable").

    Returns:
        (jaccard_similarity, total_term_count)
    """
    patterns = [r'\bfunc\w*\b', r'\btype\w*\b', r'\bvar\w*\b', r'\bfield\w*\b']

    def extract_terms(text):
        terms = set()
        for pattern in patterns:
            terms.update(re.findall(pattern, text))
        return terms

    solution_terms = extract_terms(solution_str)
    ground_truth_terms = extract_terms(ground_truth)

    intersection = solution_terms.intersection(ground_truth_terms)
    union = solution_terms.union(ground_truth_terms)

    jaccard_similarity = len(intersection) / len(union) if union else 0.0
    return jaccard_similarity, len(solution_terms) + len(ground_truth_terms)


def compileable_score(solution_str, ground_truth, extra_info=None):
    """
    Check if the candidate C code compiles with gcc.

    Args:
        extra_info: Optional dict with 'header' key containing C header declarations.

    Returns:
        1.0 if compilable, 0.0 otherwise (including compile errors and timeouts).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            source_file = os.path.join(tmpdir, "temp.c")
            object_file = os.path.join(tmpdir, "temp.o")
            header = extra_info.get('header', '') if extra_info else ''

            # Prepend the ground-truth header so that types and prototypes resolve.
            with open(source_file, 'w', encoding='utf-8') as f:
                f.write(f'{header}\n\n{solution_str}')

            # Compile only (-c): the code is never executed. A non-zero exit
            # status signals a compilation error.
            proc = subprocess.run(
                ['gcc', '-c', source_file, '-o', object_file],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=5,
            )
            return 1.0 if proc.returncode == 0 else 0.0
        except Exception:
            # e.g. subprocess.TimeoutExpired, or gcc not being installed
            return 0.0
69
sk2decompile/verl/SK2DECOMPILE/reward_functions/sim_exe.py
Normal file
@ -0,0 +1,69 @@
"""
Reference reward function: Compilability + Word-level Jaccard Similarity.

This is a reference implementation of an alternative Structure Recovery reward
for the SK2Decompile RL training pipeline (arXiv:2509.22114, Section 3.5).

Evaluates decompiled C code by:
1. Computing word-level Jaccard similarity between the candidate and the ground truth
2. Checking whether the code compiles with gcc (compilability score: 0 or 1)

Final score = jaccard_similarity + compilability_score if jaccard_similarity > 0.5,
otherwise 0.0. A non-zero score therefore lies in (0.5, 2.0].
"""

import os
import subprocess
import tempfile


def compute_score(solution_str, ground_truth, extra_info=None):
    sim_score = jaccard_similarity(solution_str, ground_truth)

    # Below the similarity gate the reward is 0.0, so skip the gcc call entirely.
    if sim_score > 0.5:
        return sim_score + compileable_score(solution_str, ground_truth, extra_info)
    return 0.0


def jaccard_similarity(str1, str2):
    """Compute word-level Jaccard similarity between two strings."""
    set1 = set(str1.lower().split())
    set2 = set(str2.lower().split())

    intersection = len(set1.intersection(set2))
    union = len(set1.union(set2))

    if union == 0:
        return 0.0
    return intersection / union


def compileable_score(solution_str, ground_truth, extra_info=None):
    """
    Check if the candidate C code compiles with gcc.

    Args:
        extra_info: Optional dict with 'header' key containing C header declarations.

    Returns:
        1.0 if compilable, 0.0 otherwise.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            source_file = os.path.join(tmpdir, "temp.c")
            object_file = os.path.join(tmpdir, "temp.o")
            header = extra_info.get('header', '') if extra_info else ''

            with open(source_file, 'w') as f:
                f.write(f'{header}\n\n{solution_str}')

            # Compile only (-c); a non-zero exit status means the code does not compile.
            proc = subprocess.run(
                ['gcc', '-c', source_file, '-o', object_file],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=5,
            )
            return 1.0 if proc.returncode == 0 else 0.0
        except Exception:
            # Timeouts, a missing gcc binary, or I/O errors all count as "not compilable".
            return 0.0
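The gate in `compute_score` means a rollout earns nothing unless its whitespace-token overlap with the ground truth exceeds 0.5, after which compilability adds a flat 1.0. The sketch below re-implements only that formula in standalone Python so it can be checked without gcc; `sim_exe_score` and its `compiles` callback are hypothetical names standing in for `compute_score` and `compileable_score`. Because tokens are split on whitespace only, `a,` and `a` count as different tokens, so the model's spacing conventions affect the score.

```python
from typing import Callable


def jaccard_similarity(str1: str, str2: str) -> float:
    """Word-level Jaccard similarity on lower-cased, whitespace-split tokens."""
    set1 = set(str1.lower().split())
    set2 = set(str2.lower().split())
    union = set1 | set2
    if not union:
        return 0.0
    return len(set1 & set2) / len(union)


def sim_exe_score(candidate: str, reference: str,
                  compiles: Callable[[str], bool]) -> float:
    """Jaccard + compilability, gated on Jaccard > 0.5 (mirrors sim_exe.py).

    `compiles` is a hypothetical stand-in for compileable_score, injected so
    the formula can be exercised without invoking gcc.
    """
    sim = jaccard_similarity(candidate, reference)
    if sim <= 0.5:
        return 0.0
    return sim + (1.0 if compiles(candidate) else 0.0)


cand = "int add ( int a , int b ) { return a + b ; }"
ref = "int add ( int x , int y ) { return x + y ; }"
print(jaccard_similarity(cand, ref))             # 10 shared of 14 distinct tokens
print(sim_exe_score(cand, ref, lambda s: True))  # Jaccard + 1.0
```

Renaming only the parameters (`a`, `b` to `x`, `y`) still leaves 10 of 14 distinct tokens shared, so this candidate passes the 0.5 gate and its reward is decided mainly by whether it compiles.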
120
sk2decompile/verl/SK2DECOMPILE/scripts/run_ident_rl.sh
Executable file
@ -0,0 +1,120 @@
#!/usr/bin/env bash
# =============================================================================
# SK2Decompile - Reference Script: Identifier Naming RL Training
# =============================================================================
# Reference GRPO training script for the Identifier Naming model.
# Based on the VERL framework (v0.4.1) with embedding-based rewards.
#
# This is a reference configuration; please adjust parameters according
# to your hardware setup and dataset. See the paper (arXiv:2509.22114,
# Section 3.5) for the reward formulation details.
#
# Prerequisites:
#   - VERL framework installed (https://github.com/volcengine/verl)
#   - Reward functions integrated into verl/utils/reward_score/ (see README.md)
#   - An OpenAI-compatible embedding server running locally
#     e.g.: python -m vllm.entrypoints.openai.api_server \
#             --model Qwen3-Embedding-0.6B --port 8000
#   - tree-sitter, tree-sitter-c, openai packages installed
#
# Usage:
#   bash run_ident_rl.sh
# =============================================================================
set -x

# ---- User Configuration ----
# EMBEDDING_VARIANT is used only to name the run (TASK_NAME below); which
# reward is computed depends on the reward_score integration (see README.md).
EMBEDDING_VARIANT="gte"   # Options: "gte" or "qwen3"

VERL_DIR="<YOUR_VERL_DIR>"
VENV_PATH="<YOUR_VENV_PATH>"
MODEL_PATH="<YOUR_MODEL_PATH>"   # e.g., path to sk2decompile-ident-6.7b
TRAIN_DATA="<YOUR_DATA_PATH>/train.parquet"
VAL_DATA="<YOUR_DATA_PATH>/valid.parquet"

# WandB configuration
WANDB_API_KEY_VAL="<YOUR_WANDB_API_KEY>"
WANDB_ENTITY_VAL="<YOUR_WANDB_ENTITY>"
WANDB_PROJECT_VAL="<YOUR_WANDB_PROJECT>"

# Training parameters
NUM_NODES=1
GPUS_PER_NODE=8
KL_COEF=0.02
TOTAL_EPOCHS=2
SAVE_FREQ=25
TEST_FREQ=25

# ---- Environment Setup ----
source "${VENV_PATH}/bin/activate"

export UCX_IB_PCI_RELAXED_ORDERING=1
export NCCL_IB_PCI_RELAXED_ORDERING=1
export NCCL_IB_TIMEOUT=22
export NCCL_DEBUG=INFO
export TRANSFORMERS_OFFLINE=0
export TORCH_NCCL_AVOID_RECORD_STREAMS=1
export NCCL_NVLS_ENABLE=0
export NCCL_IB_DISABLE=0
export CUDA_DEVICE_MAX_CONNECTIONS=1

# ---- Task & Logging ----
TASK_NAME="sk2decompile_ident-rl-${EMBEDDING_VARIANT}"
LOG_DIR="${VERL_DIR}/logs"
mkdir -p "$LOG_DIR"
LOG_FILE="$LOG_DIR/${TASK_NAME}.log"
ERR_FILE="$LOG_DIR/${TASK_NAME}.err"

# ---- WandB ----
export WANDB_API_KEY=${WANDB_API_KEY_VAL}
export WANDB_ENTITY=${WANDB_ENTITY_VAL}
export WANDB_PROJECT=${WANDB_PROJECT_VAL}
export WANDB_NAME=${TASK_NAME}
export WANDB_MODE='online'
wandb login --relogin "$WANDB_API_KEY"

# ---- Launch GRPO Training ----
python3 -m verl.trainer.main_ppo --config-path=config \
    --config-name='ppo_trainer-lm4dc.yaml' \
    algorithm.adv_estimator=grpo \
    data.train_files=${TRAIN_DATA} \
    data.val_files=${VAL_DATA} \
    data.train_batch_size=128 \
    data.max_prompt_length=1024 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=${MODEL_PATH} \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=${KL_COEF} \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.80 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='sk2decompile_rl' \
    trainer.experiment_name=$TASK_NAME \
    trainer.default_local_dir=${VERL_DIR}/checkpoints/${TASK_NAME} \
    trainer.n_gpus_per_node=${GPUS_PER_NODE} \
    trainer.nnodes=${NUM_NODES} \
    trainer.save_freq=${SAVE_FREQ} \
    trainer.test_freq=${TEST_FREQ} \
    trainer.total_epochs=${TOTAL_EPOCHS} "$@" \
    > >(tee -a "$LOG_FILE") \
    2> >(tee -a "$ERR_FILE" >&2)

echo "STDOUT saved to: $LOG_FILE"
echo "STDERR saved to: $ERR_FILE"
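Both launch scripts set `algorithm.adv_estimator=grpo` with `actor_rollout_ref.rollout.n=16`, and apply the KL term as a loss (`use_kl_loss=True`, weighted by `KL_COEF`, `kl_loss_type=low_var_kl`) rather than inside the reward (`use_kl_in_reward=False`). The plain-Python sketch below illustrates the standard GRPO group-relative advantage and the k3 KL estimator that we understand `low_var_kl` to denote. It is not VERL's tensor implementation; the sample standard deviation, `eps=1e-6`, and the clip at 10 are assumptions.

```python
import math
import statistics
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for the n rollouts sampled for ONE prompt.

    Each reward is centred on the group mean and scaled by the group's sample
    standard deviation (assumed here; some implementations use population std).
    """
    if len(rewards) < 2:
        # A lone sample has no group to compare against.
        return [0.0 for _ in rewards]
    mean = statistics.fmean(rewards)
    std = statistics.stdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def k3_kl(logprob: float, ref_logprob: float, clip: float = 10.0) -> float:
    """Per-token k3 KL estimate exp(d) - d - 1, with d = ref_logprob - logprob.

    The estimate is never negative; the clip only bounds very large values.
    """
    d = ref_logprob - logprob
    # Cap the exponent to avoid OverflowError; such large values are clipped anyway.
    kl = math.exp(min(d, 50.0)) - d - 1.0
    return max(-clip, min(clip, kl))


# Four rollouts of one prompt scored by sim_exe-style rewards.
print(grpo_advantages([0.0, 0.0, 1.6, 1.8]))
print(k3_kl(logprob=-0.5, ref_logprob=-2.0))
```

Rollouts scoring above their group's mean receive positive advantages and the rest negative ones. With a gated reward such as `sim_exe`, a group in which every rollout scores 0.0 therefore yields all-zero advantages and contributes no policy-gradient signal.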
117
sk2decompile/verl/SK2DECOMPILE/scripts/run_struct_rl.sh
Executable file
@ -0,0 +1,117 @@
#!/usr/bin/env bash
# =============================================================================
# SK2Decompile - Reference Script: Structure Recovery RL Training
# =============================================================================
# Reference GRPO training script for the Structure Recovery model.
# Based on the VERL framework (v0.4.1) with compiler-based rewards.
#
# This is a reference configuration; please adjust parameters according
# to your hardware setup and dataset. See the paper (arXiv:2509.22114,
# Section 3.5) for the reward formulation details.
#
# Prerequisites:
#   - VERL framework installed (https://github.com/volcengine/verl)
#   - Reward functions integrated into verl/utils/reward_score/ (see README.md)
#   - gcc available for compilability checking
#
# Usage:
#   bash run_struct_rl.sh
# =============================================================================
set -x

# ---- User Configuration ----
# REWARD_VARIANT is used only to name the run (TASK_NAME below); which
# reward is computed depends on the reward_score integration (see README.md).
REWARD_VARIANT="exe_type"   # Options: "exe_type" or "sim_exe"

VERL_DIR="<YOUR_VERL_DIR>"
VENV_PATH="<YOUR_VENV_PATH>"
MODEL_PATH="<YOUR_MODEL_PATH>"   # e.g., path to sk2decompile-struct-6.7b
TRAIN_DATA="<YOUR_DATA_PATH>/train.parquet"
VAL_DATA="<YOUR_DATA_PATH>/valid.parquet"

# WandB configuration
WANDB_API_KEY_VAL="<YOUR_WANDB_API_KEY>"
WANDB_ENTITY_VAL="<YOUR_WANDB_ENTITY>"
WANDB_PROJECT_VAL="<YOUR_WANDB_PROJECT>"

# Training parameters
NUM_NODES=2
GPUS_PER_NODE=8
KL_COEF=0.01
TOTAL_EPOCHS=2
SAVE_FREQ=25
TEST_FREQ=25

# ---- Environment Setup ----
source "${VENV_PATH}/bin/activate"

export UCX_IB_PCI_RELAXED_ORDERING=1
export NCCL_IB_PCI_RELAXED_ORDERING=1
export NCCL_IB_TIMEOUT=22
export NCCL_DEBUG=INFO
export TRANSFORMERS_OFFLINE=0
export TORCH_NCCL_AVOID_RECORD_STREAMS=1
export NCCL_NVLS_ENABLE=0
export NCCL_IB_DISABLE=0
export CUDA_DEVICE_MAX_CONNECTIONS=1

# ---- Task & Logging ----
TASK_NAME="sk2decompile_struct-rl-${REWARD_VARIANT}"
LOG_DIR="${VERL_DIR}/logs"
mkdir -p "$LOG_DIR"
LOG_FILE="$LOG_DIR/${TASK_NAME}.log"
ERR_FILE="$LOG_DIR/${TASK_NAME}.err"

# ---- WandB ----
export WANDB_API_KEY=${WANDB_API_KEY_VAL}
export WANDB_ENTITY=${WANDB_ENTITY_VAL}
export WANDB_PROJECT=${WANDB_PROJECT_VAL}
export WANDB_NAME=${TASK_NAME}
export WANDB_MODE='online'
wandb login --relogin "$WANDB_API_KEY"

# ---- Launch GRPO Training ----
python3 -m verl.trainer.main_ppo --config-path=config \
    --config-name='ppo_trainer-lm4dc.yaml' \
    algorithm.adv_estimator=grpo \
    data.train_files=${TRAIN_DATA} \
    data.val_files=${VAL_DATA} \
    data.train_batch_size=128 \
    data.max_prompt_length=1024 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=${MODEL_PATH} \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=${KL_COEF} \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.80 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='sk2decompile_rl' \
    trainer.experiment_name=$TASK_NAME \
    trainer.default_local_dir=${VERL_DIR}/checkpoints/${TASK_NAME} \
    trainer.n_gpus_per_node=${GPUS_PER_NODE} \
    trainer.nnodes=${NUM_NODES} \
    trainer.save_freq=${SAVE_FREQ} \
    trainer.test_freq=${TEST_FREQ} \
    trainer.total_epochs=${TOTAL_EPOCHS} "$@" \
    > >(tee -a "$LOG_FILE") \
    2> >(tee -a "$ERR_FILE" >&2)

echo "STDOUT saved to: $LOG_FILE"
echo "STDERR saved to: $ERR_FILE"
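The two scripts share batch settings but differ in cluster size (1 node x 8 GPUs for identifier naming, 2 nodes x 8 GPUs for structure recovery). The hypothetical helper below works through what one training step implies under these settings. It assumes, as we understand VERL's FSDP workers to do, that `ppo_mini_batch_size` counts prompts and is multiplied by `rollout.n` and then divided across GPUs, and that `ppo_epochs` is 1. Neither value is set in the scripts, so check `ppo_trainer-lm4dc.yaml` before relying on these numbers.

```python
from dataclasses import dataclass


@dataclass
class RunConfig:
    train_batch_size: int      # data.train_batch_size (prompts per step)
    rollout_n: int             # actor_rollout_ref.rollout.n (responses per prompt)
    ppo_mini_batch_size: int   # actor_rollout_ref.actor.ppo_mini_batch_size (prompts)
    micro_batch_per_gpu: int   # actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu
    nnodes: int                # trainer.nnodes
    gpus_per_node: int         # trainer.n_gpus_per_node


def step_breakdown(cfg: RunConfig) -> dict:
    """Hypothetical helper: per-step sample counts, assuming ppo_epochs == 1."""
    world_size = cfg.nnodes * cfg.gpus_per_node
    responses = cfg.train_batch_size * cfg.rollout_n
    # One optimizer update per mini-batch of prompts.
    updates = cfg.train_batch_size // cfg.ppo_mini_batch_size
    # Assumption: the mini-batch is expanded by rollout_n, then split across GPUs.
    mini_per_gpu = cfg.ppo_mini_batch_size * cfg.rollout_n // world_size
    # Micro-batches accumulated into each update on every GPU.
    grad_accum = mini_per_gpu // cfg.micro_batch_per_gpu
    return {
        "world_size": world_size,
        "responses_per_step": responses,
        "actor_updates_per_step": updates,
        "mini_batch_per_gpu": mini_per_gpu,
        "grad_accum_steps": grad_accum,
    }


ident = step_breakdown(RunConfig(128, 16, 32, 4, nnodes=1, gpus_per_node=8))
struct = step_breakdown(RunConfig(128, 16, 32, 4, nnodes=2, gpus_per_node=8))
print(ident)
print(struct)
```

Under these assumptions both runs generate 2,048 responses per step and make 4 actor updates. The structure-recovery run halves the per-GPU mini-batch (32 vs. 64 responses), so it accumulates 8 rather than 16 micro-batches per update.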