Merge pull request #73 from BaiRiDreamer/main

Merge VERL RL training + BringUpBench evaluation pipeline
2026-06-17 01:55:50 +00:00 · 2026-02-12 11:02:03 +08:00 · 85b364bf09
commit 85b364bf09
parent 1c164e21cd 239cba2673
30 changed files with 7308 additions and 6 deletions
--- a/sk2decompile/README.md
+++ b/sk2decompile/README.md
@@ -34,8 +34,19 @@ Binary/Pseudo-code → [Phase 1: Skeleton] → Normalized IR → [Phase 2: Skin]
 SK2Decompile/
 ├── Preprocess/        # Data preprocessing and normalization tools
 ├── LLaMA-Factory/     # Supervised Fine-Tuning (SFT) implementation
-├── verl/              # Reinforcement Learning (RL) with compiler-based rewards
+├── verl/              # Reinforcement Learning (RL) with VERL/GRPO
+│   └── SK2DECOMPILE/
+│       ├── data/              # Example RL training data
+│       ├── reward_functions/  # Custom reward functions (4 variants)
+│       ├── scripts/           # Training launch scripts
+│       └── README.md          # Detailed RL documentation
 ├── evaluation/        # Comprehensive evaluation suite
+│   ├── bringupbench/          # BringUpBench evaluation (Section A.6)
+│   │   ├── scripts/           # Pipeline scripts (compile, decompile, evaluate)
+│   │   ├── data/              # Pre-built function maps and inference results
+│   │   ├── reports/           # Evaluation result summaries
+│   │   └── README.md          # Detailed BringUpBench documentation
+│   └── ...                    # HumanEval, MBPP evaluation scripts
 └── README.md          # This file
 ```

@@ -107,20 +118,32 @@ llamafactory-cli train LLaMA-Factory/SK2DECOMPILE/train/norm2code-example.yaml

 ### Phase 2: Reinforcement Learning (RL)

-Fine-tune models using compiler-based rewards for improved correctness:
+After SFT, we apply GRPO (Group Relative Policy Optimization) to further align each model with task-specific objectives (Section 3.5 of the paper):
+
+- **Structure Recovery**
+- **Identifier Naming**
+
+Our RL training is based on [VERL](https://github.com/volcengine/verl) v0.4.1 (Sheng et al., 2024).

 #### Setup VERL
 ```bash
-cd ../verl
-# Follow installation instructions in verl/README.md
+git clone https://github.com/volcengine/verl.git
+cd verl && git checkout v0.4.1 && pip install -e .
+pip install tree-sitter==0.24.0 tree-sitter-c==0.23.4 openai
 ```

 #### Run RL Training
 ```bash
-bash verl/SK2DECOMPILE/train/sk2decompile-rl.sh
+# Structure Recovery RL
+bash verl/SK2DECOMPILE/scripts/run_struct_rl.sh
+
+# Identifier Naming RL (requires embedding server)
+bash verl/SK2DECOMPILE/scripts/run_ident_rl.sh
 ```

-**RL Training Data:** `verl/SK2DECOMPILE/data/sk2decompile-rl-examples.parquet`
+See [`verl/SK2DECOMPILE/README.md`](verl/SK2DECOMPILE/README.md) for the full reproduction guide, including how to integrate reward functions into VERL and prepare training data.
+
+**RL Training Data:** `verl/SK2DECOMPILE/data/sk2decompile-rl-examples.jsonl`

 ## Evaluation
 ```
@@ -181,6 +204,12 @@ python gpt_judge.py --json_file your_json_file_path
                    --api_key your_openai_api_key
 ```

+**BringUpBench Evaluation** (Section A.6 of the paper)
+
+We also evaluate on [BringUpBench](https://github.com/toddmaustin/bringup-bench), a suite of 90 self-contained C programs with 505 functions across O0–O3. SK²Decompile achieves a **42.3% compilation rate** and a **27.0% re-executability rate**, compared with 23.6% and 21.7% for IDA Pro.
+
+See [`evaluation/bringupbench/README.md`](evaluation/bringupbench/README.md) for the full reproduction pipeline, pre-built data, and detailed results.
+
 ## 📊 Results

 Our approach achieves state-of-the-art performance:
--- a/sk2decompile/evaluation/bringupbench/README.md
+++ b/sk2decompile/evaluation/bringupbench/README.md
@@ -0,0 +1,249 @@
+# SK²Decompile — Evaluation on BringUpBench
+
+This directory contains the evaluation pipeline for SK²Decompile on the [BringUpBench](https://github.com/toddmaustin/bringup-bench) benchmark, as described in **Section A.6** of our paper:
+
+> **SK²Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin**
+> [[arXiv:2509.22114]](https://arxiv.org/abs/2509.22114)
+
+## Overview
+
+[BringUpBench](https://github.com/toddmaustin/bringup-bench) (Austin, 2024) is a benchmark suite of **90 self-contained C programs** intended for bringing up newly designed CPUs, accelerators, compilers, and operating systems. It has **zero library dependencies**: all programs rely solely on a built-in `libmin` library and only 4 system calls, which makes it a standardized test bed for evaluating decompilation on complex, real-world binaries.
+
+We compiled, decompiled, and executed all projects across optimization levels O0–O3, yielding **505 functions** in total. We compared SK²Decompile against the industry-standard rule-based decompiler, **IDA Pro** (Hex-Rays).
+
+## Results
+
+### SK²Decompile vs IDA Pro
+
+| Opt Level | Functions | SK²Decompile Compilable | SK²Decompile Executable | IDA Compilable | IDA Executable |
+|:---------:|:---------:|:-----------------------:|:-----------------------:|:--------------:|:--------------:|
+| O0 | 382 | **50.26%** | **49.48%** | — | — |
+| O1 | 379 | **40.90%** | **39.05%** | — | — |
+| O2 | 368 | **37.77%** | **34.24%** | — | — |
+| O3 | 359 | **31.75%** | **29.53%** | — | — |
+| **Avg** | **1488** | **42.3%** | **27.0%** | **23.6%** | **21.7%** |
+
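+> The **Avg** row reports the paper's aggregate rates (Table 8 in Section A.6); these are not recomputed from the per-level rows above. The Functions cell in that row (1488) is the sum of the four per-level counts, not an average. Per-opt-level IDA baselines are not separately reported in the paper. Detailed per-benchmark breakdowns are available in `reports/`.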
+
+## Directory Structure
+
+```
+bringupbench/
+├── README.md                              # This file
+├── config.env                             # Environment configuration (paths)
+├── scripts/
+│   ├── build-host-opt-levels.sh           # Step 1: Compile benchmarks at O0-O3
+│   ├── decompile-all-pseudo.sh            # Step 2: IDA Pro batch decompilation
+│   ├── dump_pseudo.py                     # IDA headless decompilation helper
+│   ├── disasm-all-objdump.sh              # Step 3: objdump batch disassembly
+│   ├── build-func-maps.py                 # Step 3: Build function-level mappings
+│   ├── clean-all-benchmarks.sh            # Utility: clean all build artifacts
+│   └── eval_infer_out.py                  # Step 5: Automated evaluation
+├── data/
+│   ├── func_maps/                         # Pre-built function mappings (JSONL)
+│   │   ├── merged.O0.func_map.jsonl       # O0: 493 functions
+│   │   ├── merged.O1.func_map.jsonl       # O1: 449 functions
+│   │   ├── merged.O2.func_map.jsonl       # O2: 441 functions
+│   │   └── merged.O3.func_map.jsonl       # O3: 439 functions
+│   └── infer_results/                     # SK²Decompile inference results
+│       ├── merged.O0.func_map.infer.jsonl # O0: 382 evaluated functions
+│       ├── merged.O1.func_map.infer.jsonl # O1: 379 evaluated functions
+│       ├── merged.O2.func_map.infer.jsonl # O2: 368 evaluated functions
+│       └── merged.O3.func_map.infer.jsonl # O3: 359 evaluated functions
+└── reports/                               # Evaluation result summaries
+    ├── O0_results.md
+    ├── O1_results.md
+    ├── O2_results.md
+    └── O3_results.md
+```
+
+## Reproduction Pipeline
+
+Our evaluation pipeline consists of five steps, as described in the paper:
+
+```
+Source (.c)
+  │
+  ▼  Step 1: Compilation
+Binary (.host.O0 ~ .host.O3)
+  │
+  ├──▶ Step 2: Baseline Extraction (IDA Pro) ──▶ Pseudocode (.pseudo)
+  │
+  ├──▶ Step 3: Ground Truth Mapping           ──▶ Function Maps (.func_map.jsonl)
+  │
+  ▼  Step 4: Decompilation (SK²Decompile)
+Inferred C code (.func_map.infer.jsonl)
+  │
+  ▼  Step 5: Validation
+Evaluation Reports (reports/)
+```
+
+### Prerequisites
+
+| Dependency | Purpose | Installation |
+|------------|---------|-------------|
+| [Bringup-Bench](https://github.com/toddmaustin/bringup-bench) | Upstream benchmark suite (90 C programs) | `git clone https://github.com/toddmaustin/bringup-bench.git` |
+| GCC | Compile benchmarks | `apt install gcc` |
+| IDA Pro + Hex-Rays | Decompile binaries to pseudocode | Commercial software |
+| objdump (binutils) | Disassemble binaries | `apt install binutils` |
+| clang-format | Pseudocode normalization | `apt install clang-format` |
+| Python >= 3.10 | Run evaluation scripts | `apt install python3` |
+
+### Quick Start (Evaluation Only)
+
+If you only want to reproduce the evaluation step (Step 5), the pre-built data is included in `data/`. You only need the Bringup-Bench source repository:
+
+```bash
+# 1. Clone Bringup-Bench
+git clone https://github.com/toddmaustin/bringup-bench.git
+
+# 2. Configure paths
+cd bringupbench
+vim config.env  # Set BENCH_REPO_ROOT to your bringup-bench path
+
+# 3. Run evaluation (e.g., O0)
+python3 scripts/eval_infer_out.py data/infer_results/merged.O0.func_map.infer.jsonl
+
+# 4. Check results
+cat reports/O0_results.md
+```
+
+### Full Pipeline (From Scratch)
+
+To reproduce the entire pipeline from compilation to evaluation:
+
+```bash
+cd bringupbench
+vim config.env  # Set BENCH_REPO_ROOT and IDA_BIN
+```
+
+**Step 1: Compile benchmarks at O0–O3**
+
+Build all 90 Bringup-Bench programs at four optimization levels, producing `<name>.host.O{0,1,2,3}` binaries.
+
+```bash
+scripts/build-host-opt-levels.sh
+```
+
+**Step 2: Baseline Extraction (IDA Pro)**
+
+Use IDA Pro in headless mode to decompile all binaries, producing `.pseudo` files with Hex-Rays pseudocode.
+
+```bash
+scripts/decompile-all-pseudo.sh
+```
+
+Each function is delimited by `/* function_name @ 0xADDRESS */` in the output.
+
+**Step 3: Ground Truth Mapping**
+
+Parse source code, pseudocode, and assembly; match functions by name across all three representations; and normalize the pseudocode (strip IDA-specific types, convert hexadecimal literals to decimal, and apply clang-format).
+
+```bash
+# Disassemble (optional, for assembly mapping)
+scripts/disasm-all-objdump.sh
+
+# Build function-level mappings
+python3 scripts/build-func-maps.py
+```
+
+Output: per-binary `.func_map.jsonl` files. Merge them per optimization level:
+
+```bash
+cat $BENCH_REPO_ROOT/*/*.host.O0.func_map.jsonl > data/func_maps/merged.O0.func_map.jsonl
+cat $BENCH_REPO_ROOT/*/*.host.O1.func_map.jsonl > data/func_maps/merged.O1.func_map.jsonl
+cat $BENCH_REPO_ROOT/*/*.host.O2.func_map.jsonl > data/func_maps/merged.O2.func_map.jsonl
+cat $BENCH_REPO_ROOT/*/*.host.O3.func_map.jsonl > data/func_maps/merged.O3.func_map.jsonl
+```
+
+**Step 4: Decompilation (SK²Decompile Inference)**
+
+Feed the `pseudo_normalize` field from the function maps to SK²Decompile. The two-phase inference pipeline (see `../sk2decompile_inf.py`) produces C code for each function. Write the results back into the JSONL, storing the final decompiled function in the `content-fix` field of the `pseudo` object.
+
+```bash
+# Example: use the main SK²Decompile inference pipeline
+cd ../  # back to sk2decompile/evaluation/
+python3 sk2decompile_inf.py \
+    --dataset_path bringupbench/data/func_maps/merged.O0.func_map.jsonl \
+    --model_path LLM4Binary/sk2decompile-struct-6.7b \
+    --recover_model_path LLM4Binary/sk2decompile-ident-6.7b
+```
+
+**Step 5: Validation**
+
+For each function, replace the original function in its source file with the decompiled output, rebuild the program in an isolated workspace, and run the project's test suite.
+
+```bash
+python3 scripts/eval_infer_out.py data/infer_results/merged.O0.func_map.infer.jsonl \
+    --jobs 16 \
+    --command-timeout 20
+```
+
+Common options:
+
+```bash
+--jobs N              # Parallel workers (default: 96)
+--command-timeout S   # Timeout per make command in seconds (default: 20)
+--limit N             # Process only first N cases (for debugging)
+--keep-workspaces     # Keep temporary build directories
+```
+
+## Data Format
+
+### func_map.jsonl (Function Mappings)
+
+Each line is a JSON object containing the source, pseudocode, and assembly for one function:
+
+```jsonc
+{
+  "source": {
+    "path": "ackermann/ackermann.c",          // Source file (relative to BENCH_REPO_ROOT)
+    "function_name": "ackermann",              // Function name
+    "content": "int ackermann(int m, ...) { ... }\n"  // Complete function body
+  },
+  "pseudo": {
+    "path": "ackermann/ackermann.host.O0.pseudo",
+    "function_name": "ackermann",
+    "address": "0x11e9",                       // Function address in binary
+    "label": "ackermann",
+    "content": "__int64 __fastcall ackermann(...) { ... }\n"  // Raw IDA pseudocode
+  },
+  "pseudo_normalize": "int ackermann(...) { ... }",  // Normalized pseudocode
+  "binary": "ackermann/ackermann.host.O0",     // Binary file path
+  "assembly": "<ackermann>:\npush %rbp\n..."   // Cleaned objdump output
+}
+```
+
+### func_map.infer.jsonl (Inference Results)
+
+Extends `func_map.jsonl` with SK²Decompile inference outputs:
+
+```jsonc
+{
+  // ... all fields from func_map.jsonl ...
+  "pseudo": {
+    // ... all fields above, plus:
+    "content-fix": "..."           // Final decompiled function (used for source replacement)
+  },
+  "infer-out-model1": "...",       // Phase 1 (Structure Recovery) raw output
+  "infer-out-model2": "...",       // Phase 2 (Identifier Naming) raw output
+  "pseudo_normalize-fix": "..."    // Corrected normalized pseudocode
+}
+```
+
+## Evaluation Metrics
+
+| Metric | Definition |
+|--------|-----------|
+| **Replacement Rate** | Fraction of functions where the decompiled output can be located and substituted into the original source file |
+| **Compilable Rate** | Fraction of functions where the modified source compiles successfully (`make build`) |
+| **Executable Rate** | Fraction of functions where the compiled program passes its test suite (`make test`, output matches reference) |
+
+The evaluation uses BringUpBench's own build infrastructure (`Makefile`, `libmin`, `libtarg`) to compile and validate. Each function is tested in an isolated workspace to prevent cross-contamination.
+
+## Notes
+
+- BringUpBench programs are self-contained with zero external dependencies, making them ideal for evaluating decompilation without the confounding factor of missing headers or libraries.
+- The `func_maps/` data contains more functions than `infer_results/` because some functions are filtered during inference (e.g., exceeding token limits).
+- All scripts load paths from `config.env`. You can override these values with environment variables or CLI arguments (priority: CLI > env > config.env).
+- For the complete SK²Decompile methodology and other benchmark results (HumanEval, MBPP, ExeBench, GitHub2025), see the [main README](../../README.md).
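
The replacement step behind the Replacement Rate metric in the README above can be sketched in Python against the `func_map.infer.jsonl` layout. This is only an illustration: it assumes the original function is found by exact string match on `source.content`, which may not be how `eval_infer_out.py` locates functions. The helper name `replace_function` and the sample record are hypothetical.

```python
import json
from typing import Optional


def replace_function(source_text: str, record: dict) -> Optional[str]:
    """Substitute the decompiled function for the original one.

    Returns the patched source text, or None when the original function
    body cannot be located or no decompiled output is present.
    Assumption: an exact string match is enough to locate the function.
    """
    original = record["source"]["content"]
    decompiled = record["pseudo"].get("content-fix")
    if not decompiled or original not in source_text:
        return None
    # Replace only the first occurrence so other code is left untouched.
    return source_text.replace(original, decompiled, 1)


# Hypothetical single-line JSONL record using the documented field names.
line = json.dumps({
    "source": {
        "path": "demo/demo.c",
        "function_name": "add",
        "content": "int add(int a, int b) { return a + b; }\n",
    },
    "pseudo": {
        "content-fix": "int add(int x, int y) { return x + y; }\n",
    },
})
record = json.loads(line)

source_text = "/* demo */\n\nint add(int a, int b) { return a + b; }\n"
patched = replace_function(source_text, record)
print(patched)
```

A `None` result corresponds to a replacement failure; a non-`None` result is what would then be handed to the build and test steps.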
--- a/sk2decompile/evaluation/bringupbench/config.env
+++ b/sk2decompile/evaluation/bringupbench/config.env
@ -0,0 +1,14 @@
+# BringUpBench Evaluation — Environment Configuration
+# All scripts resolve paths from this file.
+# Values can be overridden by same-named environment variables or CLI arguments.
+# Priority: CLI args > environment variables > config.env
+
+# Absolute path to the Bringup-Bench repository
+# Clone from: https://github.com/toddmaustin/bringup-bench.git
+BENCH_REPO_ROOT=/path/to/bringup-bench
+
+# IDA Pro command-line executable (required for Step 2: decompilation)
+IDA_BIN=/path/to/idat
+
+# Default build target (host = native x86-64 Linux)
+DEFAULT_TARGET=host
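
The precedence stated in the comments of this file (CLI arguments over environment variables over `config.env`) can be sketched in Python as follows. The helper names `parse_env_file` and `resolve_setting` are hypothetical and are not taken from the repository's scripts; the sketch assumes plain `KEY=value` lines with `#` comments and no quoting or variable expansion.

```python
import os
from typing import Mapping, Optional


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and '#' comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def resolve_setting(
    name: str,
    cli_value: Optional[str],
    file_settings: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the value with priority: CLI > environment > config.env."""
    env = os.environ if environ is None else environ
    if cli_value is not None:
        return cli_value
    if name in env:
        return env[name]
    return file_settings.get(name)


example = """
# Absolute path to the Bringup-Bench repository
BENCH_REPO_ROOT=/path/to/bringup-bench
IDA_BIN=/path/to/idat
DEFAULT_TARGET=host
"""
file_settings = parse_env_file(example)

# No CLI value and an empty environment: the file value is used.
print(resolve_setting("IDA_BIN", None, file_settings, {}))
# An environment variable overrides the file value.
print(resolve_setting("IDA_BIN", None, file_settings, {"IDA_BIN": "/opt/idat"}))
```

Passing the environment as an explicit mapping keeps the lookup easy to test without touching the real process environment.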
--- a/sk2decompile/evaluation/bringupbench/data/func_maps/merged.O0.func_map.jsonl
+++ b/sk2decompile/evaluation/bringupbench/data/func_maps/merged.O0.func_map.jsonl
--- a/sk2decompile/evaluation/bringupbench/data/func_maps/merged.O1.func_map.jsonl
+++ b/sk2decompile/evaluation/bringupbench/data/func_maps/merged.O1.func_map.jsonl
--- a/sk2decompile/evaluation/bringupbench/data/func_maps/merged.O2.func_map.jsonl
+++ b/sk2decompile/evaluation/bringupbench/data/func_maps/merged.O2.func_map.jsonl
--- a/sk2decompile/evaluation/bringupbench/data/func_maps/merged.O3.func_map.jsonl
+++ b/sk2decompile/evaluation/bringupbench/data/func_maps/merged.O3.func_map.jsonl
--- a/sk2decompile/evaluation/bringupbench/data/infer_results/merged.O0.func_map.infer.jsonl
+++ b/sk2decompile/evaluation/bringupbench/data/infer_results/merged.O0.func_map.infer.jsonl
--- a/sk2decompile/evaluation/bringupbench/data/infer_results/merged.O1.func_map.infer.jsonl
+++ b/sk2decompile/evaluation/bringupbench/data/infer_results/merged.O1.func_map.infer.jsonl
--- a/sk2decompile/evaluation/bringupbench/data/infer_results/merged.O2.func_map.infer.jsonl
+++ b/sk2decompile/evaluation/bringupbench/data/infer_results/merged.O2.func_map.infer.jsonl
--- a/sk2decompile/evaluation/bringupbench/data/infer_results/merged.O3.func_map.infer.jsonl
+++ b/sk2decompile/evaluation/bringupbench/data/infer_results/merged.O3.func_map.infer.jsonl
--- a/sk2decompile/evaluation/bringupbench/reports/O0_results.md
+++ b/sk2decompile/evaluation/bringupbench/reports/O0_results.md
@@ -0,0 +1,296 @@
+# Infer-Out Model 2 Evaluation (merged.O0.func_map.infer-host)
+
+- Timestamp: 20251119-171008
+- Source JSONL: merged.O0.func_map.infer.jsonl
+- Target: host
+- Total cases: 382
+- Replacement success: 382 (100.00%)
+- Compilable: 192 (50.26%)
+- Executable: 189 (49.48%)
+
+## Benchmark Breakdown
+| Benchmark | Cases | Replacement% | Build% | Exec% |
+| --- | --- | --- | --- | --- |
+| ackermann | 2 | 100.00% | 50.00% | 50.00% |
+| aes | 9 | 100.00% | 33.33% | 33.33% |
+| anagram | 12 | 100.00% | 58.33% | 58.33% |
+| audio-codec | 4 | 100.00% | 50.00% | 50.00% |
+| avl-tree | 14 | 100.00% | 35.71% | 35.71% |
+| banner | 1 | 100.00% | 0.00% | 0.00% |
+| bit-kernels | 5 | 100.00% | 100.00% | 100.00% |
+| blake2b | 6 | 100.00% | 16.67% | 16.67% |
+| bloom-filter | 3 | 100.00% | 33.33% | 33.33% |
+| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
+| bubble-sort | 2 | 100.00% | 100.00% | 100.00% |
+| c-interp | 10 | 100.00% | 70.00% | 70.00% |
+| ccmac | 2 | 100.00% | 50.00% | 50.00% |
+| checkers | 15 | 100.00% | 80.00% | 80.00% |
+| cipher | 3 | 100.00% | 33.33% | 33.33% |
+| congrad | 6 | 100.00% | 66.67% | 66.67% |
+| connect4-minimax | 13 | 100.00% | 61.54% | 61.54% |
+| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
+| dhrystone | 5 | 100.00% | 60.00% | 60.00% |
+| distinctness | 2 | 100.00% | 0.00% | 0.00% |
+| fft-int | 4 | 100.00% | 50.00% | 50.00% |
+| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
+| frac-calc | 10 | 100.00% | 60.00% | 60.00% |
+| fuzzy-match | 4 | 100.00% | 25.00% | 25.00% |
+| fy-shuffle | 4 | 100.00% | 50.00% | 50.00% |
+| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
+| grad-descent | 4 | 100.00% | 75.00% | 75.00% |
+| graph-tests | 19 | 100.00% | 21.05% | 21.05% |
+| hanoi | 2 | 100.00% | 50.00% | 50.00% |
+| heapsort | 2 | 100.00% | 50.00% | 50.00% |
+| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
+| huff-encode | 12 | 100.00% | 91.67% | 91.67% |
+| idct-alg | 4 | 100.00% | 50.00% | 50.00% |
+| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
+| k-means | 6 | 100.00% | 100.00% | 100.00% |
+| kadane | 2 | 100.00% | 50.00% | 50.00% |
+| kepler | 7 | 100.00% | 28.57% | 28.57% |
+| knapsack | 3 | 100.00% | 33.33% | 33.33% |
+| knights-tour | 3 | 100.00% | 66.67% | 66.67% |
+| life | 14 | 100.00% | 78.57% | 71.43% |
+| longdiv | 7 | 100.00% | 71.43% | 71.43% |
+| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
+| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
+| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
+| matmult | 1 | 100.00% | 0.00% | 0.00% |
+| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
+| mersenne | 3 | 100.00% | 0.00% | 0.00% |
+| minspan | 8 | 100.00% | 62.50% | 62.50% |
+| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
+| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
+| n-queens | 3 | 100.00% | 66.67% | 66.67% |
+| natlog | 1 | 100.00% | 0.00% | 0.00% |
+| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
+| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
+| packet-filter | 3 | 100.00% | 33.33% | 33.33% |
+| parrondo | 3 | 100.00% | 33.33% | 33.33% |
+| pascal | 3 | 100.00% | 100.00% | 100.00% |
+| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
+| primal-test | 3 | 100.00% | 0.00% | 0.00% |
+| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
+| qsort-demo | 5 | 100.00% | 0.00% | 0.00% |
+| qsort-test | 3 | 100.00% | 66.67% | 66.67% |
+| quaternions | 4 | 100.00% | 0.00% | 0.00% |
+| rabinkarp-search | 2 | 100.00% | 50.00% | 50.00% |
+| rand-test | 2 | 100.00% | 0.00% | 0.00% |
+| ransac | 2 | 100.00% | 50.00% | 50.00% |
+| regex-parser | 11 | 100.00% | 72.73% | 63.64% |
+| rho-factor | 4 | 100.00% | 75.00% | 75.00% |
+| rle-compress | 2 | 100.00% | 50.00% | 50.00% |
+| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
+| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
+| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
+| sieve | 2 | 100.00% | 50.00% | 50.00% |
+| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
+| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
+| spirograph | 2 | 100.00% | 50.00% | 50.00% |
+| sudoku-solver | 4 | 100.00% | 75.00% | 75.00% |
+| tetris-sim | 12 | 100.00% | 75.00% | 75.00% |
+| tiny-NN | 2 | 100.00% | 50.00% | 50.00% |
+| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
+| totient | 4 | 100.00% | 75.00% | 75.00% |
+| transcend | 3 | 100.00% | 66.67% | 66.67% |
+| uniquify | 1 | 100.00% | 0.00% | 0.00% |
+| vectors-3d | 8 | 100.00% | 12.50% | 12.50% |
+| verlet | 4 | 100.00% | 25.00% | 0.00% |
+| weekday | 2 | 100.00% | 0.00% | 0.00% |
+
+## Compilation Failures
+- ackermann/ackermann.c::main@0x13b9
+- aes/aes.c::aes_decrypt@0x1a65
+- aes/aes.c::aes_encrypt@0x1943
+- aes/aes.c::inv_shift_rows@0x1396
+- aes/aes.c::key_expansion@0x179a
+- aes/aes.c::main@0x1b87
+- aes/aes.c::shift_rows@0x12e5
+- anagram/anagram.c::BuildMask@0x13e7
+- anagram/anagram.c::BuildWord@0x17e5
+- anagram/anagram.c::FindAnagram@0x1ba6
+- anagram/anagram.c::ReadDict@0x121f
+- anagram/anagram.c::main@0x1f71
+- audio-codec/audio-codec.c::decode@0x12f5
+- audio-codec/audio-codec.c::main@0x14b3
+- avl-tree/avlcore.c::DeleteByElement@0x240f
+- avl-tree/avlcore.c::DeleteByElementRecursive@0x21af
+- avl-tree/avlcore.c::DeleteLeftMost@0x2086
+- avl-tree/avlcore.c::FindByElement@0x1a46
+- avl-tree/avlcore.c::Height@0x2475
+- avl-tree/avlcore.c::Insert@0x1fc4
+- avl-tree/avlcore.c::SingleLeftRotation@0x1b3a
+- avl-tree/avl-tree.c::main@0x1399
+- avl-tree/avl-tree.c::printTree@0x11e9
+- banner/banner.c::main@0x11e9
+- blake2b/blake2b.c::BLAKE2B@0x1a9b
+- blake2b/blake2b.c::F@0x1502
+- blake2b/blake2b.c::G@0x1258
+- blake2b/blake2b.c::blake2b@0x1cd3
+- blake2b/blake2b.c::test@0x2071
+- bloom-filter/bloom-filter.c::bad_search@0x11e9
+- bloom-filter/bloom-filter.c::main@0x123d
+- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x11e9
+- boyer-moore-search/boyer-moore-search.c::main@0x146d
+- boyer-moore-search/boyer-moore-search.c::search@0x126d
+- c-interp/c-interp.c::eval@0x457c
+- c-interp/c-interp.c::main@0x4e03
+- c-interp/c-interp.c::next@0x11e9
+- ccmac/ccmac.c::main@0x127e
+- checkers/functions.c::fill_print_initial@0x1793
+- checkers/functions.c::generate_node_children@0x29ff
+- checkers/checkers.c::main@0x11e9
+- cipher/cipher.c::encipher@0x11e9
+- cipher/cipher.c::main@0x13cd
+- congrad/congrad.c::cg_solve@0x1643
+- congrad/congrad.c::main@0x199b
+- connect4-minimax/connect4-minimax.c::init_board@0x11e9
+- connect4-minimax/connect4-minimax.c::main@0x2299
+- connect4-minimax/connect4-minimax.c::minimax@0x1d07
+- connect4-minimax/connect4-minimax.c::play_game@0x20d1
+- connect4-minimax/connect4-minimax.c::score_position@0x1a02
+- convex-hull/convex-hull.c::main@0x13e7
+- dhrystone/dhrystone.c::Proc_1@0x199f
+- dhrystone/dhrystone.c::main@0x11e9
+- distinctness/distinctness.c::isDistinct@0x11e9
+- distinctness/distinctness.c::main@0x15d8
+- fft-int/fft-int.c::db_from_ampl@0x1807
+- fft-int/fft-int.c::fix_fft@0x11e9
+- flood-fill/flood-fill.c::main@0x144d
+- frac-calc/frac-calc.c::copyr@0x14d4
+- frac-calc/frac-calc.c::divtokens@0x15b8
+- frac-calc/frac-calc.c::help@0x13d9
+- frac-calc/frac-calc.c::main@0x11e9
+- fuzzy-match/fuzzy-match.c::compute_score@0x2379
+- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x2283
+- fuzzy-match/fuzzy-match.c::main@0x24b3
+- fy-shuffle/fy-shuffle.c::main@0x1378
+- fy-shuffle/fy-shuffle.c::rand_int@0x11e9
+- gcd-list/gcd-list.c::gcd@0x11e9
+- gcd-list/gcd-list.c::main@0x125e
+- grad-descent/grad-descent.c::main@0x1413
+- graph-tests/graph-tests.c::addEdge@0x12c9
+- graph-tests/graph-tests.c::addVertex@0x19f6
+- graph-tests/graph-tests.c::bfs@0x15ce
+- graph-tests/graph-tests.c::bfs_test@0x16e9
+- graph-tests/graph-tests.c::bubbleSort@0x1829
+- graph-tests/graph-tests.c::createGraph@0x1221
+- graph-tests/graph-tests.c::createNode@0x11e9
+- graph-tests/graph-tests.c::createQueue@0x1372
+- graph-tests/graph-tests.c::dequeue@0x145d
+- graph-tests/graph-tests.c::enqueue@0x13d7
+- graph-tests/graph-tests.c::insertAtTheBegin@0x17b1
+- graph-tests/graph-tests.c::link_list@0x18b8
+- graph-tests/graph-tests.c::main@0x1d6c
+- graph-tests/graph-tests.c::printQueue@0x151b
+- graph-tests/graph-tests.c::swap@0x17f8
+- hanoi/hanoi.c::main@0x12d4
+- heapsort/heapsort.c::main@0x155f
+- heat-calc/heat-calc.c::main@0x11e9
+- huff-encode/huff-encode.c::main@0x192d
+- idct-alg/idct-alg.c::C@0x11e9
+- idct-alg/idct-alg.c::main@0x1472
+- indirect-test/indirect-test.c::main@0x12c9
+- kadane/kadane.c::main@0x1276
+- kepler/kepler.c::bin_fact@0x1b3e
+- kepler/kepler.c::binary@0x12c6
+- kepler/kepler.c::e_series@0x1389
+- kepler/kepler.c::j_series@0x1501
+- kepler/kepler.c::main@0x1608
+- knapsack/knapsack.c::main@0x138e
+- knapsack/knapsack.c::max@0x11e9
+- knights-tour/knights-tour.c::solveKT@0x12d6
+- life/life.c::getNumNeigbors@0x156f
+- life/life.c::main@0x11e9
+- life/life.c::process@0x1426
+- longdiv/longdiv.c::main@0x18fd
+- longdiv/longdiv.c::sub@0x11e9
+- lu-decomp/lu-decomp.c::main@0x1520
+- lu-decomp/lu-decomp.c::print_matrix@0x11e9
+- mandelbrot/mandelbrot.c::main@0x1220
+- matmult/matmult.c::main@0x11e9
+- max-subseq/max-subseq.c::lcsAlgo@0x11e9
+- max-subseq/max-subseq.c::main@0x171a
+- mersenne/mersenne.c::genrand@0x12ee
+- mersenne/mersenne.c::main@0x153a
+- mersenne/mersenne.c::sgenrand@0x11e9
+- minspan/minspan.c::displayPath@0x1af2
+- minspan/minspan.c::main@0x1d8f
+- minspan/minspan.c::minSpanTree@0x1297
+- monte-carlo/monte-carlo.c::main@0x11e9
+- murmur-hash/murmur-hash.c::main@0x13a9
+- murmur-hash/murmur-hash.c::murmurhash@0x11e9
+- n-queens/n-queens.c::main@0x12ec
+- natlog/natlog.c::main@0x11e9
+- nbody-sim/nbody-sim.c::main@0x11e9
+- packet-filter/packet-filter.c::generate_packet@0x11e9
+- packet-filter/packet-filter.c::main@0x14c3
+- parrondo/parrondo.c::cointoss@0x11e9
+- parrondo/parrondo.c::main@0x12cb
+- pi-calc/pi-calc.c::main@0x11e9
+- primal-test/primal-test.c::main@0x1459
+- primal-test/primal-test.c::miller_rabin_int@0x12fd
+- primal-test/primal-test.c::powm@0x11e9
+- priority-queue/priority-queue.c::main@0x13ee
+- qsort-demo/qsort-demo.c::main@0x17bf
+- qsort-demo/qsort-demo.c::print_struct_array@0x155e
+- qsort-demo/qsort-demo.c::sort_cstrings_example@0x1401
+- qsort-demo/qsort-demo.c::sort_integers_example@0x1280
+- qsort-demo/qsort-demo.c::sort_structs_example@0x1603
+- qsort-test/qsort-test.c::main@0x1415
+- quaternions/quaternions.c::euler_from_quat@0x1447
+- quaternions/quaternions.c::quat_from_euler@0x11e9
+- quaternions/quaternions.c::quaternion_multiply@0x1655
+- quaternions/quaternions.c::test@0x18b2
+- rabinkarp-search/rabinkarp-search.c::main@0x1341
+- rand-test/rand-test.c::main@0x1913
+- rand-test/rand-test.c::run_tests@0x1258
+- ransac/ransac.c::main@0x1466
+- regex-parser/regex-parser.c::main@0x32b9
+- regex-parser/regex-parser.c::re_compile@0x22e1
+- regex-parser/regex-parser.c::re_print@0x278f
+- rho-factor/rho-factor.c::main@0x5c7d
+- rle-compress/rle-compress.c::run_length_encode@0x11e9
+- rsa-cipher/rsa-cipher.c::main@0x1634
+- rsa-cipher/rsa-cipher.c::mod_inverse@0x1363
+- rsa-cipher/rsa-cipher.c::mod_pow@0x11e9
+- rsa-cipher/rsa-cipher.c::print_hex_int128@0x14ef
+- sat-solver/sat-solver.c::main@0x1518
+- sat-solver/sat-solver.c::printFormula@0x1391
+- shortest-path/shortest-path.c::main@0x1469
+- sieve/sieve.c::main@0x1300
+- simple-grep/simple-grep.c::main@0x11e9
+- spelt2num/spelt2num.c::main@0x11e9
+- spirograph/spirograph.c::spirograph@0x11e9
+- sudoku-solver/sudoku-solver.c::main@0x1532
+- tetris-sim/tetris-sim.c::best_move@0x1810
+- tetris-sim/tetris-sim.c::evaluate_board@0x1686
+- tetris-sim/tetris-sim.c::main@0x1ba5
+- tiny-NN/tiny-NN.c::train@0x1485
+- topo-sort/topo-sort.c::addEdge@0x12cf
+- topo-sort/topo-sort.c::createGraph@0x1259
+- topo-sort/topo-sort.c::createListNode@0x1221
+- topo-sort/topo-sort.c::createStackNode@0x11e9
+- topo-sort/topo-sort.c::main@0x153d
+- topo-sort/topo-sort.c::topologicalSort@0x13fd
+- topo-sort/topo-sort.c::topologicalSortUtil@0x1332
+- totient/totient.c::my_gcd@0x11e9
+- transcend/transcend.c::init_inputs_f64@0x1235
+- uniquify/uniquify.c::main@0x1228
+- vectors-3d/vectors-3d.c::get_cross_matrix@0x1601
+- vectors-3d/vectors-3d.c::print_vector@0x144f
+- vectors-3d/vectors-3d.c::test@0x17fb
+- vectors-3d/vectors-3d.c::unit_vec@0x1510
+- vectors-3d/vectors-3d.c::vector_add@0x126d
+- vectors-3d/vectors-3d.c::vector_prod@0x1373
+- vectors-3d/vectors-3d.c::vector_sub@0x11e9
+- verlet/verlet.c::main@0x170b
+- verlet/verlet.c::vb_init@0x1271
+- verlet/verlet.c::vb_step_avg@0x13aa
+- weekday/weekday.c::dayOfWeek@0x11e9
+- weekday/weekday.c::main@0x130d
+
+## Execution Failures
+- life/life.c::init@0x1237
+- regex-parser/regex-parser.c::matchpattern@0x313f
+- verlet/verlet.c::vb_checksum@0x160b
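
The summary percentages in the O0 report above are consistent with simple counts over the total number of cases, as this short Python check illustrates. The numbers are copied from the report's summary lines; the helper name `rate` is hypothetical, and the sketch does not claim to reproduce how `eval_infer_out.py` computes or formats its output.

```python
def rate(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals."""
    return f"{100.0 * count / total:.2f}%"


# Values copied from the O0 report summary.
total_cases = 382
compilable = 192
executable = 189

build_rate = rate(compilable, total_cases)
exec_rate = rate(executable, total_cases)

# Cases that compiled but did not pass the test suite.
exec_failures = compilable - executable
# Cases that did not compile at all.
build_failures = total_cases - compilable

print(build_rate, exec_rate, build_failures, exec_failures)
```

The two differences match the lengths of the "Compilation Failures" and "Execution Failures" lists in the report.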
--- a/sk2decompile/evaluation/bringupbench/reports/O1_results.md
+++ b/sk2decompile/evaluation/bringupbench/reports/O1_results.md
@@ -0,0 +1,334 @@
+# Infer-Out Model 2 Evaluation (merged.O1.func_map.infer-host)
+
+- Timestamp: 20251119-171212
+- Source JSONL: merged.O1.func_map.infer.jsonl
+- Target: host
+- Total cases: 379
+- Replacement success: 379 (100.00%)
+- Compilable: 155 (40.90%)
+- Executable: 148 (39.05%)
+
+## Benchmark Breakdown
+| Benchmark | Cases | Replacement% | Build% | Exec% |
+| --- | --- | --- | --- | --- |
+| ackermann | 2 | 100.00% | 50.00% | 50.00% |
+| aes | 9 | 100.00% | 33.33% | 33.33% |
+| anagram | 13 | 100.00% | 53.85% | 53.85% |
+| audio-codec | 3 | 100.00% | 0.00% | 0.00% |
+| avl-tree | 17 | 100.00% | 29.41% | 29.41% |
+| banner | 1 | 100.00% | 0.00% | 0.00% |
+| bit-kernels | 5 | 100.00% | 80.00% | 80.00% |
+| blake2b | 5 | 100.00% | 20.00% | 20.00% |
+| bloom-filter | 4 | 100.00% | 50.00% | 50.00% |
+| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
+| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
+| c-interp | 10 | 100.00% | 60.00% | 60.00% |
+| ccmac | 1 | 100.00% | 0.00% | 0.00% |
+| checkers | 16 | 100.00% | 81.25% | 81.25% |
+| cipher | 3 | 100.00% | 33.33% | 0.00% |
+| congrad | 2 | 100.00% | 0.00% | 0.00% |
+| connect4-minimax | 13 | 100.00% | 61.54% | 61.54% |
+| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
+| dhrystone | 5 | 100.00% | 40.00% | 40.00% |
+| distinctness | 2 | 100.00% | 0.00% | 0.00% |
+| fft-int | 4 | 100.00% | 75.00% | 75.00% |
+| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
+| frac-calc | 10 | 100.00% | 40.00% | 40.00% |
+| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
+| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
+| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
+| grad-descent | 4 | 100.00% | 0.00% | 0.00% |
+| graph-tests | 19 | 100.00% | 21.05% | 21.05% |
+| hanoi | 2 | 100.00% | 50.00% | 50.00% |
+| heapsort | 2 | 100.00% | 50.00% | 50.00% |
+| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
+| huff-encode | 13 | 100.00% | 92.31% | 92.31% |
+| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
+| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
+| k-means | 6 | 100.00% | 50.00% | 50.00% |
+| kadane | 2 | 100.00% | 50.00% | 50.00% |
+| kepler | 7 | 100.00% | 14.29% | 14.29% |
+| knapsack | 3 | 100.00% | 33.33% | 33.33% |
+| knights-tour | 3 | 100.00% | 66.67% | 66.67% |
+| life | 14 | 100.00% | 21.43% | 14.29% |
+| longdiv | 7 | 100.00% | 71.43% | 71.43% |
+| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
+| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
+| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
+| matmult | 1 | 100.00% | 0.00% | 0.00% |
+| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
+| mersenne | 3 | 100.00% | 0.00% | 0.00% |
+| minspan | 8 | 100.00% | 37.50% | 25.00% |
+| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
+| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
+| n-queens | 3 | 100.00% | 66.67% | 66.67% |
+| natlog | 1 | 100.00% | 0.00% | 0.00% |
+| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
+| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
+| packet-filter | 4 | 100.00% | 25.00% | 25.00% |
+| parrondo | 2 | 100.00% | 0.00% | 0.00% |
+| pascal | 3 | 100.00% | 33.33% | 33.33% |
+| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
+| primal-test | 3 | 100.00% | 33.33% | 33.33% |
+| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
+| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
+| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
+| quaternions | 4 | 100.00% | 0.00% | 0.00% |
+| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
+| rand-test | 3 | 100.00% | 0.00% | 0.00% |
+| ransac | 2 | 100.00% | 0.00% | 0.00% |
+| regex-parser | 8 | 100.00% | 25.00% | 12.50% |
+| rho-factor | 4 | 100.00% | 75.00% | 75.00% |
+| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
+| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
+| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
+| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
+| sieve | 1 | 100.00% | 0.00% | 0.00% |
+| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
+| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
+| spirograph | 2 | 100.00% | 50.00% | 50.00% |
+| sudoku-solver | 4 | 100.00% | 50.00% | 50.00% |
+| tetris-sim | 12 | 100.00% | 75.00% | 66.67% |
+| tiny-NN | 5 | 100.00% | 40.00% | 40.00% |
+| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
+| totient | 4 | 100.00% | 50.00% | 50.00% |
+| transcend | 1 | 100.00% | 0.00% | 0.00% |
+| uniquify | 1 | 100.00% | 0.00% | 0.00% |
+| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
+| verlet | 1 | 100.00% | 0.00% | 0.00% |
+| weekday | 2 | 100.00% | 0.00% | 0.00% |
+
+## Compilation Failures
+- ackermann/ackermann.c::main@0x131c
+- aes/aes.c::aes_decrypt@0x161b
+- aes/aes.c::aes_encrypt@0x1560
+- aes/aes.c::inv_shift_rows@0x12cd
+- aes/aes.c::key_expansion@0x14c3
+- aes/aes.c::main@0x16d1
+- aes/aes.c::shift_rows@0x1248
+- anagram/anagram.c::BuildMask@0x1372
+- anagram/anagram.c::BuildWord@0x15cd
+- anagram/anagram.c::DumpWords@0x17e8
+- anagram/anagram.c::FindAnagram@0x1839
+- anagram/anagram.c::ReadDict@0x1233
+- anagram/anagram.c::main@0x1a93
+- audio-codec/audio-codec.c::decode@0x1271
+- audio-codec/audio-codec.c::encode@0x11e9
+- audio-codec/audio-codec.c::main@0x12d7
+- avl-tree/avlcore.c::CheckTreeNodeRotation@0x186a
+- avl-tree/element.c::Compare@0x1764
+- avl-tree/avlcore.c::DeleteByElement@0x1d2b
+- avl-tree/avlcore.c::DeleteByElementRecursive@0x1b8b
+- avl-tree/avlcore.c::DoubleLeftRotation@0x1845
+- avl-tree/avlcore.c::DoubleRightRotation@0x1821
+- avl-tree/avlcore.c::FindByElement@0x1790
+- avl-tree/avlcore.c::Height@0x1d6e
+- avl-tree/avlcore.c::Insert@0x1a73
+- avl-tree/avlcore.c::InsertNode@0x199b
+- avl-tree/avl-tree.c::main@0x1380
+- avl-tree/avl-tree.c::printTree@0x11e9
+- banner/banner.c::main@0x11e9
+- bit-kernels/bit-kernels.c::main@0x12e8
+- blake2b/blake2b.c::F@0x1258
+- blake2b/blake2b.c::G@0x11e9
+- blake2b/blake2b.c::blake2b@0x1616
+- blake2b/blake2b.c::test@0x1982
+- bloom-filter/bloom-filter.c::bad_search@0x11e9
+- bloom-filter/bloom-filter.c::main@0x1217
+- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x11e9
+- boyer-moore-search/boyer-moore-search.c::main@0x1329
+- boyer-moore-search/boyer-moore-search.c::search@0x1223
+- c-interp/c-interp.c::eval@0x35d3
+- c-interp/c-interp.c::function_body@0x310b
+- c-interp/c-interp.c::main@0x3c45
+- c-interp/c-interp.c::next@0x11e9
+- ccmac/ccmac.c::main@0x11e9
+- checkers/functions.c::fill_print_initial@0x15dd
+- checkers/functions.c::link_new_node@0x204d
+- checkers/checkers.c::main@0x11e9
+- cipher/cipher.c::encipher@0x11e9
+- cipher/cipher.c::main@0x12b3
+- congrad/congrad.c::cg_spmv@0x11e9
+- congrad/congrad.c::main@0x125a
+- connect4-minimax/connect4-minimax.c::init_board@0x11e9
+- connect4-minimax/connect4-minimax.c::main@0x1c5d
+- connect4-minimax/connect4-minimax.c::minimax@0x17ed
+- connect4-minimax/connect4-minimax.c::play_game@0x1b13
+- connect4-minimax/connect4-minimax.c::score_position@0x158e
+- convex-hull/convex-hull.c::main@0x130d
+- dhrystone/dhrystone.c::PFunc_1@0x12ab
+- dhrystone/dhrystone.c::PFunc_2@0x12c8
+- dhrystone/dhrystone.c::main@0x1311
+- distinctness/distinctness.c::isDistinct@0x11e9
+- distinctness/distinctness.c::main@0x1342
+- fft-int/fft-int.c::db_from_ampl@0x1513
+- flood-fill/flood-fill.c::main@0x130f
+- frac-calc/frac-calc.c::avaliatokens@0x1421
+- frac-calc/frac-calc.c::calcula@0x172a
+- frac-calc/frac-calc.c::copyr@0x12b5
+- frac-calc/frac-calc.c::divtokens@0x1636
+- frac-calc/frac-calc.c::help@0x11e9
+- frac-calc/frac-calc.c::main@0x18c1
+- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x21e9
+- fuzzy-match/fuzzy-match.c::main@0x2391
+- fy-shuffle/fy-shuffle.c::fy_shuffle@0x11e9
+- fy-shuffle/fy-shuffle.c::main@0x12de
+- gcd-list/gcd-list.c::gcd@0x11e9
+- gcd-list/gcd-list.c::main@0x121c
+- grad-descent/grad-descent.c::derivateWRTBias@0x1247
+- grad-descent/grad-descent.c::derivateWRTWeight@0x11e9
+- grad-descent/grad-descent.c::gradientDescent@0x129d
+- grad-descent/grad-descent.c::main@0x1312
+- graph-tests/graph-tests.c::addEdge@0x127b
+- graph-tests/graph-tests.c::addVertex@0x1743
+- graph-tests/graph-tests.c::bfs@0x144f
+- graph-tests/graph-tests.c::bfs_test@0x150f
+- graph-tests/graph-tests.c::bubbleSort@0x15e7
+- graph-tests/graph-tests.c::createGraph@0x1206
+- graph-tests/graph-tests.c::createNode@0x11e9
+- graph-tests/graph-tests.c::createQueue@0x12cd
+- graph-tests/graph-tests.c::dequeue@0x1357
+- graph-tests/graph-tests.c::enqueue@0x130a
+- graph-tests/graph-tests.c::insertAtTheBegin@0x15ae
+- graph-tests/graph-tests.c::link_list@0x163c
+- graph-tests/graph-tests.c::main@0x1a0e
+- graph-tests/graph-tests.c::printQueue@0x13cc
+- graph-tests/graph-tests.c::swap@0x15da
+- hanoi/hanoi.c::main@0x1261
+- heapsort/heapsort.c::main@0x13d4
+- heat-calc/heat-calc.c::main@0x11e9
+- huff-encode/huff-encode.c::main@0x15ef
+- idct-alg/idct-alg.c::main@0x140e
+- indirect-test/indirect-test.c::main@0x1257
+- k-means/k-means.c::calculateNearst@0x11e9
+- k-means/k-means.c::main@0x1922
+- k-means/k-means.c::printEPS@0x1546
+- kadane/kadane.c::main@0x123b
+- kepler/kepler.c::J@0x18c0
+- kepler/kepler.c::bin_fact@0x1718
+- kepler/kepler.c::binary@0x121d
+- kepler/kepler.c::e_series@0x17a2
+- kepler/kepler.c::j_series@0x19bb
+- kepler/kepler.c::main@0x131f
+- knapsack/knapsack.c::main@0x128b
+- knapsack/knapsack.c::max@0x11e9
+- knights-tour/knights-tour.c::solveKT@0x1341
+- life/life.c::getDown@0x1406
+- life/life.c::getDownLeft@0x1487
+- life/life.c::getDownRight@0x14b4
+- life/life.c::getLeft@0x1390
+- life/life.c::getNumNeigbors@0x14e2
+- life/life.c::getRight@0x13b7
+- life/life.c::getUp@0x13df
+- life/life.c::getUpLeft@0x142e
+- life/life.c::getUpRight@0x145a
+- life/life.c::main@0x1664
+- life/life.c::process@0x15a3
+- longdiv/longdiv.c::main@0x1691
+- longdiv/longdiv.c::sub@0x11e9
+- lu-decomp/lu-decomp.c::main@0x13ad
+- lu-decomp/lu-decomp.c::print_matrix@0x11e9
+- mandelbrot/mandelbrot.c::main@0x120d
+- matmult/matmult.c::main@0x11e9
+- max-subseq/max-subseq.c::lcsAlgo@0x11e9
+- max-subseq/max-subseq.c::main@0x14c4
+- mersenne/mersenne.c::genrand@0x125b
+- mersenne/mersenne.c::main@0x1398
+- mersenne/mersenne.c::sgenrand@0x11e9
+- minspan/minspan.c::displayGraph@0x13f5
+- minspan/minspan.c::displayGraph1@0x14f3
+- minspan/minspan.c::displayPath@0x15fa
+- minspan/minspan.c::main@0x175b
+- minspan/minspan.c::minSpanTree@0x1231
+- monte-carlo/monte-carlo.c::main@0x11e9
+- murmur-hash/murmur-hash.c::main@0x12a3
+- murmur-hash/murmur-hash.c::murmurhash@0x11e9
+- n-queens/n-queens.c::main@0x12b1
+- natlog/natlog.c::main@0x11e9
+- nbody-sim/nbody-sim.c::main@0x11e9
+- packet-filter/packet-filter.c::check_packet_filter@0x133d
+- packet-filter/packet-filter.c::generate_packet@0x11e9
+- packet-filter/packet-filter.c::main@0x145c
+- parrondo/parrondo.c::main@0x127d
+- parrondo/parrondo.c::play_c@0x1238
+- pascal/pascal.c::main@0x12d1
+- pascal/pascal.c::print_centered@0x122b
+- pi-calc/pi-calc.c::main@0x11e9
+- primal-test/primal-test.c::main@0x13ea
+- primal-test/primal-test.c::miller_rabin_int@0x1243
+- priority-queue/priority-queue.c::main@0x130a
+- qsort-demo/qsort-demo.c::main@0x163f
+- qsort-demo/qsort-demo.c::print_struct_array@0x1470
+- qsort-demo/qsort-demo.c::sort_cstrings_example@0x13b3
+- qsort-demo/qsort-demo.c::sort_integers_example@0x1292
+- qsort-demo/qsort-demo.c::sort_structs_example@0x14d2
+- qsort-test/qsort-test.c::main@0x133f
+- quaternions/quaternions.c::euler_from_quat@0x136c
+- quaternions/quaternions.c::main@0x15bf
+- quaternions/quaternions.c::quat_from_euler@0x11e9
+- quaternions/quaternions.c::quaternion_multiply@0x1487
+- rabinkarp-search/rabinkarp-search.c::main@0x1366
+- rabinkarp-search/rabinkarp-search.c::search@0x11e9
+- rand-test/rand-test.c::bad_rand@0x11e9
+- rand-test/rand-test.c::main@0x1514
+- rand-test/rand-test.c::run_tests@0x1220
+- ransac/ransac.c::main@0x13cf
+- ransac/ransac.c::ransac_line_fitting@0x1238
+- regex-parser/regex-parser.c::main@0x2b4b
+- regex-parser/regex-parser.c::matchalphanum@0x21fc
+- regex-parser/regex-parser.c::matchcharclass@0x222a
+- regex-parser/regex-parser.c::matchone@0x23e1
+- regex-parser/regex-parser.c::re_compile@0x270b
+- regex-parser/regex-parser.c::re_print@0x2964
+- rho-factor/rho-factor.c::main@0x3ef0
+- rle-compress/rle-compress.c::main@0x1318
+- rle-compress/rle-compress.c::run_length_encode@0x11e9
+- rsa-cipher/rsa-cipher.c::main@0x1527
+- rsa-cipher/rsa-cipher.c::mod_inverse@0x12f3
+- rsa-cipher/rsa-cipher.c::mod_pow@0x11e9
+- rsa-cipher/rsa-cipher.c::print_hex_int128@0x1444
+- sat-solver/sat-solver.c::main@0x141e
+- sat-solver/sat-solver.c::printFormula@0x12ff
+- shortest-path/shortest-path.c::main@0x1333
+- sieve/sieve.c::main@0x11e9
+- simple-grep/simple-grep.c::main@0x11e9
+- spelt2num/spelt2num.c::main@0x11e9
+- spirograph/spirograph.c::spirograph@0x11e9
+- sudoku-solver/sudoku-solver.c::isSafe@0x11e9
+- sudoku-solver/sudoku-solver.c::main@0x13e5
+- tetris-sim/tetris-sim.c::best_move@0x157c
+- tetris-sim/tetris-sim.c::evaluate_board@0x144b
+- tetris-sim/tetris-sim.c::main@0x180d
+- tiny-NN/tiny-NN.c::main@0x16a4
+- tiny-NN/tiny-NN.c::sampleSine@0x1251
+- tiny-NN/tiny-NN.c::train@0x133c
+- topo-sort/topo-sort.c::addEdge@0x127d
+- topo-sort/topo-sort.c::createGraph@0x1223
+- topo-sort/topo-sort.c::createListNode@0x1206
+- topo-sort/topo-sort.c::createStackNode@0x11e9
+- topo-sort/topo-sort.c::main@0x1424
+- topo-sort/topo-sort.c::topologicalSort@0x132c
+- topo-sort/topo-sort.c::topologicalSortUtil@0x12b7
+- totient/totient.c::main@0x12bf
+- totient/totient.c::my_gcd@0x11e9
+- transcend/transcend.c::main@0x11e9
+- uniquify/uniquify.c::main@0x1201
+- vectors-3d/vectors-3d.c::get_cross_matrix@0x13c2
+- vectors-3d/vectors-3d.c::main@0x14cb
+- vectors-3d/vectors-3d.c::print_vector@0x12dc
+- vectors-3d/vectors-3d.c::unit_vec@0x1331
+- vectors-3d/vectors-3d.c::vector_add@0x121f
+- vectors-3d/vectors-3d.c::vector_prod@0x127e
+- vectors-3d/vectors-3d.c::vector_sub@0x11e9
+- verlet/verlet.c::main@0x11e9
+- weekday/weekday.c::dayOfWeek@0x11e9
+- weekday/weekday.c::main@0x12ea
+
+## Execution Failures
+- cipher/cipher.c::decipher@0x1251
+- idct-alg/idct-alg.c::idct_2d@0x1216
+- life/life.c::init@0x11e9
+- minspan/minspan.c::displayTree@0x16b7
+- regex-parser/regex-parser.c::matchpattern@0x2491
+- tetris-sim/tetris-sim.c::clear_lines@0x12b6
+- vectors-3d/vectors-3d.c::get_angle@0x1429
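The failure entries above appear to follow the shape `benchmark/source.c::function@0xADDRESS`. The sketch below parses that shape and counts failures per benchmark. The pattern is inferred from the entries in this report, not from the pipeline scripts, and `FailureEntry`, `parse_failure`, and `count_by_benchmark` are hypothetical names introduced only for illustration.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple


class FailureEntry(NamedTuple):
    """One failure line from the report (hypothetical helper type)."""
    benchmark: str
    source: str
    function: str
    address: int


# Assumed entry shape: <benchmark>/<source file>::<function>@<hex address>
ENTRY_RE = re.compile(
    r"^(?P<benchmark>[^/]+)/(?P<source>[^:]+)::(?P<function>\w+)"
    r"@(?P<address>0x[0-9a-fA-F]+)$"
)


def parse_failure(line: str) -> FailureEntry:
    """Parse one list item, tolerating a leading diff '+' and list '- ' marker."""
    text = line.strip().lstrip("+- ")
    match = ENTRY_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized failure entry: {line!r}")
    return FailureEntry(
        benchmark=match["benchmark"],
        source=match["source"],
        function=match["function"],
        address=int(match["address"], 16),
    )


def count_by_benchmark(lines: Iterable[str]) -> Counter:
    """Count failure entries per benchmark directory."""
    return Counter(parse_failure(line).benchmark for line in lines)


if __name__ == "__main__":
    sample = [
        "+- cipher/cipher.c::decipher@0x1251",
        "+- life/life.c::init@0x11e9",
        "+- avl-tree/element.c::Compare@0x1764",
        "+- avl-tree/avlcore.c::Height@0x1d6e",
    ]
    for name, count in sorted(count_by_benchmark(sample).items()):
        print(f"{name}: {count}")
```

Grouping on the directory component rather than the file name keeps multi-file benchmarks such as `avl-tree` (with `avlcore.c`, `element.c`, and `avl-tree.c`) under a single key, which matches how the Benchmark Breakdown table is organized.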
--- a/sk2decompile/evaluation/bringupbench/reports/O2_results.md
+++ b/sk2decompile/evaluation/bringupbench/reports/O2_results.md
@ -0,0 +1,345 @@
+# Infer-Out Model 2 Evaluation (merged.O2.func_map.infer-host)
+
+- Timestamp: 20251119-170633
+- Source JSONL: merged.O2.func_map.infer.jsonl
+- Target: host
+- Total cases: 368
+- Replacement success: 368 (100.00%)
+- Compilable: 139 (37.77%)
+- Executable: 126 (34.24%)
+
+## Benchmark Breakdown
+| Benchmark | Cases | Replacement% | Build% | Exec% |
+| --- | --- | --- | --- | --- |
+| ackermann | 2 | 100.00% | 50.00% | 50.00% |
+| aes | 10 | 100.00% | 20.00% | 20.00% |
+| anagram | 13 | 100.00% | 46.15% | 46.15% |
+| audio-codec | 3 | 100.00% | 33.33% | 33.33% |
+| avl-tree | 15 | 100.00% | 20.00% | 20.00% |
+| banner | 1 | 100.00% | 0.00% | 0.00% |
+| bit-kernels | 3 | 100.00% | 66.67% | 66.67% |
+| blake2b | 4 | 100.00% | 0.00% | 0.00% |
+| bloom-filter | 4 | 100.00% | 50.00% | 50.00% |
+| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
+| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
+| c-interp | 10 | 100.00% | 50.00% | 50.00% |
+| ccmac | 1 | 100.00% | 0.00% | 0.00% |
+| checkers | 16 | 100.00% | 68.75% | 62.50% |
+| cipher | 3 | 100.00% | 66.67% | 0.00% |
+| congrad | 1 | 100.00% | 0.00% | 0.00% |
+| connect4-minimax | 13 | 100.00% | 61.54% | 53.85% |
+| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
+| dhrystone | 5 | 100.00% | 20.00% | 20.00% |
+| distinctness | 2 | 100.00% | 0.00% | 0.00% |
+| fft-int | 4 | 100.00% | 50.00% | 50.00% |
+| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
+| frac-calc | 10 | 100.00% | 50.00% | 50.00% |
+| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
+| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
+| gcd-list | 2 | 100.00% | 50.00% | 0.00% |
+| grad-descent | 4 | 100.00% | 25.00% | 25.00% |
+| graph-tests | 20 | 100.00% | 10.00% | 10.00% |
+| hanoi | 1 | 100.00% | 0.00% | 0.00% |
+| heapsort | 2 | 100.00% | 50.00% | 50.00% |
+| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
+| huff-encode | 13 | 100.00% | 92.31% | 92.31% |
+| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
+| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
+| k-means | 6 | 100.00% | 33.33% | 33.33% |
+| kadane | 2 | 100.00% | 50.00% | 50.00% |
+| kepler | 7 | 100.00% | 14.29% | 14.29% |
+| knapsack | 3 | 100.00% | 33.33% | 33.33% |
+| knights-tour | 3 | 100.00% | 33.33% | 33.33% |
+| life | 14 | 100.00% | 21.43% | 14.29% |
+| longdiv | 6 | 100.00% | 50.00% | 50.00% |
+| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
+| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
+| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
+| matmult | 1 | 100.00% | 0.00% | 0.00% |
+| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
+| mersenne | 3 | 100.00% | 0.00% | 0.00% |
+| minspan | 8 | 100.00% | 25.00% | 25.00% |
+| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
+| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
+| n-queens | 3 | 100.00% | 66.67% | 66.67% |
+| natlog | 1 | 100.00% | 0.00% | 0.00% |
+| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
+| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
+| packet-filter | 4 | 100.00% | 0.00% | 0.00% |
+| parrondo | 2 | 100.00% | 50.00% | 50.00% |
+| pascal | 3 | 100.00% | 66.67% | 66.67% |
+| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
+| primal-test | 3 | 100.00% | 33.33% | 33.33% |
+| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
+| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
+| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
+| quaternions | 4 | 100.00% | 0.00% | 0.00% |
+| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
+| rand-test | 3 | 100.00% | 0.00% | 0.00% |
+| ransac | 2 | 100.00% | 50.00% | 0.00% |
+| regex-parser | 7 | 100.00% | 28.57% | 14.29% |
+| rho-factor | 3 | 100.00% | 66.67% | 66.67% |
+| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
+| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
+| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
+| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
+| sieve | 1 | 100.00% | 0.00% | 0.00% |
+| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
+| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
+| spirograph | 2 | 100.00% | 50.00% | 0.00% |
+| sudoku-solver | 4 | 100.00% | 50.00% | 50.00% |
+| tetris-sim | 12 | 100.00% | 75.00% | 58.33% |
+| tiny-NN | 4 | 100.00% | 25.00% | 25.00% |
+| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
+| totient | 2 | 100.00% | 50.00% | 50.00% |
+| transcend | 1 | 100.00% | 0.00% | 0.00% |
+| uniquify | 1 | 100.00% | 0.00% | 0.00% |
+| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
+| verlet | 1 | 100.00% | 0.00% | 0.00% |
+| weekday | 2 | 100.00% | 0.00% | 0.00% |
+
+## Compilation Failures
+- ackermann/ackermann.c::main@0x1100
+- aes/aes.c::aes_decrypt@0x18c0
+- aes/aes.c::aes_encrypt@0x1780
+- aes/aes.c::inv_mix_columns@0x1640
+- aes/aes.c::inv_shift_rows@0x14f0
+- aes/aes.c::key_expansion@0x16d0
+- aes/aes.c::main@0x1100
+- aes/aes.c::mix_columns@0x1580
+- aes/aes.c::shift_rows@0x1480
+- anagram/anagram.c::BuildMask@0x14c0
+- anagram/anagram.c::BuildWord@0x17d0
+- anagram/anagram.c::DumpCandidates@0x19a0
+- anagram/anagram.c::DumpWords@0x1a30
+- anagram/anagram.c::FindAnagram@0x1a90
+- anagram/anagram.c::ReadDict@0x1360
+- anagram/anagram.c::main@0x1120
+- audio-codec/audio-codec.c::decode@0x1440
+- audio-codec/audio-codec.c::main@0x1100
+- avl-tree/avlcore.c::CheckTreeNodeRotation@0x1c30
+- avl-tree/element.c::Compare@0x1ad0
+- avl-tree/avlcore.c::DeleteByElement@0x2860
+- avl-tree/avlcore.c::DeleteByElementRecursive@0x26d0
+- avl-tree/avlcore.c::DeleteLeftMost@0x2610
+- avl-tree/avlcore.c::DoubleLeftRotation@0x1c00
+- avl-tree/avlcore.c::DoubleRightRotation@0x1bd0
+- avl-tree/avlcore.c::FindByElement@0x1b00
+- avl-tree/avlcore.c::Insert@0x1f30
+- avl-tree/avlcore.c::MakeEmpty@0x1f80
+- avl-tree/avl-tree.c::breadth@0x1760
+- avl-tree/avl-tree.c::main@0x1120
+- banner/banner.c::main@0x1120
+- bit-kernels/bit-kernels.c::main@0x1120
+- blake2b/blake2b.c::F@0x12a0
+- blake2b/blake2b.c::G@0x1230
+- blake2b/blake2b.c::blake2b@0x1620
+- blake2b/blake2b.c::test@0x19d0
+- bloom-filter/bloom-filter.c::bad_search@0x1430
+- bloom-filter/bloom-filter.c::main@0x1120
+- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x15d0
+- boyer-moore-search/boyer-moore-search.c::main@0x1140
+- boyer-moore-search/boyer-moore-search.c::search@0x1630
+- c-interp/c-interp.c::eval@0x3e90
+- c-interp/c-interp.c::function_body@0x37f0
+- c-interp/c-interp.c::function_declaration@0x3a10
+- c-interp/c-interp.c::main@0x1120
+- c-interp/c-interp.c::next@0x1580
+- ccmac/ccmac.c::main@0x1120
+- checkers/functions.c::fill_print_initial@0x1630
+- checkers/functions.c::free_tree@0x2460
+- checkers/functions.c::generate_node_children@0x21c0
+- checkers/functions.c::link_new_node@0x20e0
+- checkers/checkers.c::main@0x1150
+- cipher/cipher.c::main@0x1100
+- congrad/congrad.c::main@0x1100
+- connect4-minimax/connect4-minimax.c::init_board@0x1230
+- connect4-minimax/connect4-minimax.c::main@0x1100
+- connect4-minimax/connect4-minimax.c::minimax@0x1840
+- connect4-minimax/connect4-minimax.c::play_game@0x1c90
+- connect4-minimax/connect4-minimax.c::score_position@0x1620
+- convex-hull/convex-hull.c::main@0x1100
+- dhrystone/dhrystone.c::PFunc_1@0x1970
+- dhrystone/dhrystone.c::PFunc_2@0x1990
+- dhrystone/dhrystone.c::PProc_8@0x1900
+- dhrystone/dhrystone.c::main@0x1100
+- distinctness/distinctness.c::isDistinct@0x12a0
+- distinctness/distinctness.c::main@0x1100
+- fft-int/fft-int.c::db_from_ampl@0x1670
+- fft-int/fft-int.c::fix_fft@0x1320
+- flood-fill/flood-fill.c::main@0x1100
+- frac-calc/frac-calc.c::avaliatokens@0x15f0
+- frac-calc/frac-calc.c::copyr@0x1460
+- frac-calc/frac-calc.c::divtokens@0x1840
+- frac-calc/frac-calc.c::help@0x13b0
+- frac-calc/frac-calc.c::main@0x1120
+- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x2360
+- fuzzy-match/fuzzy-match.c::main@0x2100
+- fy-shuffle/fy-shuffle.c::fy_shuffle@0x1440
+- fy-shuffle/fy-shuffle.c::main@0x1100
+- gcd-list/gcd-list.c::main@0x1120
+- grad-descent/grad-descent.c::derivateWRTBias@0x12d0
+- grad-descent/grad-descent.c::derivateWRTWeight@0x1270
+- grad-descent/grad-descent.c::main@0x1100
+- graph-tests/graph-tests.c::DFS_test@0x1c20
+- graph-tests/graph-tests.c::addEdge@0x1320
+- graph-tests/graph-tests.c::addVertex@0x1a50
+- graph-tests/graph-tests.c::bfs@0x1540
+- graph-tests/graph-tests.c::bfs_test@0x1720
+- graph-tests/graph-tests.c::bubbleSort@0x1880
+- graph-tests/graph-tests.c::createGraph@0x1260
+- graph-tests/graph-tests.c::createNode@0x1240
+- graph-tests/graph-tests.c::createQueue@0x1390
+- graph-tests/graph-tests.c::depthFirstSearch@0x1b20
+- graph-tests/graph-tests.c::dequeue@0x1430
+- graph-tests/graph-tests.c::enqueue@0x13e0
+- graph-tests/graph-tests.c::getAdjUnvisitedVertex@0x1ac0
+- graph-tests/graph-tests.c::insertAtTheBegin@0x1840
+- graph-tests/graph-tests.c::link_list@0x18e0
+- graph-tests/graph-tests.c::main@0x1120
+- graph-tests/graph-tests.c::printQueue@0x14c0
+- graph-tests/graph-tests.c::swap@0x1870
+- hanoi/hanoi.c::main@0x1100
+- heapsort/heapsort.c::main@0x1100
+- heat-calc/heat-calc.c::main@0x1100
+- huff-encode/huff-encode.c::main@0x1120
+- idct-alg/idct-alg.c::main@0x1100
+- indirect-test/indirect-test.c::main@0x1100
+- k-means/k-means.c::calculateNearst@0x1310
+- k-means/k-means.c::kMeans@0x1420
+- k-means/k-means.c::main@0x1120
+- k-means/k-means.c::printEPS@0x16b0
+- kadane/kadane.c::main@0x1100
+- kepler/kepler.c::J@0x1920
+- kepler/kepler.c::bin_fact@0x1740
+- kepler/kepler.c::binary@0x16a0
+- kepler/kepler.c::e_series@0x17e0
+- kepler/kepler.c::j_series@0x1a20
+- kepler/kepler.c::main@0x1100
+- knapsack/knapsack.c::main@0x1100
+- knapsack/knapsack.c::max@0x1310
+- knights-tour/knights-tour.c::solveKT@0x1390
+- knights-tour/knights-tour.c::solveKTUtil@0x14f0
+- life/life.c::getDown@0x16e0
+- life/life.c::getDownLeft@0x1770
+- life/life.c::getDownRight@0x17a0
+- life/life.c::getLeft@0x1650
+- life/life.c::getNumNeigbors@0x1390
+- life/life.c::getRight@0x1680
+- life/life.c::getUp@0x16b0
+- life/life.c::getUpLeft@0x1710
+- life/life.c::getUpRight@0x1740
+- life/life.c::main@0x1100
+- life/life.c::process@0x1550
+- longdiv/longdiv.c::main@0x1120
+- longdiv/longdiv.c::sbc@0x1a20
+- longdiv/longdiv.c::sub@0x19c0
+- lu-decomp/lu-decomp.c::main@0x1100
+- lu-decomp/lu-decomp.c::print_matrix@0x13a0
+- mandelbrot/mandelbrot.c::main@0x1100
+- matmult/matmult.c::main@0x1100
+- max-subseq/max-subseq.c::lcsAlgo@0x1290
+- max-subseq/max-subseq.c::main@0x1120
+- mersenne/mersenne.c::genrand@0x1310
+- mersenne/mersenne.c::main@0x1100
+- mersenne/mersenne.c::sgenrand@0x1290
+- minspan/minspan.c::displayGraph@0x14f0
+- minspan/minspan.c::displayGraph1@0x15f0
+- minspan/minspan.c::displayPath@0x1700
+- minspan/minspan.c::displayTree@0x17a0
+- minspan/minspan.c::main@0x1100
+- minspan/minspan.c::minSpanTree@0x12f0
+- monte-carlo/monte-carlo.c::main@0x1100
+- murmur-hash/murmur-hash.c::main@0x1100
+- murmur-hash/murmur-hash.c::murmurhash@0x1290
+- n-queens/n-queens.c::main@0x1120
+- natlog/natlog.c::main@0x1100
+- nbody-sim/nbody-sim.c::main@0x1100
+- packet-filter/packet-filter.c::check_packet_filter@0x1430
+- packet-filter/packet-filter.c::generate_packet@0x12d0
+- packet-filter/packet-filter.c::main@0x1100
+- packet-filter/packet-filter.c::print_packet@0x1490
+- parrondo/parrondo.c::main@0x1100
+- pascal/pascal.c::main@0x1100
+- pi-calc/pi-calc.c::main@0x1100
+- primal-test/primal-test.c::main@0x1100
+- primal-test/primal-test.c::miller_rabin_int@0x1510
+- priority-queue/priority-queue.c::main@0x1120
+- qsort-demo/qsort-demo.c::main@0x1120
+- qsort-demo/qsort-demo.c::print_struct_array@0x15c0
+- qsort-demo/qsort-demo.c::sort_cstrings_example@0x14a0
+- qsort-demo/qsort-demo.c::sort_integers_example@0x1310
+- qsort-demo/qsort-demo.c::sort_structs_example@0x1640
+- qsort-test/qsort-test.c::main@0x1120
+- quaternions/quaternions.c::euler_from_quat@0x1580
+- quaternions/quaternions.c::main@0x1100
+- quaternions/quaternions.c::quat_from_euler@0x13f0
+- quaternions/quaternions.c::quaternion_multiply@0x16b0
+- rabinkarp-search/rabinkarp-search.c::main@0x1120
+- rabinkarp-search/rabinkarp-search.c::search@0x13a0
+- rand-test/rand-test.c::bad_rand@0x1240
+- rand-test/rand-test.c::main@0x1100
+- rand-test/rand-test.c::run_tests@0x1280
+- ransac/ransac.c::main@0x1100
+- regex-parser/regex-parser.c::main@0x2100
+- regex-parser/regex-parser.c::matchcharclass@0x23b0
+- regex-parser/regex-parser.c::matchone@0x2560
+- regex-parser/regex-parser.c::re_compile@0x2930
+- regex-parser/regex-parser.c::re_print@0x2bf0
+- rho-factor/rho-factor.c::main@0x1120
+- rle-compress/rle-compress.c::main@0x1120
+- rle-compress/rle-compress.c::run_length_encode@0x1330
+- rsa-cipher/rsa-cipher.c::main@0x1100
+- rsa-cipher/rsa-cipher.c::mod_inverse@0x1670
+- rsa-cipher/rsa-cipher.c::mod_pow@0x1580
+- rsa-cipher/rsa-cipher.c::print_hex_int128@0x1790
+- sat-solver/sat-solver.c::main@0x1100
+- sat-solver/sat-solver.c::printFormula@0x1390
+- shortest-path/shortest-path.c::main@0x1100
+- sieve/sieve.c::main@0x1100
+- simple-grep/simple-grep.c::main@0x1120
+- spelt2num/spelt2num.c::main@0x1100
+- spirograph/spirograph.c::spirograph@0x1230
+- sudoku-solver/sudoku-solver.c::isSafe@0x1250
+- sudoku-solver/sudoku-solver.c::main@0x1100
+- tetris-sim/tetris-sim.c::best_move@0x1860
+- tetris-sim/tetris-sim.c::evaluate_board@0x1640
+- tetris-sim/tetris-sim.c::main@0x1120
+- tiny-NN/tiny-NN.c::main@0x1120
+- tiny-NN/tiny-NN.c::sampleSine@0x12d0
+- tiny-NN/tiny-NN.c::train@0x13e0
+- topo-sort/topo-sort.c::addEdge@0x1370
+- topo-sort/topo-sort.c::createGraph@0x1300
+- topo-sort/topo-sort.c::createListNode@0x12e0
+- topo-sort/topo-sort.c::createStackNode@0x12c0
+- topo-sort/topo-sort.c::main@0x1120
+- topo-sort/topo-sort.c::topologicalSort@0x1450
+- topo-sort/topo-sort.c::topologicalSortUtil@0x13c0
+- totient/totient.c::main@0x1100
+- transcend/transcend.c::main@0x1120
+- uniquify/uniquify.c::main@0x1120
+- vectors-3d/vectors-3d.c::get_cross_matrix@0x1760
+- vectors-3d/vectors-3d.c::main@0x1100
+- vectors-3d/vectors-3d.c::print_vector@0x1620
+- vectors-3d/vectors-3d.c::unit_vec@0x1690
+- vectors-3d/vectors-3d.c::vector_add@0x1550
+- vectors-3d/vectors-3d.c::vector_prod@0x15c0
+- vectors-3d/vectors-3d.c::vector_sub@0x1510
+- verlet/verlet.c::main@0x1100
+- weekday/weekday.c::dayOfWeek@0x1350
+- weekday/weekday.c::main@0x1100
+
+## Execution Failures
+- checkers/functions.c::all_possible_moves@0x1a60
+- cipher/cipher.c::decipher@0x1360
+- cipher/cipher.c::encipher@0x12f0
+- connect4-minimax/connect4-minimax.c::terminal_score@0x1800
+- gcd-list/gcd-list.c::gcd@0x1310
+- idct-alg/idct-alg.c::idct_2d@0x12f0
+- life/life.c::init@0x1220
+- ransac/ransac.c::ransac_line_fitting@0x1410
+- regex-parser/regex-parser.c::matchpattern@0x2670
+- spirograph/spirograph.c::test@0x1390
+- tetris-sim/tetris-sim.c::clear_lines@0x1480
+- tetris-sim/tetris-sim.c::simulate_board@0x17c0
+- vectors-3d/vectors-3d.c::get_angle@0x17d0
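The O2 summary figures can be cross-checked against the two failure lists, which contain 229 compilation failures and 13 execution failures. The sketch below assumes that every case either compiles or appears under Compilation Failures, and that Execution Failures lists only cases that compiled. That reading is an inference that happens to match the reported numbers, not something the report states. `summarize` is a hypothetical helper.

```python
def summarize(total: int, compile_failures: int, exec_failures: int) -> dict:
    """Recompute the summary counts and percentages from the failure lists.

    Assumption: execution failures are a subset of the compilable cases.
    """
    compilable = total - compile_failures
    executable = compilable - exec_failures
    return {
        "compilable": compilable,
        "compilable_pct": f"{compilable / total * 100:.2f}%",
        "executable": executable,
        "executable_pct": f"{executable / total * 100:.2f}%",
    }


if __name__ == "__main__":
    # O2 report: 368 total cases, 229 compilation failures, 13 execution failures.
    print(summarize(total=368, compile_failures=229, exec_failures=13))
```

With the O2 inputs this reproduces the reported 139 compilable (37.77%) and 126 executable (34.24%) cases, so the summary and the lists agree under the stated assumption.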
--- a/sk2decompile/evaluation/bringupbench/reports/O3_results.md
+++ b/sk2decompile/evaluation/bringupbench/reports/O3_results.md
@ -0,0 +1,355 @@
+# Infer-Out Model 2 Evaluation (merged.O3.func_map.infer-host)
+
+- Timestamp: 20251119-171533
+- Source JSONL: merged.O3.func_map.infer.jsonl
+- Target: host
+- Total cases: 359
+- Replacement success: 359 (100.00%)
+- Compilable: 114 (31.75%)
+- Executable: 106 (29.53%)
+
+## Benchmark Breakdown
+| Benchmark | Cases | Replacement% | Build% | Exec% |
+| --- | --- | --- | --- | --- |
+| ackermann | 2 | 100.00% | 50.00% | 50.00% |
+| aes | 11 | 100.00% | 27.27% | 27.27% |
+| anagram | 13 | 100.00% | 38.46% | 38.46% |
+| audio-codec | 3 | 100.00% | 33.33% | 33.33% |
+| avl-tree | 15 | 100.00% | 13.33% | 13.33% |
+| banner | 1 | 100.00% | 0.00% | 0.00% |
+| bit-kernels | 3 | 100.00% | 66.67% | 66.67% |
+| blake2b | 3 | 100.00% | 0.00% | 0.00% |
+| bloom-filter | 4 | 100.00% | 25.00% | 25.00% |
+| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
+| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
+| c-interp | 10 | 100.00% | 40.00% | 40.00% |
+| ccmac | 1 | 100.00% | 0.00% | 0.00% |
+| checkers | 13 | 100.00% | 61.54% | 61.54% |
+| cipher | 3 | 100.00% | 33.33% | 0.00% |
+| congrad | 1 | 100.00% | 0.00% | 0.00% |
+| connect4-minimax | 11 | 100.00% | 45.45% | 45.45% |
+| convex-hull | 4 | 100.00% | 50.00% | 50.00% |
+| dhrystone | 5 | 100.00% | 40.00% | 40.00% |
+| distinctness | 2 | 100.00% | 0.00% | 0.00% |
+| fft-int | 4 | 100.00% | 0.00% | 0.00% |
+| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
+| frac-calc | 9 | 100.00% | 22.22% | 22.22% |
+| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
+| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
+| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
+| grad-descent | 4 | 100.00% | 0.00% | 0.00% |
+| graph-tests | 19 | 100.00% | 5.26% | 5.26% |
+| hanoi | 1 | 100.00% | 0.00% | 0.00% |
+| heapsort | 2 | 100.00% | 0.00% | 0.00% |
+| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
+| huff-encode | 12 | 100.00% | 83.33% | 83.33% |
+| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
+| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
+| k-means | 5 | 100.00% | 0.00% | 0.00% |
+| kadane | 2 | 100.00% | 50.00% | 50.00% |
+| kepler | 7 | 100.00% | 14.29% | 14.29% |
+| knapsack | 3 | 100.00% | 33.33% | 33.33% |
+| knights-tour | 3 | 100.00% | 33.33% | 33.33% |
+| life | 14 | 100.00% | 21.43% | 14.29% |
+| longdiv | 7 | 100.00% | 71.43% | 71.43% |
+| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
+| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
+| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
+| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
+| mersenne | 4 | 100.00% | 0.00% | 0.00% |
+| minspan | 8 | 100.00% | 25.00% | 25.00% |
+| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
+| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
+| n-queens | 3 | 100.00% | 66.67% | 66.67% |
+| natlog | 1 | 100.00% | 0.00% | 0.00% |
+| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
+| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
+| packet-filter | 4 | 100.00% | 0.00% | 0.00% |
+| parrondo | 2 | 100.00% | 50.00% | 50.00% |
+| pascal | 3 | 100.00% | 66.67% | 66.67% |
+| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
+| primal-test | 3 | 100.00% | 66.67% | 66.67% |
+| priority-queue | 5 | 100.00% | 40.00% | 40.00% |
+| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
+| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
+| quaternions | 4 | 100.00% | 0.00% | 0.00% |
+| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
+| rand-test | 3 | 100.00% | 0.00% | 0.00% |
+| ransac | 2 | 100.00% | 50.00% | 0.00% |
+| regex-parser | 8 | 100.00% | 25.00% | 25.00% |
+| rho-factor | 1 | 100.00% | 100.00% | 100.00% |
+| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
+| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
+| sat-solver | 5 | 100.00% | 60.00% | 40.00% |
+| shortest-path | 3 | 100.00% | 33.33% | 33.33% |
+| sieve | 1 | 100.00% | 0.00% | 0.00% |
+| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
+| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
+| spirograph | 2 | 100.00% | 50.00% | 0.00% |
+| sudoku-solver | 4 | 100.00% | 75.00% | 75.00% |
+| tetris-sim | 12 | 100.00% | 58.33% | 50.00% |
+| tiny-NN | 4 | 100.00% | 25.00% | 25.00% |
+| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
+| totient | 2 | 100.00% | 50.00% | 50.00% |
+| transcend | 1 | 100.00% | 0.00% | 0.00% |
+| uniquify | 1 | 100.00% | 0.00% | 0.00% |
+| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
+| verlet | 1 | 100.00% | 0.00% | 0.00% |
+| weekday | 2 | 100.00% | 0.00% | 0.00% |
+
+## Compilation Failures
+- ackermann/ackermann.c::main@0x1100
+- aes/aes.c::add_round_key@0x1810
+- aes/aes.c::aes_decrypt@0x2760
+- aes/aes.c::aes_encrypt@0x2200
+- aes/aes.c::inv_shift_rows@0x1af0
+- aes/aes.c::key_expansion@0x1ff0
+- aes/aes.c::main@0x1100
+- aes/aes.c::mix_columns@0x1bd0
+- aes/aes.c::shift_rows@0x1a30
+- anagram/anagram.c::BuildMask@0x1620
+- anagram/anagram.c::BuildWord@0x1940
+- anagram/anagram.c::DumpCandidates@0x1c10
+- anagram/anagram.c::DumpWords@0x1ca0
+- anagram/anagram.c::FindAnagram@0x1d00
+- anagram/anagram.c::ReadDict@0x14c0
+- anagram/anagram.c::SortCandidates@0x1f10
+- anagram/anagram.c::main@0x1120
+- audio-codec/audio-codec.c::decode@0x1590
+- audio-codec/audio-codec.c::main@0x1100
+- avl-tree/avlcore.c::CheckTreeNodeRotation@0x1c50
+- avl-tree/element.c::Compare@0x1af0
+- avl-tree/avlcore.c::DeleteByElement@0x2e50
+- avl-tree/avlcore.c::DeleteByElementRecursive@0x2bf0
+- avl-tree/avlcore.c::DeleteLeftMost@0x2720
+- avl-tree/avlcore.c::DoubleLeftRotation@0x1c20
+- avl-tree/avlcore.c::DoubleRightRotation@0x1bf0
+- avl-tree/avlcore.c::FindByElement@0x1b20
+- avl-tree/avlcore.c::Insert@0x1f40
+- avl-tree/avlcore.c::InsertNode@0x1e10
+- avl-tree/avlcore.c::MakeEmpty@0x2090
+- avl-tree/avl-tree.c::breadth@0x1780
+- avl-tree/avl-tree.c::main@0x1120
+- banner/banner.c::main@0x1120
+- bit-kernels/bit-kernels.c::main@0x1120
+- blake2b/blake2b.c::F@0x12e0
+- blake2b/blake2b.c::blake2b@0x17b0
+- blake2b/blake2b.c::test@0x1b50
+- bloom-filter/bloom-filter.c::bad_search@0x1450
+- bloom-filter/tinybloom.c::bfilter_intersect@0x1570
+- bloom-filter/bloom-filter.c::main@0x1120
+- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x15d0
+- boyer-moore-search/boyer-moore-search.c::main@0x1140
+- boyer-moore-search/boyer-moore-search.c::search@0x1630
+- c-interp/c-interp.c::enum_declaration@0x34f0
+- c-interp/c-interp.c::eval@0x3ea0
+- c-interp/c-interp.c::function_body@0x37f0
+- c-interp/c-interp.c::function_declaration@0x3a10
+- c-interp/c-interp.c::main@0x1120
+- c-interp/c-interp.c::next@0x15a0
+- ccmac/ccmac.c::main@0x1120
+- checkers/functions.c::fill_print_initial@0x18e0
+- checkers/functions.c::free_tree@0x6210
+- checkers/functions.c::generate_node_children@0x35d0
+- checkers/functions.c::link_new_node@0x34c0
+- checkers/checkers.c::main@0x1130
+- cipher/cipher.c::encipher@0x12f0
+- cipher/cipher.c::main@0x1100
+- congrad/congrad.c::main@0x1100
+- connect4-minimax/connect4-minimax.c::board_full@0x1500
+- connect4-minimax/connect4-minimax.c::evaluate_window@0x2380
+- connect4-minimax/connect4-minimax.c::init_board@0x1230
+- connect4-minimax/connect4-minimax.c::main@0x1100
+- connect4-minimax/connect4-minimax.c::minimax@0x3c30
+- connect4-minimax/connect4-minimax.c::play_game@0x4260
+- convex-hull/convex-hull.c::main@0x1100
+- convex-hull/convex-hull.c::sortPoints@0x1740
+- dhrystone/dhrystone.c::PFunc_1@0x1980
+- dhrystone/dhrystone.c::PProc_8@0x1910
+- dhrystone/dhrystone.c::main@0x1100
+- distinctness/distinctness.c::isDistinct@0x12a0
+- distinctness/distinctness.c::main@0x1100
+- fft-int/fft-int.c::db_from_ampl@0x1c50
+- fft-int/fft-int.c::fix_fft@0x1370
+- fft-int/fft-int.c::fix_loud@0x1a90
+- fft-int/fft-int.c::window@0x1650
+- flood-fill/flood-fill.c::main@0x1100
+- frac-calc/frac-calc.c::avaliatokens@0x1730
+- frac-calc/frac-calc.c::copyr@0x1550
+- frac-calc/frac-calc.c::divtokens@0x1980
+- frac-calc/frac-calc.c::help@0x14a0
+- frac-calc/frac-calc.c::main@0x1120
+- frac-calc/frac-calc.c::misto@0x1610
+- frac-calc/frac-calc.c::simplifica@0x28f0
+- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x23e0
+- fuzzy-match/fuzzy-match.c::main@0x2100
+- fy-shuffle/fy-shuffle.c::fy_shuffle@0x1440
+- fy-shuffle/fy-shuffle.c::main@0x1100
+- gcd-list/gcd-list.c::gcd@0x1310
+- gcd-list/gcd-list.c::main@0x1120
+- grad-descent/grad-descent.c::derivateWRTBias@0x12e0
+- grad-descent/grad-descent.c::derivateWRTWeight@0x1270
+- grad-descent/grad-descent.c::gradientDescent@0x1350
+- grad-descent/grad-descent.c::main@0x1100
+- graph-tests/graph-tests.c::DFS_test@0x2340
+- graph-tests/graph-tests.c::addEdge@0x1610
+- graph-tests/graph-tests.c::addVertex@0x1f80
+- graph-tests/graph-tests.c::bfs@0x1830
+- graph-tests/graph-tests.c::bfs_test@0x1a70
+- graph-tests/graph-tests.c::bubbleSort@0x1db0
+- graph-tests/graph-tests.c::createGraph@0x1550
+- graph-tests/graph-tests.c::createNode@0x1530
+- graph-tests/graph-tests.c::createQueue@0x1680
+- graph-tests/graph-tests.c::depthFirstSearch@0x2110
+- graph-tests/graph-tests.c::dequeue@0x1720
+- graph-tests/graph-tests.c::enqueue@0x16d0
+- graph-tests/graph-tests.c::insertAtTheBegin@0x1d70
+- graph-tests/graph-tests.c::link_list@0x1e20
+- graph-tests/graph-tests.c::main@0x1180
+- graph-tests/graph-tests.c::printQueue@0x17b0
+- graph-tests/graph-tests.c::swap@0x1da0
+- graph-tests/graph-tests.c::towers@0x2490
+- hanoi/hanoi.c::main@0x1100
+- heapsort/heapsort.c::HSORT@0x12f0
+- heapsort/heapsort.c::main@0x11a0
+- heat-calc/heat-calc.c::main@0x1100
+- huff-encode/huff-encode.c::buildHuffmanTree@0x18b0
+- huff-encode/huff-encode.c::main@0x1120
+- idct-alg/idct-alg.c::main@0x1100
+- indirect-test/indirect-test.c::main@0x1100
+- k-means/k-means.c::calculateCentroid@0x1390
+- k-means/k-means.c::calculateNearst@0x1310
+- k-means/k-means.c::kMeans@0x1400
+- k-means/k-means.c::main@0x1120
+- k-means/k-means.c::printEPS@0x16c0
+- kadane/kadane.c::main@0x1100
+- kepler/kepler.c::J@0x1b80
+- kepler/kepler.c::bin_fact@0x1ad0
+- kepler/kepler.c::binary@0x16a0
+- kepler/kepler.c::e_series@0x1740
+- kepler/kepler.c::j_series@0x1920
+- kepler/kepler.c::main@0x1100
+- knapsack/knapsack.c::main@0x1100
+- knapsack/knapsack.c::max@0x1310
+- knights-tour/knights-tour.c::solveKT@0x1830
+- knights-tour/knights-tour.c::solveKTUtil@0x1980
+- life/life.c::getDown@0x1960
+- life/life.c::getDownLeft@0x19f0
+- life/life.c::getDownRight@0x1a20
+- life/life.c::getLeft@0x18d0
+- life/life.c::getNumNeigbors@0x16d0
+- life/life.c::getRight@0x1900
+- life/life.c::getUp@0x1930
+- life/life.c::getUpLeft@0x1990
+- life/life.c::getUpRight@0x19c0
+- life/life.c::main@0x1100
+- life/life.c::process@0x1430
+- longdiv/longdiv.c::main@0x1120
+- longdiv/longdiv.c::sub@0x1a80
+- lu-decomp/lu-decomp.c::main@0x1100
+- lu-decomp/lu-decomp.c::print_matrix@0x1320
+- mandelbrot/mandelbrot.c::main@0x1100
+- max-subseq/max-subseq.c::lcsAlgo@0x1290
+- max-subseq/max-subseq.c::main@0x1120
+- mersenne/mersenne.c::genrand@0x1380
+- mersenne/mersenne.c::lsgenrand@0x1320
+- mersenne/mersenne.c::main@0x1100
+- mersenne/mersenne.c::sgenrand@0x12d0
+- minspan/minspan.c::displayGraph@0x1db0
+- minspan/minspan.c::displayGraph1@0x1ee0
+- minspan/minspan.c::displayPath@0x2020
+- minspan/minspan.c::displayTree@0x20c0
+- minspan/minspan.c::main@0x1100
+- minspan/minspan.c::minSpanTree@0x1400
+- monte-carlo/monte-carlo.c::main@0x1100
+- murmur-hash/murmur-hash.c::main@0x1100
+- murmur-hash/murmur-hash.c::murmurhash@0x1290
+- n-queens/n-queens.c::main@0x1120
+- natlog/natlog.c::main@0x1100
+- nbody-sim/nbody-sim.c::main@0x1100
+- packet-filter/packet-filter.c::check_packet_filter@0x1520
+- packet-filter/packet-filter.c::generate_packet@0x13d0
+- packet-filter/packet-filter.c::main@0x1100
+- packet-filter/packet-filter.c::print_packet@0x1580
+- parrondo/parrondo.c::main@0x1100
+- pascal/pascal.c::main@0x1100
+- pi-calc/pi-calc.c::main@0x1100
+- primal-test/primal-test.c::main@0x1100
+- priority-queue/priority-queue.c::main@0x1120
+- priority-queue/priority-queue.c::newNode@0x13a0
+- priority-queue/priority-queue.c::push@0x1420
+- qsort-demo/qsort-demo.c::main@0x1120
+- qsort-demo/qsort-demo.c::print_struct_array@0x15b0
+- qsort-demo/qsort-demo.c::sort_cstrings_example@0x1480
+- qsort-demo/qsort-demo.c::sort_integers_example@0x1310
+- qsort-demo/qsort-demo.c::sort_structs_example@0x1630
+- qsort-test/qsort-test.c::main@0x1120
+- quaternions/quaternions.c::euler_from_quat@0x1550
+- quaternions/quaternions.c::main@0x1100
+- quaternions/quaternions.c::quat_from_euler@0x13e0
+- quaternions/quaternions.c::quaternion_multiply@0x1670
+- rabinkarp-search/rabinkarp-search.c::main@0x1120
+- rabinkarp-search/rabinkarp-search.c::search@0x15a0
+- rand-test/rand-test.c::bad_rand@0x1240
+- rand-test/rand-test.c::main@0x1100
+- rand-test/rand-test.c::run_tests@0x1280
+- ransac/ransac.c::main@0x1100
+- regex-parser/regex-parser.c::main@0x2100
+- regex-parser/regex-parser.c::matchcharclass@0x2420
+- regex-parser/regex-parser.c::matchone@0x25c0
+- regex-parser/regex-parser.c::matchpattern@0x26d0
+- regex-parser/regex-parser.c::re_compile@0x2ac0
+- regex-parser/regex-parser.c::re_print@0x2e30
+- rle-compress/rle-compress.c::main@0x1120
+- rle-compress/rle-compress.c::run_length_encode@0x1330
+- rsa-cipher/rsa-cipher.c::main@0x1100
+- rsa-cipher/rsa-cipher.c::mod_inverse@0x15a0
+- rsa-cipher/rsa-cipher.c::mod_pow@0x14b0
+- rsa-cipher/rsa-cipher.c::print_hex_int128@0x16c0
+- sat-solver/sat-solver.c::main@0x1100
+- sat-solver/sat-solver.c::printFormula@0x1680
+- shortest-path/shortest-path.c::floydWarshall@0x1330
+- shortest-path/shortest-path.c::main@0x1100
+- sieve/sieve.c::main@0x1100
+- simple-grep/simple-grep.c::main@0x1120
+- spelt2num/spelt2num.c::main@0x1100
+- spirograph/spirograph.c::spirograph@0x1230
+- sudoku-solver/sudoku-solver.c::main@0x1100
+- tetris-sim/tetris-sim.c::aggregate_height@0x1b20
+- tetris-sim/tetris-sim.c::best_move@0x21d0
+- tetris-sim/tetris-sim.c::count_holes@0x1b70
+- tetris-sim/tetris-sim.c::evaluate_board@0x1ca0
+- tetris-sim/tetris-sim.c::main@0x1100
+- tiny-NN/tiny-NN.c::main@0x1120
+- tiny-NN/tiny-NN.c::sampleSine@0x12d0
+- tiny-NN/tiny-NN.c::train@0x13e0
+- topo-sort/topo-sort.c::addEdge@0x13f0
+- topo-sort/topo-sort.c::createGraph@0x1380
+- topo-sort/topo-sort.c::createListNode@0x1360
+- topo-sort/topo-sort.c::createStackNode@0x1340
+- topo-sort/topo-sort.c::main@0x1120
+- topo-sort/topo-sort.c::topologicalSort@0x18b0
+- topo-sort/topo-sort.c::topologicalSortUtil@0x1440
+- totient/totient.c::main@0x1100
+- transcend/transcend.c::main@0x1120
+- uniquify/uniquify.c::main@0x1120
+- vectors-3d/vectors-3d.c::get_cross_matrix@0x1850
+- vectors-3d/vectors-3d.c::main@0x1100
+- vectors-3d/vectors-3d.c::print_vector@0x1730
+- vectors-3d/vectors-3d.c::unit_vec@0x17a0
+- vectors-3d/vectors-3d.c::vector_add@0x1650
+- vectors-3d/vectors-3d.c::vector_prod@0x16b0
+- vectors-3d/vectors-3d.c::vector_sub@0x1620
+- verlet/verlet.c::main@0x1100
+- weekday/weekday.c::dayOfWeek@0x1290
+- weekday/weekday.c::main@0x1100
+
+## Execution Failures
+- cipher/cipher.c::decipher@0x1360
+- idct-alg/idct-alg.c::idct_2d@0x12f0
+- life/life.c::init@0x12c0
+- ransac/ransac.c::ransac_line_fitting@0x1410
+- sat-solver/sat-solver.c::solveSAT@0x13a0
+- spirograph/spirograph.c::test@0x1390
+- tetris-sim/tetris-sim.c::clear_lines@0x19a0
+- vectors-3d/vectors-3d.c::get_angle@0x18c0
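
The failure lists above use one entry shape throughout: `<bench>/<file>.c::<function>@0x<address>`. A minimal sketch of tallying those entries per benchmark follows. The helper names `parse_failures` and `failures_per_benchmark` are hypothetical and not part of this PR, and the regex assumes function names contain only word characters.

```python
import re
from collections import Counter

# Entry shape taken from the report: "- <bench>/<file>.c::<function>@0x<addr>".
# A leading "+" is tolerated so the same regex works on raw diff lines.
ENTRY_RE = re.compile(
    r"^\+?\s*-\s+(?P<bench>[^/\s]+)/(?P<file>[^:\s]+)::(?P<func>\w+)"
    r"@(?P<addr>0x[0-9a-fA-F]+)\s*$"
)


def parse_failures(report_text):
    """Return {section heading: [(bench, file, func, addr), ...]}."""
    sections = {}
    current = None
    for line in report_text.splitlines():
        stripped = line.lstrip("+").strip()
        if stripped.startswith("## "):
            # A new "## ..." heading opens a new failure section.
            current = stripped[3:]
            sections[current] = []
            continue
        match = ENTRY_RE.match(line)
        if match and current is not None:
            sections[current].append(match.group("bench", "file", "func", "addr"))
    return sections


def failures_per_benchmark(entries):
    """Count failure entries by benchmark directory name."""
    return Counter(bench for bench, _file, _func, _addr in entries)


if __name__ == "__main__":
    sample = (
        "## Compilation Failures\n"
        "- aes/aes.c::main@0x1100\n"
        "- aes/aes.c::shift_rows@0x1a30\n"
        "\n"
        "## Execution Failures\n"
        "- cipher/cipher.c::decipher@0x1360\n"
    )
    for heading, entries in parse_failures(sample).items():
        print(heading, dict(failures_per_benchmark(entries)))
```

Comparing these counts with the per-benchmark function totals in the table is one way to cross-check the reported percentages.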
--- a/sk2decompile/evaluation/bringupbench/scripts/build-func-maps.py
+++ b/sk2decompile/evaluation/bringupbench/scripts/build-func-maps.py
@ -0,0 +1,493 @@
+#!/usr/bin/env python3
+"""Generate function-level mappings across source, pseudo, and assembly outputs."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import sys
+from pathlib import Path
+from typing import Dict, List, Optional
+import subprocess
+
+FUNC_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof", "do", "case", "else"}
+
+TYPEDEF_MAP = {
+    "cpu_set_t": "int",
+    "nl_item": "int",
+    "__time_t": "int",
+    "__mode_t": "unsigned short",
+    "__off64_t": "long long",
+    "__blksize_t": "long",
+    "__ino_t": "unsigned long",
+    "__blkcnt_t": "unsigned long long",
+    "__syscall_slong_t": "long",
+    "__ssize_t": "long int",
+    "wchar_t": "unsigned short int",
+    "wctype_t": "unsigned short int",
+    "__int64": "long long",
+    "__int32": "int",
+    "__int16": "short",
+    "__int8": "char",
+    "_QWORD": "uint64_t",
+    "_OWORD": "long double",
+    "_DWORD": "uint32_t",
+    "size_t": "unsigned int",
+    "_BYTE": "uint8_t",
+    "_TBYTE": "uint16_t",
+    "_BOOL8": "uint8_t",
+    "gcc_va_list": "va_list",
+    "_WORD": "unsigned short",
+    "_BOOL4": "int",
+    "__va_list_tag": "va_list",
+    "_IO_FILE": "FILE",
+    "DIR": "int",
+    "__fsword_t": "long",
+    "__kernel_ulong_t": "int",
+    "cc_t": "int",
+    "speed_t": "int",
+    "fd_set": "int",
+    "__suseconds_t": "int",
+    "_UNKNOWN": "void",
+    "__sighandler_t": "void (*)(int)",
+    "__compar_fn_t": "int (*)(const void *, const void *)",
+}
+
+
+def _load_config_env() -> dict:
+    """Load config.env from the eval project root."""
+    eval_root = Path(__file__).resolve().parents[1]
+    config_path = eval_root / "config.env"
+    config = {}
+    if config_path.exists():
+        for line in config_path.read_text().splitlines():
+            line = line.strip()
+            if not line or line.startswith("#"):
+                continue
+            if "=" in line:
+                key, _, value = line.partition("=")
+                config[key.strip()] = value.strip()
+    return config
+
+
+def _get_bench_root(cli_value: Optional[str] = None) -> Path:
+    """Resolve the benchmark repo root from CLI arg, env var, or config.env."""
+    if cli_value:
+        return Path(cli_value).resolve()
+    env_val = os.environ.get("BENCH_REPO_ROOT")
+    if env_val:
+        return Path(env_val).resolve()
+    config = _load_config_env()
+    if "BENCH_REPO_ROOT" in config:
+        return Path(config["BENCH_REPO_ROOT"]).resolve()
+    sys.exit("error: BENCH_REPO_ROOT not set. Use --bench-root, set the env var, or configure config.env")
+
+
+def _read_text(path: Path) -> str:
+    return path.read_text(encoding="utf-8")
+
+
+def _strip_empty(code: str) -> str:
+    return "\n".join(line for line in code.splitlines() if line.strip())
+
+
+def _good_func(func: str) -> bool:
+    """Accept bodies with 4 to 299 lines whose stripped length is at least 3."""
+    body = func.split("{", 1)[1] if "{" in func else func
+    total = 0
+    for line in body.splitlines():
+        if len(line.strip()) >= 3:
+            total += 1
+    return 3 < total < 300
+
+
+def _format_with_clang(func: str, style: str = "Google") -> Optional[str]:
+    if not func:
+        return None
+    cmd = ["clang-format", f"--style={style}"]
+    try:
+        proc = subprocess.run(
+            cmd,
+            input=func,
+            text=True,
+            capture_output=True,
+            check=True,
+            timeout=15,
+        )
+        return proc.stdout
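+    except Exception as exc:
+        # Report on stderr so failures do not mix with the [ok]/[skip] progress output.
+        print(f"[warn] clang-format failed: {exc}", file=sys.stderr)
+        return None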
+
+
+def _hex_to_dec(text: str) -> str:
+    pattern = re.compile(r"\b(0x[0-9a-fA-F]+)([uUlL]{1,3})?\b")
+
+    def convert(match: re.Match[str]) -> str:
+        hex_part = match.group(1)
+        suffix = match.group(2) or ""
+        return str(int(hex_part, 16)) + suffix
+
+    return pattern.sub(convert, text)
+
+
+def _remove_keywords(text: str) -> str:
+    patterns = [
+        r"\b__fastcall\b",
+        r"\b__cdecl\b",
+        r"\b__ptr32\b",
+        r"\b__noreturn\s+noreturn\b",
+    ]
+    combined = re.compile("|".join(patterns))
+    return combined.sub("", text)
+
+def _replace_typedefs(text: str) -> str:
+    for alias, original in TYPEDEF_MAP.items():
+        pattern = re.compile(rf"\b{re.escape(alias)}\b")
+        text = pattern.sub(original, text)
+    return text
+
+
+def _remove_comments(text: str) -> str:
+    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
+    text = re.sub(r"//.*?$", "", text, flags=re.MULTILINE)
+    return text
+
+
+def _process_code(code_str: str) -> str:
+    code_str = _remove_comments(code_str)
+    code_str = _hex_to_dec(code_str)
+    code_str = _remove_keywords(code_str)
+    code_str = _replace_typedefs(code_str)
+    return code_str
+
+
+def _normalize_pseudo(text: str) -> str:
+    processed = _process_code(text)
+    if not processed.strip():
+        return ""
+    formatted = _format_with_clang(processed)
+    if formatted is None:
+        return ""
+    cleaned = _strip_empty(formatted)
+    if not cleaned or not _good_func(cleaned):
+        return ""
+    return cleaned
+
+
+def _strip_comments_and_strings(text: str) -> str:
+    result = list(text)
+    i = 0
+    length = len(text)
+    while i < length:
+        nxt = text[i : i + 2]
+        ch = text[i]
+        if nxt == "//":
+            end = text.find("\n", i)
+            if end == -1:
+                end = length
+            for j in range(i, end):
+                result[j] = " "
+            i = end
+            continue
+        if nxt == "/*":
+            end = text.find("*/", i + 2)
+            if end == -1:
+                end = length - 2
+            for j in range(i, end + 2):
+                result[j] = " "
+            i = end + 2
+            continue
+        if ch in {'"', "'"}:
+            quote = ch
+            result[i] = " "
+            i += 1
+            while i < length:
+                c = text[i]
+                result[i] = " "
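+                if c == "\\":
+                    # Blank the escaped character too so it cannot leak into the sanitized text.
+                    if i + 1 < length:
+                        result[i + 1] = " "
+                    i += 2
+                    continue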
+                if c == quote:
+                    i += 1
+                    break
+                i += 1
+            continue
+        i += 1
+    return "".join(result)
+
+def _find_matching_brace(text: str, start_idx: int) -> int:
+    depth = 0
+    i = start_idx
+    length = len(text)
+    while i < length:
+        nxt = text[i : i + 2]
+        ch = text[i]
+        if nxt == "//":
+            i = text.find("\n", i)
+            if i == -1:
+                return length - 1
+            continue
+        if nxt == "/*":
+            i = text.find("*/", i + 2)
+            if i == -1:
+                return length - 1
+            i += 2
+            continue
+        if ch in {'"', "'"}:
+            quote = ch
+            i += 1
+            while i < length:
+                c = text[i]
+                if c == "\\":
+                    i += 2
+                    continue
+                if c == quote:
+                    i += 1
+                    break
+                i += 1
+            continue
+        if ch == "{":
+            depth += 1
+        elif ch == "}":
+            depth -= 1
+            if depth == 0:
+                return i
+        i += 1
+    return length - 1
+
+
+def _extract_source_functions(path: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
+    text = _read_text(path)
+    sanitized = _strip_comments_and_strings(text)
+    pattern = re.compile(
+        r"(?P<prefix>^|[;\n}])(?P<signature>[^{;}]*?)\b(?P<name>[A-Za-z_][\w]*)\s*\([^;{}]*\)\s*\{",
+        re.MULTILINE,
+    )
+    funcs: Dict[str, Dict[str, str]] = {}
+    for match in pattern.finditer(sanitized):
+        name = match.group("name")
+        if name in FUNC_KEYWORDS:
+            continue
+        brace_idx = sanitized.find("{", match.start("signature"))
+        if brace_idx == -1:
+            continue
+        end_idx = _find_matching_brace(text, brace_idx)
+        if end_idx <= brace_idx:
+            continue
+        start_idx = match.start("signature")
+        content = text[start_idx : end_idx + 1].strip("\n") + "\n"
+        funcs.setdefault(
+            name,
+            {
+                "path": str(path.relative_to(repo_root)),
+                "function_name": name,
+                "content": content,
+            },
+        )
+    return funcs
+
+def _parse_makefile(makefile: Path) -> List[Path]:
+    text = _read_text(makefile)
+    prog_match = re.search(r"^PROG\s*=\s*(\S+)", text, flags=re.MULTILINE)
+    if not prog_match:
+        raise RuntimeError(f"PROG not found in {makefile}")
+    prog = prog_match.group(1).strip()
+    objs_match = re.search(r"^LOCAL_OBJS\s*=\s*(.*)$", text, flags=re.MULTILINE)
+    obj_tokens: List[str] = []
+    if objs_match:
+        obj_tokens = [token for token in objs_match.group(1).split() if token]
+    if not obj_tokens:
+        obj_tokens = [f"{prog}.o"]
+    src_paths: List[Path] = []
+    for token in obj_tokens:
+        if not token.endswith(".o"):
+            continue
+        # Swap only the trailing ".o"; str.replace would also rewrite ".o" inside the name.
+        candidate = makefile.parent / Path(token).with_suffix(".c")
+        if candidate.exists():
+            src_paths.append(candidate)
+    if not src_paths:
+        fallback = makefile.parent / f"{prog}.c"
+        if fallback.exists():
+            src_paths.append(fallback)
+    return src_paths
+
+
+def _collect_source_functions(bench_dir: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
+    makefile = bench_dir / "Makefile"
+    srcs = _parse_makefile(makefile)
+    func_map: Dict[str, Dict[str, str]] = {}
+    for src in srcs:
+        func_map.update(_extract_source_functions(src, repo_root))
+    return func_map
+
+
+def _parse_pseudo(pseudo_path: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
+    text = _read_text(pseudo_path)
+    lines = text.splitlines()
+    pattern = re.compile(r"^/\*\s*(?P<name>[^@]+?)\s*@\s*(?P<addr>0x[0-9a-fA-F]+)\s*\*/$")
+    current: Optional[str] = None
+    current_addr: Optional[str] = None
+    buffer: List[str] = []
+    out: Dict[str, Dict[str, str]] = {}
+    for raw_line in lines:
+        line = raw_line.strip()
+        match = pattern.match(line)
+        if match:
+            if current and buffer:
+                content = "\n".join(buffer).strip("\n") + "\n"
+                out.setdefault(
+                    current,
+                    {
+                        "path": str(pseudo_path.relative_to(repo_root)),
+                        "function_name": current,
+                        "address": current_addr,
+                        "label": current,
+                        "content": content,
+                    },
+                )
+            current = match.group("name").strip()
+            current_addr = match.group("addr")
+            buffer = []
+        else:
+            if current is not None:
+                buffer.append(raw_line)
+    if current and buffer:
+        content = "\n".join(buffer).strip("\n") + "\n"
+        out.setdefault(
+            current,
+            {
+                "path": str(pseudo_path.relative_to(repo_root)),
+                "function_name": current,
+                "address": current_addr,
+                "label": current,
+                "content": content,
+            },
+        )
+    return out
+
+def _clean_instruction(raw: str) -> Optional[str]:
+    """Return the instruction text of one objdump line, or None for blank or byte-only lines."""
+    stripped = raw.strip()
+    if not stripped:
+        return None
+    parts = raw.split("\t")
+    if len(parts) >= 3:
+        relevant = parts[2:]
+    elif len(parts) == 2:
+        relevant = parts[1:]
+    else:
+        relevant = [stripped]
+    instr = "\t".join(relevant)
+    instr = instr.split("#")[0].strip()
+    if not instr:
+        return None
+    if all(c in "0123456789abcdefABCDEF" for c in instr.replace(" ", "")):
+        return None
+    return instr
+
+
+def _clean_asm_block(name: str, lines: List[str]) -> str:
+    cleaned = [f"<{name}>:"]
+    for raw in lines[1:]:
+        instr = _clean_instruction(raw)
+        if instr:
+            cleaned.append(instr)
+    return "\n".join(cleaned) + "\n"
+
+
+def _parse_assembly(asm_path: Path) -> Dict[str, str]:
+    lines = _read_text(asm_path).splitlines()
+    header = re.compile(r"^\s*([0-9a-fA-F]+)\s+<([^>]+)>:\s*$")
+    current: Optional[str] = None
+    buffer: List[str] = []
+    result: Dict[str, str] = {}
+    for line in lines:
+        match = header.match(line)
+        if match:
+            if current and buffer:
+                result.setdefault(current, _clean_asm_block(current, buffer))
+            current = match.group(2)
+            buffer = [line]
+        else:
+            if current is not None:
+                buffer.append(line)
+    if current and buffer:
+        result.setdefault(current, _clean_asm_block(current, buffer))
+    return result
+
+
+def _discover_binaries(explicit: Optional[List[str]], repo_root: Path) -> List[Path]:
+    if explicit:
+        binaries: List[Path] = []
+        for entry in explicit:
+            candidate = Path(entry)
+            if not candidate.is_absolute():
+                candidate = repo_root / candidate
+            if candidate.exists():
+                binaries.append(candidate)
+        return binaries
+    matches = []
+    for path in repo_root.rglob("*.O*"):
+        suffix = path.suffix.lower()
+        if suffix in {".o0", ".o1", ".o2", ".o3"}:
+            matches.append(path)
+    return sorted(matches)
+
+def _build_map(binary: Path, repo_root: Path) -> None:
+    pseudo_path = Path(str(binary) + ".pseudo")
+    asm_path = Path(str(binary) + ".s")
+    if not pseudo_path.exists() or not asm_path.exists():
+        print(f"[skip] Missing pseudo or assembly for {binary.relative_to(repo_root)}")
+        return
+    bench_dir = binary.parent
+    source_funcs = _collect_source_functions(bench_dir, repo_root)
+    pseudo_funcs = _parse_pseudo(pseudo_path, repo_root)
+    asm_funcs = _parse_assembly(asm_path)
+    common = sorted(set(source_funcs) & set(pseudo_funcs) & set(asm_funcs))
+    if not common:
+        print(f"[warn] No overlapping functions for {binary.relative_to(repo_root)}")
+        return
+    output_path = Path(str(binary) + ".func_map.jsonl")
+    rel_binary = str(binary.relative_to(repo_root))
+    with output_path.open("w", encoding="utf-8") as handle:
+        for name in common:
+            pseudo_entry = pseudo_funcs[name]
+            pseudo_norm = _normalize_pseudo(pseudo_entry.get("content", ""))
+            record = {
+                "source": source_funcs[name],
+                "pseudo": pseudo_entry,
+                "pseudo_normalize": pseudo_norm,
+                "binary": rel_binary,
+                "assembly": asm_funcs[name],
+            }
+            handle.write(json.dumps(record, ensure_ascii=False))
+            handle.write("\n")
+    print(f"[ok] {output_path.relative_to(repo_root)} -> {len(common)} functions")
+
+
+def main(argv: List[str]) -> int:
+    parser = argparse.ArgumentParser(description="Map source/pseudo/assembly per function")
+    parser.add_argument(
+        "--binary",
+        action="append",
+        help="Specific binary path (relative to repo) to process; can be repeated.",
+    )
+    parser.add_argument(
+        "--bench-root",
+        default=None,
+        help="Path to the Bringup-Bench repository root (default: from config.env).",
+    )
+    args = parser.parse_args(argv)
+    repo_root = _get_bench_root(args.bench_root)
+    binaries = _discover_binaries(args.binary, repo_root)
+    if not binaries:
+        print("No binaries found", file=sys.stderr)
+        return 1
+    for binary in binaries:
+        _build_map(binary, repo_root)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main(sys.argv[1:]))
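
Each line of a `*.func_map.jsonl` file written by `_build_map` is one JSON object with the keys `source`, `pseudo`, `pseudo_normalize`, `binary`, and `assembly`. A minimal sketch of a downstream reader follows. The names `load_func_map` and `usable_records` are hypothetical and not part of this PR. The filter relies on `_normalize_pseudo` returning an empty string when formatting fails or the function is rejected by `_good_func`.

```python
import json
from pathlib import Path

# Top-level keys written by _build_map in build-func-maps.py.
EXPECTED_KEYS = {"source", "pseudo", "pseudo_normalize", "binary", "assembly"}


def load_func_map(path):
    """Read a .func_map.jsonl file and check that every record has the expected keys."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            missing = EXPECTED_KEYS - record.keys()
            if missing:
                raise ValueError(f"{path}:{line_no}: missing keys {sorted(missing)}")
            records.append(record)
    return records


def usable_records(records):
    """Keep records whose normalized pseudocode is non-empty."""
    return [record for record in records if record["pseudo_normalize"]]
```

Calling `usable_records(load_func_map(path))` on one of the generated maps yields only the functions that survived normalization, which is likely the subset a model would be prompted with.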
--- a/sk2decompile/evaluation/bringupbench/scripts/build-host-opt-levels.sh
+++ b/sk2decompile/evaluation/bringupbench/scripts/build-host-opt-levels.sh
@ -0,0 +1,24 @@
+#!/bin/bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
+
+# Load config; allow environment overrides
+if [[ -f "${EVAL_ROOT}/config.env" ]]; then
+  set -a
+  source "${EVAL_ROOT}/config.env"
+  set +a
+fi
+
+BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
+
+cd "${BENCH_REPO_ROOT}"
+
+for opt in 0 1 2 3; do
+  echo "==> Building host binaries with -O${opt}"
+  make TARGET=host OPT_CFLAGS="-O${opt} -g" run-tests
+  find . -maxdepth 2 -type f -name '*.host' -execdir mv {} {}.O${opt} \;
+done
+
+echo "All host optimization builds complete."
--- a/sk2decompile/evaluation/bringupbench/scripts/clean-all-benchmarks.sh
+++ b/sk2decompile/evaluation/bringupbench/scripts/clean-all-benchmarks.sh
@ -0,0 +1,21 @@
+#!/bin/bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
+
+# Load config; allow environment overrides
+if [[ -f "${EVAL_ROOT}/config.env" ]]; then
+  set -a
+  source "${EVAL_ROOT}/config.env"
+  set +a
+fi
+
+BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
+
+cd "${BENCH_REPO_ROOT}"
+
+echo "==> Running make all-clean"
+make all-clean
+
+echo "All benchmarks cleaned."
--- a/sk2decompile/evaluation/bringupbench/scripts/decompile-all-pseudo.sh
+++ b/sk2decompile/evaluation/bringupbench/scripts/decompile-all-pseudo.sh
@ -0,0 +1,50 @@
+#!/bin/bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
+
+# Load config; allow environment overrides
+if [[ -f "${EVAL_ROOT}/config.env" ]]; then
+  set -a
+  source "${EVAL_ROOT}/config.env"
+  set +a
+fi
+
+BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
+
+# The fallback below is a machine-specific path; set IDA_BIN in config.env or the environment.
+IDA_BIN="${IDA_BIN:-/home/bairidreamer/software/IDA-Pro/idat}"
+DUMP_SCRIPT="${EVAL_ROOT}/scripts/dump_pseudo.py"
+
+if [[ ! -x "${IDA_BIN}" ]]; then
+  echo "error: IDA binary not found or not executable at ${IDA_BIN}" >&2
+  exit 1
+fi
+
+if [[ ! -f "${DUMP_SCRIPT}" ]]; then
+  echo "error: dump script not found at ${DUMP_SCRIPT}" >&2
+  exit 1
+fi
+
+readarray -t BINARIES < <(
+  find "${BENCH_REPO_ROOT}" -mindepth 2 -maxdepth 2 -type f \
+    \( -iname '*.o0' -o -iname '*.o1' -o -iname '*.o2' -o -iname '*.o3' \) \
+    ! -path "${BENCH_REPO_ROOT}/scripts/*" \
+    ! -path "${BENCH_REPO_ROOT}/target/*" \
+    ! -path "${BENCH_REPO_ROOT}/common/*" \
+    ! -path "${BENCH_REPO_ROOT}/.git/*" \
+    | sort
+)
+
+if [[ ${#BINARIES[@]} -eq 0 ]]; then
+  echo "error: no O0/O1/O2/O3 binaries found under ${BENCH_REPO_ROOT}" >&2
+  exit 1
+fi
+
+for binary_path in "${BINARIES[@]}"; do
+  output_path="${binary_path}.pseudo"
+  echo "==> Decompiling ${binary_path#"${BENCH_REPO_ROOT}/"} -> ${output_path#"${BENCH_REPO_ROOT}/"}"
+  "${IDA_BIN}" -A "-S${DUMP_SCRIPT} ${output_path}" "${binary_path}"
+done
+
+echo "All pseudocode dumps are located alongside their binaries."
--- a/sk2decompile/evaluation/bringupbench/scripts/disasm-all-objdump.sh
+++ b/sk2decompile/evaluation/bringupbench/scripts/disasm-all-objdump.sh
@ -0,0 +1,66 @@
+#!/bin/bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
+
+# Load config; allow environment overrides
+if [[ -f "${EVAL_ROOT}/config.env" ]]; then
+  set -a
+  source "${EVAL_ROOT}/config.env"
+  set +a
+fi
+
+BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
+
+OBJDUMP_BIN="${OBJDUMP:-objdump}"
+NUM_JOBS="${JOBS:-}"
+
+if ! command -v "${OBJDUMP_BIN}" >/dev/null 2>&1; then
+  echo "error: objdump binary '${OBJDUMP_BIN}' not found" >&2
+  exit 1
+fi
+
+if [[ -z "${NUM_JOBS}" ]]; then
+  if command -v nproc >/dev/null 2>&1; then
+    NUM_JOBS="$(nproc)"
+  elif [[ "$(uname)" == "Darwin" ]]; then
+    NUM_JOBS="$(sysctl -n hw.ncpu)"
+  else
+    NUM_JOBS=4
+  fi
+fi
+
+if ! [[ "${NUM_JOBS}" =~ ^[0-9]+$ ]] || (( NUM_JOBS <= 0 )); then
+  echo "error: invalid JOBS value '${NUM_JOBS}'" >&2
+  exit 1
+fi
+
+readarray -t BINARIES < <(
+  find "${BENCH_REPO_ROOT}" -mindepth 2 -maxdepth 2 -type f \
+    \( -iname '*.o0' -o -iname '*.o1' -o -iname '*.o2' -o -iname '*.o3' \) \
+    ! -path "${BENCH_REPO_ROOT}/scripts/*" \
+    ! -path "${BENCH_REPO_ROOT}/target/*" \
+    ! -path "${BENCH_REPO_ROOT}/common/*" \
+    ! -path "${BENCH_REPO_ROOT}/.git/*" \
+    | sort
+)
+
+if [[ ${#BINARIES[@]} -eq 0 ]]; then
+  echo "error: no O0/O1/O2/O3 binaries found under ${BENCH_REPO_ROOT}" >&2
+  exit 1
+fi
+
+export OBJDUMP_BIN BENCH_REPO_ROOT
+
+printf '%s\0' "${BINARIES[@]}" | xargs -0 -n1 -P "${NUM_JOBS}" bash -c '
+  binary_path="$1"
+  bench_repo_root="${BENCH_REPO_ROOT}"
+  output_path="${binary_path}.s"
+  rel_in="${binary_path#"${bench_repo_root}/"}"
+  rel_out="${output_path#"${bench_repo_root}/"}"
+  echo "==> Disassembling ${rel_in} -> ${rel_out}"
+  "${OBJDUMP_BIN}" -d "${binary_path}" > "${output_path}"
+' _
+
+echo "Assembly listings written alongside each binary (extension .s)."
--- a/sk2decompile/evaluation/bringupbench/scripts/dump_pseudo.py
+++ b/sk2decompile/evaluation/bringupbench/scripts/dump_pseudo.py
@ -0,0 +1,62 @@
+"""
+Headless IDA/Hex-Rays helper to dump pseudocode for every discovered function.
+Usage (from shell):
+    idat -A -S"scripts/dump_pseudo.py /path/to/output" /path/to/binary
+"""
+
+from __future__ import annotations
+
+import os
+import sys
+
+import ida_auto
+import ida_funcs
+import ida_hexrays
+import ida_pro
+import idautils
+import idc
+
+
+def _get_output_path() -> str:
+    # IDA populates idc.ARGV with the script path at index 0 and the
+    # user-provided arguments afterwards.
+    if len(idc.ARGV) < 2:
+        raise RuntimeError("output path argument missing")
+    return os.path.abspath(idc.ARGV[1])
+
+
+def main() -> None:
+    try:
+        output_path = _get_output_path()
+    except Exception as exc:  # pragma: no cover - defensive
+        print(f"[dump_pseudo] {exc}", file=sys.stderr)
+        ida_pro.qexit(1)
+        return
+
+    ida_auto.auto_wait()
+
+    if not ida_hexrays.init_hexrays_plugin():
+        print("[dump_pseudo] Hex-Rays decompiler is unavailable", file=sys.stderr)
+        ida_pro.qexit(1)
+        return
+
+    os.makedirs(os.path.dirname(output_path), exist_ok=True)
+
+    with open(output_path, "w", encoding="utf-8") as handle:
+        for ea in idautils.Functions():
+            name = ida_funcs.get_func_name(ea)
+            handle.write(f"/* {name} @ 0x{ea:x} */\n")
+            try:
+                cfunc = ida_hexrays.decompile(ea)
+            except ida_hexrays.DecompilationFailure as exc:
+                handle.write(f"// decompilation failed: {exc}\n\n")
+                continue
+
+            handle.write(str(cfunc))
+            handle.write("\n\n")
+
+    ida_pro.qexit(0)
+
+
+if __name__ == "__main__":
+    main()
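Review note: `dump_pseudo.py` writes one `/* name @ 0xADDR */` header per function, followed by either the Hex-Rays pseudocode or a `// decompilation failed: ...` line. A minimal sketch of how a downstream step might split that file back into per-function records is shown below. `parse_pseudo_dump` and the sample text are hypothetical and not part of this PR, and the sketch assumes no pseudocode line happens to match the header pattern.

```python
import re
from typing import Dict, List

# Matches the per-function header written by dump_pseudo.py,
# e.g. "/* main @ 0x401000 */".
HEADER_RE = re.compile(r"^/\* (?P<name>.+) @ 0x(?P<addr>[0-9a-fA-F]+) \*/$")


def parse_pseudo_dump(text: str) -> List[Dict[str, object]]:
    """Split a dump into records holding name, address, and body."""
    records: List[Dict[str, object]] = []
    current = None
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            # A new header starts a new record.
            current = {
                "name": match.group("name"),
                "address": int(match.group("addr"), 16),
                "lines": [],
            }
            records.append(current)
        elif current is not None:
            current["lines"].append(line)
    for record in records:
        # Join the collected lines and drop surrounding blank lines.
        record["body"] = "\n".join(record.pop("lines")).strip()
    return records


# Hypothetical sample in the format produced by dump_pseudo.py.
sample = (
    "/* main @ 0x401000 */\n"
    "int main(void)\n{\n  return 0;\n}\n\n"
    "/* helper @ 0x401020 */\n"
    "// decompilation failed: example\n\n"
)
for rec in parse_pseudo_dump(sample):
    print(rec["name"], hex(rec["address"]), len(rec["body"]))
```

Keeping the address as an integer makes it straightforward to join these records against a function map keyed by address, if the pipeline uses one.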
--- a/sk2decompile/evaluation/bringupbench/scripts/eval_infer_out.py
+++ b/sk2decompile/evaluation/bringupbench/scripts/eval_infer_out.py
@ -0,0 +1,682 @@
+#!/usr/bin/env python3
+"""
+Evaluate infer-out-model2 functions by patching benchmark sources inside an
+isolated workspace, rebuilding, executing, and collecting structured logs for
+every case listed in a JSONL file.
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import shutil
+import subprocess
+import sys
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from dataclasses import asdict, dataclass, field
+from datetime import datetime
+from pathlib import Path
+from typing import Dict, Iterable, List, Optional, Tuple
+
+
+def _load_config_env() -> dict:
+    """Load config.env from the eval project root."""
+    eval_root = Path(__file__).resolve().parents[1]
+    config_path = eval_root / "config.env"
+    config = {}
+    if config_path.exists():
+        for line in config_path.read_text().splitlines():
+            line = line.strip()
+            if not line or line.startswith("#"):
+                continue
+            if "=" in line:
+                key, _, value = line.partition("=")
+                config[key.strip()] = value.strip()
+    return config
+
+
+def _get_bench_root(cli_value: str | None = None) -> Path:
+    """Resolve the benchmark repo root from CLI arg, env var, or config.env."""
+    if cli_value:
+        return Path(cli_value).resolve()
+    env_val = os.environ.get("BENCH_REPO_ROOT")
+    if env_val:
+        return Path(env_val).resolve()
+    config = _load_config_env()
+    if "BENCH_REPO_ROOT" in config:
+        return Path(config["BENCH_REPO_ROOT"]).resolve()
+    sys.exit("error: BENCH_REPO_ROOT not set. Use --bench-root, set the env var, or configure config.env")
+
+
+@dataclass
+class CaseResult:
+    """Container for the outcome of processing a single case."""
+
+    case_id: str
+    source_path: str
+    benchmark_dir: str
+    output_dir: str
+    workspace_dir: str = ""
+    artifact_dir: str = ""
+    replacement_applied: bool = False
+    build_status: str = "skipped"  # succeeded | failed | skipped
+    test_status: str = "skipped"
+    notes: List[str] = field(default_factory=list)
+    errors: List[str] = field(default_factory=list)
+    log_files: Dict[str, str] = field(default_factory=dict)
+
+
+def parse_args() -> argparse.Namespace:
+    parser = argparse.ArgumentParser(
+        description="Replace functions with infer-out-model2 bodies, build, "
+        "execute, and record results without modifying the original benchmarks."
+    )
+    parser.add_argument(
+        "jsonl",
+        help="Path to the merged.*.jsonl file containing cases to evaluate.",
+    )
+    parser.add_argument(
+        "--bench-root",
+        default=None,
+        help="Path to the Bringup-Bench repository root (default: from config.env).",
+    )
+    parser.add_argument(
+        "--limit",
+        type=int,
+        default=None,
+        help="Optional limit on the number of cases to process.",
+    )
+    parser.add_argument(
+        "--target",
+        default="host",
+        help="Benchmark build target passed as TARGET=<target> (default: host).",
+    )
+    parser.add_argument(
+        "--report-dir",
+        default="reports/infer_out_eval",
+        help="Directory (relative to eval root) where aggregated reports are written.",
+    )
+    parser.add_argument(
+        "--workspace-root",
+        default="reports/infer_out_eval/workspaces",
+        help="Directory (relative to eval root) to host temporary build workspaces.",
+    )
+    parser.add_argument(
+        "--skip-clean",
+        action="store_true",
+        help="Skip running 'make clean' inside the workspace (useful when iterating).",
+    )
+    parser.add_argument(
+        "--keep-workspaces",
+        action="store_true",
+        help="Keep temporary workspaces after each case finishes (default removes them).",
+    )
+    parser.add_argument(
+        "--command-timeout",
+        type=int,
+        default=20,
+        help="Timeout (in seconds) for each make invocation; 0 disables the timeout.",
+    )
+    parser.add_argument(
+        "--jobs",
+        type=int,
+        default=96,
+        help="Number of cases to process in parallel (default: 96).",
+    )
+    return parser.parse_args()
+
+
+def canonicalize(text: str) -> str:
+    """Normalize newlines for reliable substring matching."""
+    return text.replace("\r\n", "\n")
+
+
+def replace_function_body(
+    full_source: str, reference_function: str, inferred_function: str
+) -> Tuple[str, bool]:
+    """
+    Replace the exact reference_function text with inferred_function.
+
+    Returns the updated source and a boolean indicating if replacement happened.
+    """
+    source_norm = canonicalize(full_source)
+    reference_norm = canonicalize(reference_function)
+    inferred_norm = canonicalize(inferred_function).rstrip() + "\n"
+
+    candidates = (
+        reference_norm,
+        reference_norm.rstrip() + "\n",
+        reference_norm.strip(),
+    )
+
+    for snippet in candidates:
+        start_idx = source_norm.find(snippet)
+        if start_idx == -1:
+            continue
+        end_idx = start_idx + len(snippet)
+        updated = source_norm[:start_idx] + inferred_norm + source_norm[end_idx:]
+        return updated, True
+    return full_source, False
+
+
+def compose_case_id(case: Dict) -> str:
+    """Build a stable identifier for a case."""
+    return (
+        f"{case['source']['path']}::{case['source']['function_name']}"
+        f"@{case['pseudo']['address']}"
+    )
+
+
+def ensure_case_output_dir(
+    output_root: Path, pseudo_path_str: str, pseudo_address: str, result: CaseResult
+) -> Path:
+    """Create the per-case output directory, handling file path collisions."""
+    pseudo_rel = Path(pseudo_path_str)
+    base_dir = output_root / pseudo_rel
+
+    if base_dir.exists() and base_dir.is_file():
+        fallback = base_dir.parent / f"{base_dir.name}.infer_eval"
+        fallback.mkdir(parents=True, exist_ok=True)
+        result.notes.append(
+            f"pseudo.path '{pseudo_path_str}' is a file; using '{fallback.relative_to(output_root)}' for logs."
+        )
+        base_dir = fallback
+    else:
+        base_dir.mkdir(parents=True, exist_ok=True)
+
+    case_dir = base_dir / pseudo_address
+    if case_dir.exists():
+        shutil.rmtree(case_dir)
+    case_dir.mkdir(parents=True, exist_ok=True)
+    return case_dir
+
+
+def run_command(
+    command: List[str],
+    cwd: Path,
+    log_handle,
+    step_name: str,
+    timeout: Optional[int],
+) -> Optional[int]:
+    """Run a command, capture stdout/stderr, and write everything to log_handle."""
+    log_handle.write(f"\n[{step_name}] $ {' '.join(command)}\n")
+    log_handle.flush()
+    try:
+        completed = subprocess.run(
+            command,
+            cwd=str(cwd),
+            stdout=subprocess.PIPE,
+            stderr=subprocess.STDOUT,
+            text=True,
+            encoding="utf-8",
+            errors="replace",
+            timeout=timeout if timeout and timeout > 0 else None,
+        )
+        log_handle.write(completed.stdout)
+        log_handle.write(f"[{step_name}] exit code: {completed.returncode}\n")
+        log_handle.flush()
+        return completed.returncode
+    except subprocess.TimeoutExpired as exc:
+        output = exc.output or exc.stdout
+        if output:
+            if isinstance(output, bytes):
+                log_handle.write(output.decode("utf-8", "replace"))
+            else:
+                log_handle.write(output)
+        log_handle.write(
+            f"[{step_name}] timed out after {timeout} seconds; terminating process.\n"
+        )
+        log_handle.flush()
+        return None
+
+
+def write_case_artifacts(
+    case_dir: Path,
+    case: Dict,
+    modified_source: str,
+    original_source: str,
+) -> None:
+    """Persist reusable artifacts for a case."""
+    (case_dir / "case.json").write_text(json.dumps(case, indent=2), encoding="utf-8")
+    (case_dir / "modified_source.c").write_text(modified_source, encoding="utf-8")
+    (case_dir / "original_source.c").write_text(original_source, encoding="utf-8")
+    (case_dir / "original_function.c").write_text(
+        canonicalize(case["source"]["content"]), encoding="utf-8"
+    )
+    (case_dir / "infer_function.c").write_text(
+        canonicalize(case["pseudo"]["content-fix"]), encoding="utf-8"
+    )
+
+
+def sanitize_case_id(case_id: str) -> str:
+    """Generate filesystem-safe case identifier."""
+    sanitized = re.sub(r"[^A-Za-z0-9._-]+", "_", case_id)
+    return sanitized.strip("_") or "case"
+
+
+def copy_ignore_eval_dirs(_src: str, names: List[str]) -> List[str]:
+    """Ignore helper to skip evaluation artifacts when copying benchmark dirs."""
+    ignored: List[str] = []
+    for name in names:
+        if name.endswith(".infer_eval"):
+            ignored.append(name)
+    return ignored
+
+
+def prepare_workspace(
+    repo_root: Path,
+    benchmark_dir: Path,
+    workspace_root: Path,
+    case_id: str,
+) -> Tuple[Path, Path]:
+    """Clone the necessary subset of the repo into a temporary workspace."""
+    workspace_case_root = workspace_root / sanitize_case_id(case_id)
+    if workspace_case_root.exists():
+        shutil.rmtree(workspace_case_root)
+    workspace_repo_root = workspace_case_root / "repo"
+    workspace_repo_root.mkdir(parents=True, exist_ok=True)
+
+    shutil.copy2(repo_root / "Makefile", workspace_repo_root / "Makefile")
+    shutil.copytree(repo_root / "common", workspace_repo_root / "common", dirs_exist_ok=True)
+    shutil.copytree(repo_root / "target", workspace_repo_root / "target", dirs_exist_ok=True)
+    shutil.copytree(
+        benchmark_dir,
+        workspace_repo_root / benchmark_dir.name,
+        dirs_exist_ok=True,
+        ignore=copy_ignore_eval_dirs,
+    )
+    return workspace_case_root, workspace_repo_root
+
+
+def relative_to_repo(path: Path, repo_root: Path) -> str:
+    """Return a path relative to repo_root when possible."""
+    try:
+        return str(path.relative_to(repo_root))
+    except ValueError:
+        return str(path)
+
+
+def init_case_result(case: Dict, repo_root: Path) -> CaseResult:
+    """Create a CaseResult with basic metadata for the given case."""
+    source_rel = Path(case["source"]["path"])
+    benchmark_dir_path = (repo_root / source_rel).parent
+    try:
+        benchmark_rel = str(benchmark_dir_path.relative_to(repo_root))
+    except ValueError:
+        benchmark_rel = str(benchmark_dir_path)
+    return CaseResult(
+        case_id=compose_case_id(case),
+        source_path=str(source_rel),
+        benchmark_dir=benchmark_rel,
+        output_dir="",
+    )
+
+
+def snapshot_artifacts(
+    case_dir: Path,
+    workspace_benchmark_dir: Path,
+    eval_root: Path,
+    result: CaseResult,
+) -> None:
+    """Copy the workspace benchmark directory into the case directory."""
+    artifacts_dir = case_dir / "artifacts"
+    if artifacts_dir.exists():
+        shutil.rmtree(artifacts_dir)
+    try:
+        shutil.copytree(workspace_benchmark_dir, artifacts_dir)
+        result.artifact_dir = relative_to_repo(artifacts_dir, eval_root)
+    except Exception as exc:  # pragma: no cover - defensive
+        result.notes.append(f"Failed to copy artifacts: {exc}")
+
+
+def process_case(
+    case: Dict,
+    args: argparse.Namespace,
+    repo_root: Path,
+    eval_root: Path,
+) -> CaseResult:
+    """Process a single JSONL entry."""
+    case_id = compose_case_id(case)
+    source_rel = Path(case["source"]["path"])
+    source_path = repo_root / source_rel
+    benchmark_dir = source_path.parent
+
+    result = init_case_result(case, repo_root)
+
+    if not source_path.exists():
+        result.errors.append(f"Source file '{source_rel}' does not exist.")
+        return result
+
+    try:
+        case_dir = ensure_case_output_dir(
+            eval_root, case["pseudo"]["path"], case["pseudo"]["address"], result
+        )
+    except Exception as exc:  # pragma: no cover - defensive
+        result.errors.append(f"Failed to prepare case directory: {exc}")
+        return result
+
+    result.output_dir = str(case_dir.relative_to(eval_root))
+
+    full_source_text = source_path.read_text(encoding="utf-8")
+    updated_source, replaced = replace_function_body(
+        full_source_text,
+        case["source"]["content"],
+        case["pseudo"]["content-fix"],
+    )
+
+    if not replaced:
+        result.errors.append(
+            "Could not locate the original function snippet in source file."
+        )
+        return result
+
+    result.replacement_applied = True
+    write_case_artifacts(case_dir, case, updated_source, full_source_text)
+
+    workspace_root = Path(args.workspace_root)
+    if not workspace_root.is_absolute():
+        workspace_root = eval_root / workspace_root
+    workspace_root.mkdir(parents=True, exist_ok=True)
+
+    workspace_case_root: Optional[Path] = None
+    try:
+        workspace_case_root, workspace_repo_root = prepare_workspace(
+            repo_root, benchmark_dir, workspace_root, case_id
+        )
+        workspace_benchmark_dir = workspace_repo_root / benchmark_dir.name
+        artifacts_captured = False
+
+        def capture_artifacts() -> None:
+            nonlocal artifacts_captured
+            if artifacts_captured:
+                return
+            snapshot_artifacts(case_dir, workspace_benchmark_dir, eval_root, result)
+            artifacts_captured = True
+
+        workspace_source_path = workspace_repo_root / source_rel
+        workspace_source_path.write_text(updated_source, encoding="utf-8")
+
+        result.workspace_dir = relative_to_repo(workspace_case_root, eval_root)
+
+        log_path = case_dir / "case.log"
+        with log_path.open("w", encoding="utf-8") as log_handle:
+            log_handle.write(f"Case: {case_id}\n")
+            log_handle.write(f"Workspace: {workspace_case_root}\n")
+            log_handle.write(f"Benchmark copy: {workspace_benchmark_dir}\n")
+            log_handle.write(f"Target: {args.target}\n")
+            log_handle.flush()
+
+            if not args.skip_clean:
+                clean_rc = run_command(
+                    ["make", f"TARGET={args.target}", "clean"],
+                    workspace_benchmark_dir,
+                    log_handle,
+                    "clean",
+                    args.command_timeout,
+                )
+                if clean_rc is None:
+                    result.errors.append(
+                        f"'make clean' timed out after {args.command_timeout} seconds."
+                    )
+                    capture_artifacts()
+                    result.log_files["case"] = relative_to_repo(log_path, eval_root)
+                    return result
+                if clean_rc != 0:
+                    result.build_status = "failed"
+                    result.errors.append("make clean failed.")
+                    capture_artifacts()
+                    result.log_files["case"] = relative_to_repo(log_path, eval_root)
+                    return result
+            else:
+                log_handle.write("Skipping 'make clean' per --skip-clean flag.\n")
+
+            build_rc = run_command(
+                ["make", f"TARGET={args.target}", "build"],
+                workspace_benchmark_dir,
+                log_handle,
+                "build",
+                args.command_timeout,
+            )
+
+            result.log_files["case"] = relative_to_repo(log_path, eval_root)
+            if build_rc is None:
+                result.build_status = "failed"
+                result.errors.append(
+                    f"'make build' timed out after {args.command_timeout} seconds."
+                )
+                capture_artifacts()
+                log_handle.write("Skipping test because build timed out.\n")
+                return result
+            if build_rc == 0:
+                result.build_status = "succeeded"
+            else:
+                result.build_status = "failed"
+                result.errors.append("make build failed.")
+                log_handle.write("Skipping test because build failed.\n")
+                capture_artifacts()
+                return result
+
+            test_rc = run_command(
+                ["make", f"TARGET={args.target}", "test"],
+                workspace_benchmark_dir,
+                log_handle,
+                "test",
+                args.command_timeout,
+            )
+
+            if test_rc is None:
+                result.test_status = "failed"
+                result.errors.append(
+                    f"'make test' timed out after {args.command_timeout} seconds."
+                )
+            elif test_rc == 0:
+                result.test_status = "succeeded"
+            else:
+                result.test_status = "failed"
+                result.errors.append("make test failed.")
+
+            capture_artifacts()
+
+    finally:
+        if (
+            workspace_case_root
+            and workspace_case_root.exists()
+            and not args.keep_workspaces
+        ):
+            shutil.rmtree(workspace_case_root, ignore_errors=True)
+
+    return result
+
+
+def collect_cases(jsonl_path: Path, limit: Optional[int]) -> Iterable[Dict]:
+    """Yield cases from jsonl file respecting the optional limit."""
+    processed = 0
+    with jsonl_path.open("r", encoding="utf-8") as handle:
+        for line in handle:
+            stripped = line.strip()
+            if not stripped:
+                continue
+            yield json.loads(stripped)
+            processed += 1
+            if limit is not None and processed >= limit:
+                break
+
+
+def compute_summary(results: List[CaseResult]) -> Dict:
+    """Aggregate statistics over all case results."""
+    total = len(results)
+    replacements = sum(1 for r in results if r.replacement_applied)
+    build_success = sum(1 for r in results if r.build_status == "succeeded")
+    test_success = sum(1 for r in results if r.test_status == "succeeded")
+
+    def frac(passed: int, denom: int) -> float:
+        return round(passed / denom, 4) if denom else 0.0
+
+    per_benchmark: Dict[str, Dict[str, float]] = {}
+    for r in results:
+        stats = per_benchmark.setdefault(
+            r.benchmark_dir,
+            {
+                "cases": 0,
+                "replacements": 0,
+                "build_success": 0,
+                "test_success": 0,
+            },
+        )
+        stats["cases"] += 1
+        if r.replacement_applied:
+            stats["replacements"] += 1
+        if r.build_status == "succeeded":
+            stats["build_success"] += 1
+        if r.test_status == "succeeded":
+            stats["test_success"] += 1
+
+    for stats in per_benchmark.values():
+        stats["replacement_rate"] = frac(stats["replacements"], stats["cases"])
+        stats["build_rate"] = frac(stats["build_success"], stats["cases"])
+        stats["test_rate"] = frac(stats["test_success"], stats["cases"])
+
+    summary = {
+        "total_cases": total,
+        "replacement_success_count": replacements,
+        "replacement_success_rate": frac(replacements, total),
+        "compilable_count": build_success,
+        "compilable_rate": frac(build_success, total),
+        "executable_count": test_success,
+        "executable_rate": frac(test_success, total),
+        "compilation_failures": [
+            r.case_id for r in results if r.build_status == "failed"
+        ],
+        "execution_failures": [
+            r.case_id
+            for r in results
+            if r.build_status == "succeeded" and r.test_status == "failed"
+        ],
+        "cases": [asdict(r) for r in results],
+        "by_benchmark": per_benchmark,
+    }
+    return summary
+
+
+def write_summary(
+    eval_root: Path,
+    args: argparse.Namespace,
+    jsonl_path: Path,
+    summary: Dict,
+) -> Tuple[Path, Path]:
+    """Write JSON and Markdown summary reports."""
+    report_root = eval_root / args.report_dir
+    report_root.mkdir(parents=True, exist_ok=True)
+
+    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
+    base_name = f"{jsonl_path.stem}-{args.target}"
+    json_report = report_root / f"{base_name}-{timestamp}.json"
+    markdown_report = report_root / f"{base_name}-{timestamp}.md"
+
+    json_report.write_text(json.dumps(summary, indent=2), encoding="utf-8")
+
+    benchmark_lines = [
+        "| Benchmark | Cases | Replacement% | Build% | Exec% |",
+        "| --- | --- | --- | --- | --- |",
+    ]
+    for bench, stats in sorted(summary["by_benchmark"].items()):
+        benchmark_lines.append(
+            f"| {bench} | {stats['cases']} | "
+            f"{stats['replacement_rate']*100:.2f}% | "
+            f"{stats['build_rate']*100:.2f}% | "
+            f"{stats['test_rate']*100:.2f}% |"
+        )
+    if len(benchmark_lines) == 2:
+        benchmark_lines.append("| (none) | 0 | 0.00% | 0.00% | 0.00% |")
+
+    compilation_items = summary["compilation_failures"] or ["None"]
+    execution_items = summary["execution_failures"] or ["None"]
+
+    relative_jsonl = relative_to_repo(jsonl_path, eval_root)
+
+    lines = [
+        f"# Infer-Out Model 2 Evaluation ({base_name})",
+        "",
+        f"- Timestamp: {timestamp}",
+        f"- Source JSONL: {relative_jsonl}",
+        f"- Target: {args.target}",
+        f"- Total cases: {summary['total_cases']}",
+        f"- Replacement success: {summary['replacement_success_count']} "
+        f"({summary['replacement_success_rate']*100:.2f}%)",
+        f"- Compilable: {summary['compilable_count']} "
+        f"({summary['compilable_rate']*100:.2f}%)",
+        f"- Executable: {summary['executable_count']} "
+        f"({summary['executable_rate']*100:.2f}%)",
+        "",
+        "## Benchmark Breakdown",
+        *benchmark_lines,
+        "",
+        "## Compilation Failures",
+    ]
+    lines.extend(f"- {cid}" for cid in compilation_items)
+    lines.append("")
+    lines.append("## Execution Failures")
+    lines.extend(f"- {cid}" for cid in execution_items)
+
+    markdown_report.write_text("\n".join(lines), encoding="utf-8")
+    return json_report, markdown_report
+
+
+def main() -> int:
+    args = parse_args()
+    eval_root = Path(__file__).resolve().parents[1]
+    repo_root = _get_bench_root(args.bench_root)
+    jsonl_path = Path(args.jsonl)
+    if not jsonl_path.is_absolute():
+        jsonl_path = eval_root / jsonl_path
+
+    if not jsonl_path.exists():
+        print(f"JSONL file '{jsonl_path}' not found.", file=sys.stderr)
+        return 1
+
+    cases = list(collect_cases(jsonl_path, args.limit))
+    if not cases:
+        print("No cases to process.")
+        return 0
+
+    results: List[Optional[CaseResult]] = [None] * len(cases)
+
+    def record_result(idx: int, case_result: CaseResult) -> None:
+        results[idx] = case_result
+        status = (
+            f"build={case_result.build_status}, test={case_result.test_status}"
+            if case_result.replacement_applied
+            else "replacement_failed"
+        )
+        print(f"[{idx + 1}] {case_result.case_id}: {status}")
+
+    if args.jobs <= 1:
+        for idx, case in enumerate(cases):
+            case_result = process_case(case, args, repo_root, eval_root)
+            record_result(idx, case_result)
+    else:
+        with ThreadPoolExecutor(max_workers=args.jobs) as executor:
+            future_to_idx = {
+                executor.submit(process_case, case, args, repo_root, eval_root): idx
+                for idx, case in enumerate(cases)
+            }
+            for future in as_completed(future_to_idx):
+                idx = future_to_idx[future]
+                try:
+                    case_result = future.result()
+                except Exception as exc:  # pragma: no cover - defensive
+                    case_result = init_case_result(cases[idx], repo_root)
+                    case_result.errors.append(f"Unhandled exception: {exc}")
+                record_result(idx, case_result)
+
+    final_results = [res for res in results if res is not None]
+
+    summary = compute_summary(final_results)
+    json_report, markdown_report = write_summary(eval_root, args, jsonl_path, summary)
+    print(f"Wrote summary reports:\n - {json_report}\n - {markdown_report}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
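Review note: `eval_infer_out.py` indexes each JSONL record directly (`case["source"]["path"]`, `case["pseudo"]["content-fix"]`, and so on), so a record with a missing field surfaces as a `KeyError` inside a worker. A small pre-flight check along the following lines could report such records up front. This is only a sketch: `missing_case_keys` is a hypothetical helper, the key list is taken from the accesses visible in this file, and the example record uses made-up paths.

```python
from typing import Dict, List

# Keys that eval_infer_out.py reads from each JSONL record.
REQUIRED_KEYS = {
    "source": ("path", "function_name", "content"),
    "pseudo": ("path", "address", "content-fix"),
}


def missing_case_keys(case: Dict) -> List[str]:
    """Return dotted names of required fields absent from a case record."""
    missing: List[str] = []
    for section, keys in REQUIRED_KEYS.items():
        block = case.get(section)
        if not isinstance(block, dict):
            # The whole section is absent or malformed.
            missing.append(section)
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in block)
    return missing


# Hypothetical record that lacks the inferred function body.
example = {
    "source": {"path": "demo/demo.c", "function_name": "f", "content": "..."},
    "pseudo": {"path": "demo/demo.pseudo", "address": "0x1234"},
}
print(missing_case_keys(example))
```

Running the check while the cases are loaded would let the script skip or flag malformed records before any workspace is created.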
--- a/sk2decompile/verl/SK2DECOMPILE/README.md
+++ b/sk2decompile/verl/SK2DECOMPILE/README.md
@ -0,0 +1,180 @@
+# SK²Decompile — Reinforcement Learning with VERL
+
+This directory contains the RL (Reinforcement Learning) training pipeline for SK²Decompile, built on top of the [VERL](https://github.com/volcengine/verl) framework (Sheng et al., 2024).
+
+For the full methodology and experimental details, please refer to our paper:
+> **SK²Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin**
+> [[arXiv:2509.22114]](https://arxiv.org/abs/2509.22114)
+
+## Overview
+
+After supervised fine-tuning (SFT), SK²Decompile applies reinforcement learning to further align each phase's model with task-specific objectives. We adopt the **GRPO** (Group Relative Policy Optimization) algorithm (DeepSeek-AI et al., 2025) to train both models with their respective reward signals:
+
+- **Structure Recovery** (Skeleton): The reward is based on compiler feedback — a positive reward is granted only if the generated IR successfully compiles, with an additional component reflecting the correctness of placeholder recovery (Equation 3 in the paper).
+- **Identifier Naming** (Skin): The reward is the cosine similarity between the embeddings of the generated code and the reference source code, encouraging semantically aligned identifier predictions rather than exact lexical matches (Equation 4 in the paper).
+
+The reward functions and training scripts provided here are **reference implementations** for reproducing the RL training pipeline. For the precise reward formulations and design rationale, please refer to Section 3.5 of the paper.
+
+## Directory Structure
+
+```
+SK2DECOMPILE/
+├── README.md                          # This file
+├── data/
+│   └── sk2decompile-rl-examples.jsonl # Example RL training data
+├── reward_functions/                  # Reference reward implementations
+│   ├── __init__.py
+│   ├── exe_type.py                    # Example: compilability + placeholder Jaccard
+│   ├── sim_exe.py                     # Example: compilability + word-level similarity
+│   ├── embedding_gte.py               # Example: embedding-based identifier similarity (GTE)
+│   └── embedding_qwen3.py            # Example: embedding-based identifier similarity (Qwen3)
+└── scripts/
+    ├── run_struct_rl.sh               # Reference script: Structure Recovery RL
+    └── run_ident_rl.sh                # Reference script: Identifier Naming RL
+```
+
+## Reward Formulations (from the Paper)
+
+### Structure Recovery Reward (Eq. 3)
+
+The Structure Recovery reward consists of two components:
+
+1. **Compilability**: The generated IR is compiled against the ground-truth header, which is produced with [Psyche-C](https://github.com/ltcmelo/psychec.git). A reward of 1.0 is granted only upon successful compilation.
+2. **Placeholder Recovery**: The Jaccard similarity between the generated placeholder set (I_gen) and the ground-truth set (I_IR).
+
+```
+r_placeholder = |I_gen ∩ I_IR| / |I_gen ∪ I_IR|
+
+r_structure = { 0.0,                        if IR cannot be compiled
+              { 1.0 + r_placeholder,         if IR can be compiled
+```
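Review note: the piecewise definition above can be sketched in a few lines of Python. This is an illustration of Eq. 3 as written in this README, not the code in `reward_functions/exe_type.py`. The function names and placeholder strings are hypothetical, the compile check is reduced to a boolean argument, and treating two empty placeholder sets as a perfect match is an assumption.

```python
from typing import Iterable


def placeholder_jaccard(generated: Iterable[str], reference: Iterable[str]) -> float:
    """Jaccard similarity |I_gen ∩ I_IR| / |I_gen ∪ I_IR|."""
    gen, ref = set(generated), set(reference)
    union = gen | ref
    if not union:
        # Assumption: two empty sets count as a perfect match.
        return 1.0
    return len(gen & ref) / len(union)


def structure_reward(compiled: bool, generated: Iterable[str], reference: Iterable[str]) -> float:
    """Return 0.0 if the IR does not compile, else 1.0 + r_placeholder."""
    if not compiled:
        return 0.0
    return 1.0 + placeholder_jaccard(generated, reference)


# Two of four distinct placeholders overlap, so r_placeholder = 0.5.
print(structure_reward(True, {"type1", "var1", "func1"}, {"type1", "var1", "var2"}))  # 1.5
print(structure_reward(False, {"type1"}, {"type1"}))  # 0.0
```

Because a compiling output always scores at least 1.0, a non-compiling output can never outrank it, whatever its placeholder overlap.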
+
+### Identifier Naming Reward (Eq. 4)
+
+The Identifier Naming reward measures the semantic similarity between the generated code and the reference source code using embedding cosine similarity:
+
+```
+r_identifier = cos(e_gen, e_src) = (e_gen · e_src) / (||e_gen|| · ||e_src||)
+```
+
+where `e_gen` and `e_src` are the embeddings of the generated and reference code, respectively. In our experiments, we use Qwen3-Embedding-0.6B (Zhang et al., 2025) as the embedding model.
+
+> **Note**: The reward functions in `reward_functions/` are reference implementations that demonstrate the reward design. Please refer to Section 3.5 of the paper for the complete formulation and design rationale.
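Review note: for readers who want to sanity-check Eq. 4 in isolation, here is a dependency-free sketch of the cosine similarity. It assumes the two embeddings have already been fetched, for example from the embedding server. `cosine_similarity` is a hypothetical helper, and returning 0.0 for a zero vector is an assumption rather than something the README specifies.

```python
import math
from typing import Sequence


def cosine_similarity(e_gen: Sequence[float], e_src: Sequence[float]) -> float:
    """Compute (e_gen · e_src) / (||e_gen|| · ||e_src||)."""
    if len(e_gen) != len(e_src):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(e_gen, e_src))
    norm_gen = math.sqrt(sum(a * a for a in e_gen))
    norm_src = math.sqrt(sum(b * b for b in e_src))
    if norm_gen == 0.0 or norm_src == 0.0:
        # Assumption: a zero vector carries no signal, so the reward is 0.0.
        return 0.0
    return dot / (norm_gen * norm_src)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
print(round(cosine_similarity([3.0, 4.0], [4.0, 3.0]), 2))  # 0.96
```

The value lies in [-1, 1] in general, so a training setup that needs non-negative rewards may want to clip or rescale it.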
+
+## Reproduction Guide
+
+### Step 1: Install VERL
+
+Our RL training is based on **VERL v0.4.1** ([HybridFlow](https://github.com/volcengine/verl), Sheng et al., 2024). We recommend using the same version for reproducibility.
+
+```bash
+git clone https://github.com/volcengine/verl.git
+cd verl
+git checkout v0.4.1  # or the commit closest to v0.4.1
+pip install -e .
+```
+
+### Step 2: Integrate Reward Functions
+
+Copy the reward functions into VERL's reward module and register them in the routing dispatcher:
+
+```bash
+# Copy reward functions
+cp reward_functions/exe_type.py       <VERL_DIR>/verl/utils/reward_score/sk2d_exe_type.py
+cp reward_functions/sim_exe.py        <VERL_DIR>/verl/utils/reward_score/sk2d_sim_exe.py
+cp reward_functions/embedding_gte.py  <VERL_DIR>/verl/utils/reward_score/sk2d_embedding_gte.py
+cp reward_functions/embedding_qwen3.py <VERL_DIR>/verl/utils/reward_score/sk2d_embedding_qwen3.py
+```
+
+Then add routing branches to `<VERL_DIR>/verl/utils/reward_score/__init__.py` in the `default_compute_score()` function:
+
+```python
+# Structure Recovery reward (example)
+elif data_source == "sk2decompile_structure":
+    from . import sk2d_exe_type
+    res = sk2d_exe_type.compute_score(solution_str, ground_truth, extra_info)
+
+# Identifier Naming reward (example)
+elif data_source == "sk2decompile_identifier":
+    from . import sk2d_embedding_qwen3
+    res = sk2d_embedding_qwen3.compute_score(solution_str, ground_truth, extra_info)
+```
+
+The `data_source` field in your training Parquet files determines which reward function is dispatched for each sample.
+
+### Step 3: Prepare Training Data
+
+Training data should be in Parquet format. Each row contains:
+
+| Field | Description |
+|-------|-------------|
+| `prompt` | Chat-format messages, e.g., `[{"role": "user", "content": "<pseudocode>... What is the source code?"}]` |
+| `data_source` | Reward function routing key (must match the branch registered in Step 2) |
+| `reward_model.ground_truth` | Expected output (IR for Structure Recovery, source code for Identifier Naming) |
+| `reward_model.style` | `"rule"` (rule-based reward) |
+| `extra_info.header` | C header declarations for compilability checking (Structure Recovery only) |
+
+See `data/sk2decompile-rl-examples.jsonl` for an example of the data format. Convert the JSONL file to Parquet before training.
+
+### Step 4: Launch Training
+
+The reference training scripts are in `scripts/`. Edit the configuration variables at the top of each script before launching.
+
+**Structure Recovery RL:**
+```bash
+# Edit scripts/run_struct_rl.sh to set:
+#   VERL_DIR, VENV_PATH, MODEL_PATH, TRAIN_DATA, VAL_DATA, WANDB_*
+bash scripts/run_struct_rl.sh
+```
+
+**Identifier Naming RL** (requires a running embedding server):
+```bash
+# 1. Start the embedding server
+python -m vllm.entrypoints.openai.api_server \
+    --model Qwen3-Embedding-0.6B --port 8000 --dtype float16
+
+# 2. Edit scripts/run_ident_rl.sh to set:
+#   VERL_DIR, VENV_PATH, MODEL_PATH, TRAIN_DATA, VAL_DATA, WANDB_*
+bash scripts/run_ident_rl.sh
+```
+
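+### Step 5: Install Additional Dependencies
+
+The reward functions depend on the packages below, so install them before launching training in Step 4.
+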
+```bash
+# For compiler-based rewards (Structure Recovery)
+apt install gcc
+pip install psychec  # or build from https://github.com/ltcmelo/psychec.git
+
+# For embedding-based rewards (Identifier Naming)
+pip install tree-sitter==0.24.0 tree-sitter-c==0.23.4 openai
+```
+
+## Configurations
+
+Reference hyperparameters used in the training scripts:
+
+| Parameter | Structure Recovery | Identifier Naming |
+|-----------|:-:|:-:|
+| `train_batch_size` | 128 | 128 |
+| `max_prompt_length` | 1024 | 1024 |
+| `max_response_length` | 2048 | 2048 |
+| `lr` | 1e-6 | 1e-6 |
+| `kl_loss_coef` | 0.01 | 0.02 |
+| `kl_loss_type` | low_var_kl | low_var_kl |
+| `rollout.n` (GRPO samples) | 16 | 16 |
+| `total_epochs` | 2 | 2 |
+
+## Troubleshooting
+
+**OOM (Out of Memory)**:
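+- Reduce `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu` (set to 4 in the reference scripts)
+- Enable `actor_rollout_ref.actor.fsdp_config.param_offload=True`
+- Reduce `actor_rollout_ref.rollout.gpu_memory_utilization` (set to 0.80 in the reference scripts)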
+
+**Embedding server connection error** (Identifier Naming only):
+- Ensure the vLLM embedding server is running on port 8000
+- Check environment variables: `QWEN3_EMBEDDING_API_BASE` (default: `http://127.0.0.1:8000/v1`)
+
+**Compilation timeout in reward** (Structure Recovery only):
+- The `gcc -c` call has a 5-second timeout per sample
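+- If many samples time out, check machine load and the size of the generated code; `gcc -c` only compiles the code and never runs it, so infinite loops in the output cannot trigger this timeout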
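The JSONL-to-Parquet conversion mentioned in Step 3 is not shown in this diff. A minimal sketch follows, assuming each JSONL line is an object with nested `reward_model` and `extra_info` dicts (the dotted names in the Step 3 table suggest this, but the exact layout is an assumption) and that pandas with a Parquet engine such as pyarrow is installed. `missing_fields`, `load_rows`, and `jsonl_to_parquet` are hypothetical helper names, not part of the repository.

```python
import json

# Fields listed in the Step 3 table; dotted names are read as nested dict keys
# (an assumption about the on-disk layout).
REQUIRED = ("prompt", "data_source", "reward_model.ground_truth", "reward_model.style")


def missing_fields(row):
    """Return the required dotted fields that are absent from one row."""
    missing = []
    for dotted in REQUIRED:
        node = row
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(dotted)
                break
            node = node[key]
    return missing


def load_rows(lines):
    """Parse JSONL lines, skipping blank lines and rejecting incomplete rows."""
    rows = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        row = json.loads(line)
        gaps = missing_fields(row)
        if gaps:
            raise ValueError(f"line {lineno}: missing {gaps}")
        rows.append(row)
    return rows


def jsonl_to_parquet(jsonl_path, parquet_path):
    """Validate a JSONL file and write it as Parquet (needs pandas + pyarrow)."""
    import pandas as pd  # imported lazily so validation works without pandas

    with open(jsonl_path, encoding="utf-8") as f:
        rows = load_rows(f)
    pd.DataFrame(rows).to_parquet(parquet_path, index=False)
```

Validating before writing surfaces a missing `data_source` or `reward_model` entry at conversion time rather than in the middle of a training run.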
--- a/sk2decompile/verl/SK2DECOMPILE/reward_functions/__init__.py
+++ b/sk2decompile/verl/SK2DECOMPILE/reward_functions/__init__.py
@ -0,0 +1,23 @@
+"""
+SK2Decompile — Reference Reward Functions for GRPO Training.
+
+This module provides reference implementations of reward functions used in the
+SK2Decompile RL training pipeline. These are example implementations that
+demonstrate the reward design described in Section 3.5 of the paper:
+
+  SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin
+  (arXiv:2509.22114)
+
+Reference implementations:
+- exe_type: Compilability + placeholder identifier Jaccard similarity
+- sim_exe: Compilability + word-level Jaccard similarity
+- embedding_gte: Tree-sitter identifier extraction + GTE embedding cosine similarity
+- embedding_qwen3: Tree-sitter identifier extraction + Qwen3 embedding cosine similarity
+
+To integrate into VERL, copy these files into verl/utils/reward_score/ and
+register routing branches in __init__.py. See README.md for details.
+"""
+
+from . import exe_type, sim_exe, embedding_gte, embedding_qwen3
+
+__all__ = ["exe_type", "sim_exe", "embedding_gte", "embedding_qwen3"]
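The routing described in the docstring above amounts to mapping a `data_source` string to a `compute_score` callable. A table-driven sketch of that idea follows, with stub scorers standing in for the real modules. The keys reuse the example names from the README; `SCORERS`, `route_score`, and the stub functions are hypothetical and are not part of VERL, which uses an `elif` chain instead.

```python
def _stub_structure_score(solution_str, ground_truth, extra_info=None):
    """Stand-in for sk2d_exe_type.compute_score (hypothetical stub)."""
    return 1.0 if solution_str == ground_truth else 0.0


def _stub_identifier_score(solution_str, ground_truth, extra_info=None):
    """Stand-in for sk2d_embedding_qwen3.compute_score (hypothetical stub)."""
    return 0.5


# data_source value -> reward function
SCORERS = {
    "sk2decompile_structure": _stub_structure_score,
    "sk2decompile_identifier": _stub_identifier_score,
}


def route_score(data_source, solution_str, ground_truth, extra_info=None):
    """Dispatch one sample to the reward function registered for its data_source."""
    try:
        scorer = SCORERS[data_source]
    except KeyError:
        raise NotImplementedError(f"no reward registered for {data_source!r}") from None
    return float(scorer(solution_str, ground_truth, extra_info))
```

Raising on an unknown key makes a mismatch between the Parquet `data_source` values and the registered branches visible immediately.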
--- a/sk2decompile/verl/SK2DECOMPILE/reward_functions/embedding_gte.py
+++ b/sk2decompile/verl/SK2DECOMPILE/reward_functions/embedding_gte.py
@ -0,0 +1,189 @@
+"""
+Reference reward function: GTE Embedding-based Identifier Similarity.
+
+This is a reference implementation of the Identifier Naming reward (Eq. 4)
+described in the SK2Decompile paper (arXiv:2509.22114, Section 3.5).
+
+Evaluates decompiled C code by:
+1. Using tree-sitter to parse C code and extract identifiers (func/var/type/field)
+2. Building a naming summary string per code sample
+3. Computing cosine similarity between GTE embeddings of the two summaries
+4. Squaring the similarity score to sharpen the reward signal
+
+Final score = cosine_similarity^2
+
+Requires:
+- A running OpenAI-compatible embedding server (e.g., vLLM serving gte-large-en-v1.5)
+- tree-sitter and tree-sitter-c packages
+
+Environment variables:
+- GTE_EMBEDDING_MODEL_PATH: Model name/path (default: "gte-large-en-v1.5")
+- GTE_EMBEDDING_API_KEY or OPENAI_API_KEY: API key (default: "none")
+- GTE_EMBEDDING_API_BASE: API base URL (default: "http://127.0.0.1:8000/v1")
+"""
+
+import math
+import os
+import random
+from typing import Dict, List, Optional, Sequence, Tuple
+
+from openai import OpenAI
+from tree_sitter import Language, Parser
+import tree_sitter_c as tsc
+
+# ---- OpenAI Embedding Client ----
+
+_MODEL_NAME = os.getenv("GTE_EMBEDDING_MODEL_PATH", "gte-large-en-v1.5")
+_API_KEY = os.getenv("GTE_EMBEDDING_API_KEY") or os.getenv("OPENAI_API_KEY") or "none"
+_API_BASE = os.getenv("GTE_EMBEDDING_API_BASE", "http://127.0.0.1:8000/v1")
+_client: Optional[OpenAI] = None
+
+
+def _get_client() -> OpenAI:
+    global _client
+    if _client is None:
+        if _API_BASE:
+            _client = OpenAI(api_key=_API_KEY, base_url=_API_BASE)
+        elif _API_KEY:
+            _client = OpenAI(api_key=_API_KEY)
+        else:
+            _client = OpenAI()
+    return _client
+
+
+def _embed_two(text_a: str, text_b: str) -> Tuple[List[float], List[float]]:
+    """Embed two texts in a single API call, return their embedding vectors."""
+    client = _get_client()
+    resp = client.embeddings.create(model=_MODEL_NAME, input=[text_a, text_b])
+    emb_a = [float(x) for x in resp.data[0].embedding]
+    emb_b = [float(x) for x in resp.data[1].embedding]
+    return emb_a, emb_b
+
+
+def _cosine_similarity(vec_a: Sequence[float], vec_b: Sequence[float]) -> float:
+    dot = sum(a * b for a, b in zip(vec_a, vec_b))
+    norm_a = math.sqrt(sum(a * a for a in vec_a))
+    norm_b = math.sqrt(sum(b * b for b in vec_b))
+    if norm_a == 0 or norm_b == 0:
+        return 0.0
+    return dot / (norm_a * norm_b)
+
+
+# ---- Tree-sitter C: Identifier Extraction ----
+
+C_LANG = Language(tsc.language())
+_TS_PARSER = Parser(C_LANG)
+
+
+def _classify_node(node):
+    """
+    Classify a tree-sitter node into identifier categories:
+    - func: function names (definitions + calls)
+    - var: variable names (parameters / local / global)
+    - type: type names
+    - field: struct field names
+    """
+    node_type = node.type
+    name = node.text.decode("utf8")
+
+    if node_type == "type_identifier":
+        return "type", name
+    if node_type == "field_identifier":
+        return "field", name
+    if node_type != "identifier":
+        return None, None
+
+    parent = node.parent
+    if parent:
+        parent_type = parent.type
+        if parent_type == "function_declarator" and parent.child_by_field_name("declarator") == node:
+            return "func", name
+        if parent_type == "call_expression" and parent.child_by_field_name("function") == node:
+            return "func", name
+        if parent_type in ("init_declarator", "parameter_declaration", "declaration", "pointer_declarator"):
+            return "var", name
+
+    return "var", name
+
+
+def _extract_identifiers_ts(code: str, max_per_type: int = 64) -> Dict[str, List[str]]:
+    """Extract identifiers from C code using tree-sitter, classified by type."""
+    tree = _TS_PARSER.parse(code.encode("utf8"))
+    result: Dict[str, List[str]] = {"func": [], "var": [], "type": [], "field": []}
+
+    stack = [tree.root_node]
+    while stack:
+        node = stack.pop()
+        id_type, name = _classify_node(node)
+        if id_type in result and len(result[id_type]) < max_per_type:
+            result[id_type].append(name)
+        stack.extend(node.children)
+
+    return result
+
+
+# ---- Summary Construction & Similarity ----
+
+
+def _build_summary_text(identifiers: Dict[str, List[str]], max_per_type: int = 64) -> str:
+    """
+    Build a naming summary string from classified identifiers.
+    Example: "func: foo bar || type: my_type || field: field1 field2 || var: i j k"
+    """
+    parts: List[str] = []
+    for kind in ("func", "type", "field", "var"):
+        names = identifiers.get(kind, [])
+        if not names:
+            continue
+        segment = f"{kind}: " + " ".join(names[:max_per_type])
+        parts.append(segment)
+    return " || ".join(parts)
+
+
+def _identifier_similarity_ts(candidate_text: str, reference_text: str):
+    """
+    Compute identifier-level similarity using embedding cosine similarity.
+
+    Steps:
+    1. Extract identifiers from both texts using tree-sitter
+    2. Build naming summary strings
+    3. Embed both summaries in a single API call
+    4. Return cosine similarity as name_score
+
+    Returns:
+        name_score: cosine similarity, a float in [-1, 1]
+    """
+    cand_ids = _extract_identifiers_ts(candidate_text)
+    ref_ids = _extract_identifiers_ts(reference_text)
+
+    cand_summary = _build_summary_text(cand_ids)
+    ref_summary = _build_summary_text(ref_ids)
+
+    if not cand_summary or not ref_summary:
+        return 0.0
+
+    emb_cand, emb_ref = _embed_two(cand_summary, ref_summary)
+    return _cosine_similarity(emb_cand, emb_ref)
+
+
+# ---- Main Reward Function ----
+
+
+def compute_score(solution_str, ground_truth, extra_info=None):
+    """
+    Compute reward based on identifier naming similarity using GTE embeddings.
+    Returns score^2 to sharpen the reward signal.
+    """
+    if not isinstance(solution_str, str):
+        solution_str = "" if solution_str is None else str(solution_str)
+    if not isinstance(ground_truth, str):
+        ground_truth = "" if ground_truth is None else str(ground_truth)
+
+    candidate_text = solution_str.strip()
+    reference_text = ground_truth.strip()
+
+    if not candidate_text or not reference_text:
+        return 0.0
+
+    name_score = _identifier_similarity_ts(candidate_text, reference_text)
+    return name_score * name_score
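To illustrate what squaring does to the reward, the sketch below applies the same cosine formula to small hand-made vectors instead of real embeddings. It also shows one edge case: cosine similarity can be negative in principle, and squaring maps a negative value to a positive reward. `clamped_squared_reward` is a hypothetical variant shown only for comparison, not something the source uses.

```python
import math


def cosine(u, v):
    """Cosine similarity, returning 0.0 for a zero-length vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def squared_reward(sim):
    """Reward as computed by compute_score above: similarity squared."""
    return sim * sim


def clamped_squared_reward(sim):
    """Hypothetical variant: negative similarities earn zero reward."""
    return max(0.0, sim) ** 2


pairs = [
    ([1.0, 0.0], [1.0, 0.0]),   # identical direction
    ([1.0, 0.0], [1.0, 1.0]),   # 45 degrees apart
    ([1.0, 0.0], [-1.0, 0.0]),  # opposite direction
]
for u, v in pairs:
    sim = cosine(u, v)
    print(f"sim={sim:+.3f} squared={squared_reward(sim):.3f} "
          f"clamped={clamped_squared_reward(sim):.3f}")
```

Squaring lowers mid-range similarities more than high ones, which is the sharpening effect the docstring refers to.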
--- a/sk2decompile/verl/SK2DECOMPILE/reward_functions/embedding_qwen3.py
+++ b/sk2decompile/verl/SK2DECOMPILE/reward_functions/embedding_qwen3.py
@ -0,0 +1,189 @@
+"""
+Reference reward function: Qwen3 Embedding-based Identifier Similarity.
+
+This is a reference implementation of the Identifier Naming reward (Eq. 4)
+described in the SK2Decompile paper (arXiv:2509.22114, Section 3.5).
+
+Evaluates decompiled C code by:
+1. Using tree-sitter to parse C code and extract identifiers (func/var/type/field)
+2. Building a naming summary string per code sample
+3. Computing cosine similarity between Qwen3 embeddings of the two summaries
+4. Squaring the similarity score to sharpen the reward signal
+
+Final score = cosine_similarity^2
+
+Requires:
+- A running OpenAI-compatible embedding server (e.g., vLLM serving Qwen3-Embedding-0.6B)
+- tree-sitter and tree-sitter-c packages
+
+Environment variables:
+- QWEN3_EMBEDDING_MODEL_PATH: Model name/path (default: "Qwen3-Embedding-0.6B")
+- QWEN3_EMBEDDING_API_KEY or OPENAI_API_KEY: API key (default: "none")
+- QWEN3_EMBEDDING_API_BASE: API base URL (default: "http://127.0.0.1:8000/v1")
+"""
+
+import math
+import os
+import random
+from typing import Dict, List, Optional, Sequence, Tuple
+
+from openai import OpenAI
+from tree_sitter import Language, Parser
+import tree_sitter_c as tsc
+
+# ---- OpenAI Embedding Client ----
+
+_MODEL_NAME = os.getenv("QWEN3_EMBEDDING_MODEL_PATH", "Qwen3-Embedding-0.6B")
+_API_KEY = os.getenv("QWEN3_EMBEDDING_API_KEY") or os.getenv("OPENAI_API_KEY") or "none"
+_API_BASE = os.getenv("QWEN3_EMBEDDING_API_BASE", "http://127.0.0.1:8000/v1")
+_client: Optional[OpenAI] = None
+
+
+def _get_client() -> OpenAI:
+    global _client
+    if _client is None:
+        if _API_BASE:
+            _client = OpenAI(api_key=_API_KEY, base_url=_API_BASE)
+        elif _API_KEY:
+            _client = OpenAI(api_key=_API_KEY)
+        else:
+            _client = OpenAI()
+    return _client
+
+
+def _embed_two(text_a: str, text_b: str) -> Tuple[List[float], List[float]]:
+    """Embed two texts in a single API call, return their embedding vectors."""
+    client = _get_client()
+    resp = client.embeddings.create(model=_MODEL_NAME, input=[text_a, text_b])
+    emb_a = [float(x) for x in resp.data[0].embedding]
+    emb_b = [float(x) for x in resp.data[1].embedding]
+    return emb_a, emb_b
+
+
+def _cosine_similarity(vec_a: Sequence[float], vec_b: Sequence[float]) -> float:
+    dot = sum(a * b for a, b in zip(vec_a, vec_b))
+    norm_a = math.sqrt(sum(a * a for a in vec_a))
+    norm_b = math.sqrt(sum(b * b for b in vec_b))
+    if norm_a == 0 or norm_b == 0:
+        return 0.0
+    return dot / (norm_a * norm_b)
+
+
+# ---- Tree-sitter C: Identifier Extraction ----
+
+C_LANG = Language(tsc.language())
+_TS_PARSER = Parser(C_LANG)
+
+
+def _classify_node(node):
+    """
+    Classify a tree-sitter node into identifier categories:
+    - func: function names (definitions + calls)
+    - var: variable names (parameters / local / global)
+    - type: type names
+    - field: struct field names
+    """
+    node_type = node.type
+    name = node.text.decode("utf8")
+
+    if node_type == "type_identifier":
+        return "type", name
+    if node_type == "field_identifier":
+        return "field", name
+    if node_type != "identifier":
+        return None, None
+
+    parent = node.parent
+    if parent:
+        parent_type = parent.type
+        if parent_type == "function_declarator" and parent.child_by_field_name("declarator") == node:
+            return "func", name
+        if parent_type == "call_expression" and parent.child_by_field_name("function") == node:
+            return "func", name
+        if parent_type in ("init_declarator", "parameter_declaration", "declaration", "pointer_declarator"):
+            return "var", name
+
+    return "var", name
+
+
+def _extract_identifiers_ts(code: str, max_per_type: int = 64) -> Dict[str, List[str]]:
+    """Extract identifiers from C code using tree-sitter, classified by type."""
+    tree = _TS_PARSER.parse(code.encode("utf8"))
+    result: Dict[str, List[str]] = {"func": [], "var": [], "type": [], "field": []}
+
+    stack = [tree.root_node]
+    while stack:
+        node = stack.pop()
+        id_type, name = _classify_node(node)
+        if id_type in result and len(result[id_type]) < max_per_type:
+            result[id_type].append(name)
+        stack.extend(node.children)
+
+    return result
+
+
+# ---- Summary Construction & Similarity ----
+
+
+def _build_summary_text(identifiers: Dict[str, List[str]], max_per_type: int = 64) -> str:
+    """
+    Build a naming summary string from classified identifiers.
+    Example: "func: foo bar || type: my_type || field: field1 field2 || var: i j k"
+    """
+    parts: List[str] = []
+    for kind in ("func", "type", "field", "var"):
+        names = identifiers.get(kind, [])
+        if not names:
+            continue
+        segment = f"{kind}: " + " ".join(names[:max_per_type])
+        parts.append(segment)
+    return " || ".join(parts)
+
+
+def _identifier_similarity_ts(candidate_text: str, reference_text: str):
+    """
+    Compute identifier-level similarity using embedding cosine similarity.
+
+    Steps:
+    1. Extract identifiers from both texts using tree-sitter
+    2. Build naming summary strings
+    3. Embed both summaries in a single API call
+    4. Return cosine similarity as name_score
+
+    Returns:
+        name_score: cosine similarity, a float in [-1, 1]
+    """
+    cand_ids = _extract_identifiers_ts(candidate_text)
+    ref_ids = _extract_identifiers_ts(reference_text)
+
+    cand_summary = _build_summary_text(cand_ids)
+    ref_summary = _build_summary_text(ref_ids)
+
+    if not cand_summary or not ref_summary:
+        return 0.0
+
+    emb_cand, emb_ref = _embed_two(cand_summary, ref_summary)
+    return _cosine_similarity(emb_cand, emb_ref)
+
+
+# ---- Main Reward Function ----
+
+
+def compute_score(solution_str, ground_truth, extra_info=None):
+    """
+    Compute reward based on identifier naming similarity using Qwen3 embeddings.
+    Returns score^2 to sharpen the reward signal.
+    """
+    if not isinstance(solution_str, str):
+        solution_str = "" if solution_str is None else str(solution_str)
+    if not isinstance(ground_truth, str):
+        ground_truth = "" if ground_truth is None else str(ground_truth)
+
+    candidate_text = solution_str.strip()
+    reference_text = ground_truth.strip()
+
+    if not candidate_text or not reference_text:
+        return 0.0
+
+    name_score = _identifier_similarity_ts(candidate_text, reference_text)
+    return name_score * name_score
--- a/sk2decompile/verl/SK2DECOMPILE/reward_functions/exe_type.py
+++ b/sk2decompile/verl/SK2DECOMPILE/reward_functions/exe_type.py
@ -0,0 +1,85 @@
+"""
+Reference reward function: Compilability + Placeholder Identifier Matching.
+
+This is a reference implementation of the Structure Recovery reward (Eq. 3)
+described in the SK2Decompile paper (arXiv:2509.22114, Section 3.5).
+
+Evaluates decompiled C code by:
+1. Checking if the code compiles with gcc (compilability score: 0 or 1)
+2. Extracting placeholder identifier patterns (func*, type*, var*, field*) from
+   both candidate and ground truth, computing Jaccard similarity
+
+Final score = type_score + compilability_score if compilable, else 0.
+"""
+
+import os
+import re
+import subprocess
+import tempfile
+
+
+def compute_score(solution_str, ground_truth, extra_info=None):
+    type_score_value, _ = type_score(solution_str, ground_truth, extra_info)
+    compileable_score_value = compileable_score(solution_str, ground_truth, extra_info)
+
+    if compileable_score_value == 0.0:
+        return 0.0
+
+    return type_score_value + compileable_score_value
+
+
+def type_score(solution_str, ground_truth, extra_info=None):
+    """
+    Compute Jaccard similarity over identifier patterns (func*, type*, var*, field*)
+    between candidate and ground truth code.
+
+    Returns:
+        (jaccard_similarity, total_term_count)
+    """
+    patterns = [r'\bfunc\w*\b', r'\btype\w*\b', r'\bvar\w*\b', r'\bfield\w*\b']
+
+    def extract_terms(text):
+        terms = set()
+        for pattern in patterns:
+            terms.update(re.findall(pattern, text))
+        return terms
+
+    solution_terms = extract_terms(solution_str)
+    ground_truth_terms = extract_terms(ground_truth)
+
+    intersection = solution_terms.intersection(ground_truth_terms)
+    union = solution_terms.union(ground_truth_terms)
+
+    jaccard_similarity = len(intersection) / len(union) if union else 0.0
+    return jaccard_similarity, len(solution_terms) + len(ground_truth_terms)
+
+
+def compileable_score(solution_str, ground_truth, extra_info=None):
+    """
+    Check if the candidate C code compiles with gcc.
+
+    Args:
+        extra_info: Optional dict with 'header' key containing C header declarations.
+
+    Returns:
+        1.0 if compilable, 0.0 otherwise.
+    """
+    with tempfile.TemporaryDirectory() as tmpdir:
+        try:
+            source_file = os.path.join(tmpdir, "temp.c")
+            object_file = os.path.join(tmpdir, "temp.o")
+            header = extra_info.get('header', '') if extra_info else ''
+
+            with open(source_file, 'w') as f:
+                f.write(f'{header}\n\n{solution_str}')
+
+            proc = subprocess.run(
+                ['gcc', '-c', source_file, '-o', object_file],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                timeout=5,
+                check=False,  # a non-zero exit is reported via returncode below
+            )
+            return 1.0 if proc.returncode == 0 else 0.0
+        except Exception:
+            return 0.0
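A worked example of the placeholder Jaccard term may help. The sketch below reuses the four regex patterns from `type_score` on two hand-written snippets, which are illustrative inputs and not taken from the training data. `placeholder_terms` and `final_score` are hypothetical names, and the gcc result is passed in as a flag so the example runs without a compiler.

```python
import re

PATTERNS = [r'\bfunc\w*\b', r'\btype\w*\b', r'\bvar\w*\b', r'\bfield\w*\b']


def placeholder_terms(text):
    """Collect the distinct placeholder identifiers matched by the patterns."""
    terms = set()
    for pattern in PATTERNS:
        terms.update(re.findall(pattern, text))
    return terms


def final_score(jaccard, compiles):
    """Same combination rule as compute_score above."""
    return jaccard + 1.0 if compiles else 0.0


candidate = "int func1(type1 var1) { return var1 + var2; }"
reference = "int func1(type1 var1, type2 var3) { return var1 + var3; }"

cand = placeholder_terms(candidate)
ref = placeholder_terms(reference)
jaccard = len(cand & ref) / len(cand | ref)

print(sorted(cand & ref))           # ['func1', 'type1', 'var1']
print(jaccard)                      # 0.5 (3 shared out of 6 distinct terms)
print(final_score(jaccard, True))   # 1.5
print(final_score(jaccard, False))  # 0.0
```

Because the patterns match any word that starts with `func`, `type`, `var`, or `field`, ordinary names such as `function` are counted as well.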
--- a/sk2decompile/verl/SK2DECOMPILE/reward_functions/sim_exe.py
+++ b/sk2decompile/verl/SK2DECOMPILE/reward_functions/sim_exe.py
@ -0,0 +1,69 @@
+"""
+Reference reward function: Compilability + Word-level Jaccard Similarity.
+
+This is a reference implementation of an alternative Structure Recovery reward
+for the SK2Decompile RL training pipeline (arXiv:2509.22114, Section 3.5).
+
+Evaluates decompiled C code by:
+1. Computing word-level Jaccard similarity between candidate and ground truth
+2. Checking if the code compiles with gcc (compilability score: 0 or 1)
+
+Final score = jaccard_similarity + compilability_score if jaccard > 0.5, else 0.
+"""
+
+import os
+import subprocess
+import tempfile
+
+
+def compute_score(solution_str, ground_truth, extra_info=None):
+    sim_score = jaccard_similarity(solution_str, ground_truth)
+    compile_score = compileable_score(solution_str, ground_truth, extra_info)
+
+    if sim_score > 0.5:
+        return sim_score + compile_score
+    return 0.0
+
+
+def jaccard_similarity(str1, str2):
+    """Compute word-level Jaccard similarity between two strings."""
+    set1 = set(str1.lower().split())
+    set2 = set(str2.lower().split())
+
+    intersection = len(set1.intersection(set2))
+    union = len(set1.union(set2))
+
+    if union == 0:
+        return 0.0
+    return intersection / union
+
+
+def compileable_score(solution_str, ground_truth, extra_info=None):
+    """
+    Check if the candidate C code compiles with gcc.
+
+    Args:
+        extra_info: Optional dict with 'header' key containing C header declarations.
+
+    Returns:
+        1.0 if compilable, 0.0 otherwise.
+    """
+    with tempfile.TemporaryDirectory() as tmpdir:
+        try:
+            source_file = os.path.join(tmpdir, "temp.c")
+            object_file = os.path.join(tmpdir, "temp.o")
+            header = extra_info.get('header', '') if extra_info else ''
+
+            with open(source_file, 'w') as f:
+                f.write(f'{header}\n\n{solution_str}')
+
+            proc = subprocess.run(
+                ['gcc', '-c', source_file, '-o', object_file],
+                stdout=subprocess.PIPE,
+                stderr=subprocess.PIPE,
+                timeout=5,
+                check=False,  # a non-zero exit is reported via returncode below
+            )
+            return 1.0 if proc.returncode == 0 else 0.0
+        except Exception:
+            return 0.0
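The word-level variant splits on whitespace only, so it is sensitive to formatting. The sketch below runs the same Jaccard computation on hand-written inputs and applies the 0.5 gate. `gated_score` is a hypothetical helper that takes the gcc result as a flag so the example runs without a compiler.

```python
def jaccard_similarity(str1, str2):
    """Word-level Jaccard similarity over lowercased, whitespace-split tokens."""
    set1 = set(str1.lower().split())
    set2 = set(str2.lower().split())
    union = set1 | set2
    return len(set1 & set2) / len(union) if union else 0.0


def gated_score(sim, compiles):
    """Same gate as compute_score above: no reward unless similarity exceeds 0.5."""
    if sim > 0.5:
        return sim + (1.0 if compiles else 0.0)
    return 0.0


spaced = jaccard_similarity("int a = b + c ;", "int a = b + d ;")
dense = jaccard_similarity("int a=b+c;", "int a = b + c ;")

print(spaced, gated_score(spaced, True))  # 0.75 1.75
print(dense, gated_score(dense, True))    # 0.125 0.0
```

The second pair is the same statement with different spacing, yet it falls below the gate and earns no reward even when the code compiles.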
--- a/sk2decompile/verl/SK2DECOMPILE/scripts/run_ident_rl.sh
+++ b/sk2decompile/verl/SK2DECOMPILE/scripts/run_ident_rl.sh
@ -0,0 +1,120 @@
+#!/usr/bin/env bash
+# =============================================================================
+# SK2Decompile - Reference Script: Identifier Naming RL Training
+# =============================================================================
+# Reference GRPO training script for the Identifier Naming model.
+# Based on the VERL framework (v0.4.1) with embedding-based rewards.
+#
+# This is a reference configuration — please adjust parameters according
+# to your hardware setup and dataset. See the paper (arXiv:2509.22114,
+# Section 3.5) for the reward formulation details.
+#
+# Prerequisites:
+#   - VERL framework installed (https://github.com/volcengine/verl)
+#   - Reward functions integrated into verl/utils/reward_score/ (see README.md)
+#   - An OpenAI-compatible embedding server running locally
+#     e.g.: python -m vllm.entrypoints.openai.api_server \
+#               --model Qwen3-Embedding-0.6B --port 8000
+#   - tree-sitter, tree-sitter-c, openai packages installed
+#
+# Usage:
+#   bash run_ident_rl.sh
+# =============================================================================
+set -x
+
+# ---- User Configuration ----
+EMBEDDING_VARIANT="gte"  # Run-name label only ("gte" or "qwen3"); the reward is selected by data_source
+
+VERL_DIR="<YOUR_VERL_DIR>"
+VENV_PATH="<YOUR_VENV_PATH>"
+MODEL_PATH="<YOUR_MODEL_PATH>"           # e.g., path to sk2decompile-ident-6.7b
+TRAIN_DATA="<YOUR_DATA_PATH>/train.parquet"
+VAL_DATA="<YOUR_DATA_PATH>/valid.parquet"
+
+# WandB configuration
+WANDB_API_KEY_VAL="<YOUR_WANDB_API_KEY>"
+WANDB_ENTITY_VAL="<YOUR_WANDB_ENTITY>"
+WANDB_PROJECT_VAL="<YOUR_WANDB_PROJECT>"
+
+# Training parameters
+NUM_NODES=1
+GPUS_PER_NODE=8
+KL_COEF=0.02
+TOTAL_EPOCHS=2
+SAVE_FREQ=25
+TEST_FREQ=25
+
+# ---- Environment Setup ----
+source "${VENV_PATH}/bin/activate"
+
+export UCX_IB_PCI_RELAXED_ORDERING=1
+export NCCL_IB_PCI_RELAXED_ORDERING=1
+export NCCL_IB_TIMEOUT=22
+export NCCL_DEBUG=INFO
+export TRANSFORMERS_OFFLINE=0
+export TORCH_NCCL_AVOID_RECORD_STREAMS=1
+export NCCL_NVLS_ENABLE=0
+export NCCL_IB_DISABLE=0
+export CUDA_DEVICE_MAX_CONNECTIONS=1
+
+# ---- Task & Logging ----
+TASK_NAME="sk2decompile_ident-rl-${EMBEDDING_VARIANT}"
+LOG_DIR="${VERL_DIR}/logs"
+mkdir -p "$LOG_DIR"
+LOG_FILE="$LOG_DIR/${TASK_NAME}.log"
+ERR_FILE="$LOG_DIR/${TASK_NAME}.err"
+
+# ---- WandB ----
+export WANDB_API_KEY=${WANDB_API_KEY_VAL}
+export WANDB_ENTITY=${WANDB_ENTITY_VAL}
+export WANDB_PROJECT=${WANDB_PROJECT_VAL}
+export WANDB_NAME=${TASK_NAME}
+export WANDB_MODE='online'
+wandb login --relogin "$WANDB_API_KEY"
+
+# ---- Launch GRPO Training ----
+python3 -m verl.trainer.main_ppo --config-path=config \
+    --config-name='ppo_trainer-lm4dc.yaml' \
+    algorithm.adv_estimator=grpo \
+    data.train_files=${TRAIN_DATA} \
+    data.val_files=${VAL_DATA} \
+    data.train_batch_size=128 \
+    data.max_prompt_length=1024 \
+    data.max_response_length=2048 \
+    data.filter_overlong_prompts=True \
+    data.truncation='error' \
+    actor_rollout_ref.model.path=${MODEL_PATH} \
+    actor_rollout_ref.actor.optim.lr=1e-6 \
+    actor_rollout_ref.model.use_remove_padding=True \
+    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
+    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
+    actor_rollout_ref.actor.use_kl_loss=True \
+    actor_rollout_ref.actor.kl_loss_coef=${KL_COEF} \
+    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
+    actor_rollout_ref.actor.entropy_coeff=0 \
+    actor_rollout_ref.model.enable_gradient_checkpointing=False \
+    actor_rollout_ref.actor.fsdp_config.param_offload=False \
+    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
+    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
+    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
+    actor_rollout_ref.rollout.name=vllm \
+    actor_rollout_ref.rollout.gpu_memory_utilization=0.80 \
+    actor_rollout_ref.rollout.n=16 \
+    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
+    actor_rollout_ref.ref.fsdp_config.param_offload=True \
+    algorithm.use_kl_in_reward=False \
+    trainer.critic_warmup=0 \
+    trainer.logger=['console','wandb'] \
+    trainer.project_name='sk2decompile_rl' \
+    trainer.experiment_name=$TASK_NAME \
+    trainer.default_local_dir=${VERL_DIR}/checkpoints/${TASK_NAME} \
+    trainer.n_gpus_per_node=${GPUS_PER_NODE} \
+    trainer.nnodes=${NUM_NODES} \
+    trainer.save_freq=${SAVE_FREQ} \
+    trainer.test_freq=${TEST_FREQ} \
+    trainer.total_epochs=${TOTAL_EPOCHS} "$@" \
+    > >(tee -a "$LOG_FILE") \
+    2> >(tee -a "$ERR_FILE" >&2)
+
+echo "STDOUT saved to: $LOG_FILE"
+echo "STDERR saved to: $ERR_FILE"
--- a/sk2decompile/verl/SK2DECOMPILE/scripts/run_struct_rl.sh
+++ b/sk2decompile/verl/SK2DECOMPILE/scripts/run_struct_rl.sh
@ -0,0 +1,117 @@
+#!/usr/bin/env bash
+# =============================================================================
+# SK2Decompile - Reference Script: Structure Recovery RL Training
+# =============================================================================
+# Reference GRPO training script for the Structure Recovery model.
+# Based on the VERL framework (v0.4.1) with compiler-based rewards.
+#
+# This is a reference configuration — please adjust parameters according
+# to your hardware setup and dataset. See the paper (arXiv:2509.22114,
+# Section 3.5) for the reward formulation details.
+#
+# Prerequisites:
+#   - VERL framework installed (https://github.com/volcengine/verl)
+#   - Reward functions integrated into verl/utils/reward_score/ (see README.md)
+#   - gcc available for compilability checking
+#
+# Usage:
+#   bash run_struct_rl.sh
+# =============================================================================
+set -x
+
+# ---- User Configuration ----
+REWARD_VARIANT="exe_type"  # Run-name label only ("exe_type" or "sim_exe"); the reward is selected by data_source
+
+VERL_DIR="<YOUR_VERL_DIR>"
+VENV_PATH="<YOUR_VENV_PATH>"
+MODEL_PATH="<YOUR_MODEL_PATH>"           # e.g., path to sk2decompile-struct-6.7b
+TRAIN_DATA="<YOUR_DATA_PATH>/train.parquet"
+VAL_DATA="<YOUR_DATA_PATH>/valid.parquet"
+
+# WandB configuration
+WANDB_API_KEY_VAL="<YOUR_WANDB_API_KEY>"
+WANDB_ENTITY_VAL="<YOUR_WANDB_ENTITY>"
+WANDB_PROJECT_VAL="<YOUR_WANDB_PROJECT>"
+
+# Training parameters
+NUM_NODES=2
+GPUS_PER_NODE=8
+KL_COEF=0.01
+TOTAL_EPOCHS=2
+SAVE_FREQ=25
+TEST_FREQ=25
+
+# ---- Environment Setup ----
+source "${VENV_PATH}/bin/activate"
+
+export UCX_IB_PCI_RELAXED_ORDERING=1
+export NCCL_IB_PCI_RELAXED_ORDERING=1
+export NCCL_IB_TIMEOUT=22
+export NCCL_DEBUG=INFO
+export TRANSFORMERS_OFFLINE=0
+export TORCH_NCCL_AVOID_RECORD_STREAMS=1
+export NCCL_NVLS_ENABLE=0
+export NCCL_IB_DISABLE=0
+export CUDA_DEVICE_MAX_CONNECTIONS=1
+
+# ---- Task & Logging ----
+TASK_NAME="sk2decompile_struct-rl-${REWARD_VARIANT}"
+LOG_DIR="${VERL_DIR}/logs"
+mkdir -p "$LOG_DIR"
+LOG_FILE="$LOG_DIR/${TASK_NAME}.log"
+ERR_FILE="$LOG_DIR/${TASK_NAME}.err"
+
+# ---- WandB ----
+export WANDB_API_KEY=${WANDB_API_KEY_VAL}
+export WANDB_ENTITY=${WANDB_ENTITY_VAL}
+export WANDB_PROJECT=${WANDB_PROJECT_VAL}
+export WANDB_NAME=${TASK_NAME}
+export WANDB_MODE='online'
+wandb login --relogin "$WANDB_API_KEY"
+
+# ---- Launch GRPO Training ----
+python3 -m verl.trainer.main_ppo --config-path=config \
+    --config-name='ppo_trainer-lm4dc.yaml' \
+    algorithm.adv_estimator=grpo \
+    data.train_files=${TRAIN_DATA} \
+    data.val_files=${VAL_DATA} \
+    data.train_batch_size=128 \
+    data.max_prompt_length=1024 \
+    data.max_response_length=2048 \
+    data.filter_overlong_prompts=True \
+    data.truncation='error' \
+    actor_rollout_ref.model.path=${MODEL_PATH} \
+    actor_rollout_ref.actor.optim.lr=1e-6 \
+    actor_rollout_ref.model.use_remove_padding=True \
+    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
+    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
+    actor_rollout_ref.actor.use_kl_loss=True \
+    actor_rollout_ref.actor.kl_loss_coef=${KL_COEF} \
+    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
+    actor_rollout_ref.actor.entropy_coeff=0 \
+    actor_rollout_ref.model.enable_gradient_checkpointing=False \
+    actor_rollout_ref.actor.fsdp_config.param_offload=False \
+    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
+    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
+    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
+    actor_rollout_ref.rollout.name=vllm \
+    actor_rollout_ref.rollout.gpu_memory_utilization=0.80 \
+    actor_rollout_ref.rollout.n=16 \
+    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
+    actor_rollout_ref.ref.fsdp_config.param_offload=True \
+    algorithm.use_kl_in_reward=False \
+    trainer.critic_warmup=0 \
+    trainer.logger=['console','wandb'] \
+    trainer.project_name='sk2decompile_rl' \
+    trainer.experiment_name=$TASK_NAME \
+    trainer.default_local_dir=${VERL_DIR}/checkpoints/${TASK_NAME} \
+    trainer.n_gpus_per_node=${GPUS_PER_NODE} \
+    trainer.nnodes=${NUM_NODES} \
+    trainer.save_freq=${SAVE_FREQ} \
+    trainer.test_freq=${TEST_FREQ} \
+    trainer.total_epochs=${TOTAL_EPOCHS} "$@" \
+    > >(tee -a "$LOG_FILE") \
+    2> >(tee -a "$ERR_FILE" >&2)
+
+echo "STDOUT saved to: $LOG_FILE"
+echo "STDERR saved to: $ERR_FILE"
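For sizing the reward workload, the script values above imply the per-step counts below. This is back-of-the-envelope arithmetic only. It assumes `train_batch_size` counts prompts and that every rollout is scored once. The worst-case figure further assumes rewards are evaluated strictly one after another and that every `gcc` call hits its timeout, which may not match how VERL schedules reward computation.

```python
train_batch_size = 128     # prompts per step (data.train_batch_size)
rollouts_per_prompt = 16   # actor_rollout_ref.rollout.n
gcc_timeout_s = 5          # timeout used by compileable_score

# Each prompt is sampled rollouts_per_prompt times, and each sample is scored.
responses_per_step = train_batch_size * rollouts_per_prompt

# Upper bound on compile time per step under the sequential assumption.
worst_case_compile_s = responses_per_step * gcc_timeout_s

print(responses_per_step)    # 2048
print(worst_case_compile_s)  # 10240
```

Under these assumptions a step could spend close to three hours in compilation in the worst case, which is why frequent timeouts are worth investigating early.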