Merge pull request #73 from BaiRiDreamer/main

Merge VERL RL training + BringUpBench evaluation pipeline
albertan017 2026-02-12 11:02:03 +08:00 committed by GitHub
commit 85b364bf09
30 changed files with 7308 additions and 6 deletions


@ -34,8 +34,19 @@ Binary/Pseudo-code → [Phase 1: Skeleton] → Normalized IR → [Phase 2: Skin]
SK2Decompile/
├── Preprocess/ # Data preprocessing and normalization tools
├── LLaMA-Factory/ # Supervised Fine-Tuning (SFT) implementation
├── verl/ # Reinforcement Learning (RL) with compiler-based rewards
├── verl/ # Reinforcement Learning (RL) with VERL/GRPO
│ └── SK2DECOMPILE/
│ ├── data/ # Example RL training data
│ ├── reward_functions/ # Custom reward functions (4 variants)
│ ├── scripts/ # Training launch scripts
│ └── README.md # Detailed RL documentation
├── evaluation/ # Comprehensive evaluation suite
│ ├── bringupbench/ # BringUpBench evaluation (Section A.6)
│ │ ├── scripts/ # Pipeline scripts (compile, decompile, evaluate)
│ │ ├── data/ # Pre-built function maps and inference results
│ │ ├── reports/ # Evaluation result summaries
│ │ └── README.md # Detailed BringUpBench documentation
│ └── ... # HumanEval, MBPP evaluation scripts
└── README.md # This file
```
@ -107,20 +118,32 @@ llamafactory-cli train LLaMA-Factory/SK2DECOMPILE/train/norm2code-example.yaml
### Phase 2: Reinforcement Learning (RL)
Fine-tune models using compiler-based rewards for improved correctness:
After SFT, we apply GRPO (Group Relative Policy Optimization) to further align each model with task-specific objectives (Section 3.5 of the paper):
- **Structure Recovery**
- **Identifier Naming**
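As background for the GRPO step, the sketch below illustrates the group-relative advantage that gives GRPO its name: the rewards of several sampled outputs for the same prompt are standardized against that group's mean and standard deviation. It is an illustration only, not VERL's implementation; the function name, the `eps` constant, and the example reward values are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against the other samples in its group.

    `rewards` holds one scalar reward per sampled output of a single prompt.
    `eps` (an assumed constant) avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled decompilations of one function.
example_rewards = [0.0, 0.5, 1.0, 0.5]
advantages = group_relative_advantages(example_rewards)
```

Outputs that score above the group mean get a positive advantage and those below it a negative one, so no separate value model is needed to provide a baseline.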
Our RL training is based on [VERL](https://github.com/volcengine/verl) v0.4.1 (Sheng et al., 2024).
#### Setup VERL
```bash
cd ../verl
# Follow installation instructions in verl/README.md
git clone https://github.com/volcengine/verl.git
cd verl && git checkout v0.4.1 && pip install -e .
pip install tree-sitter==0.24.0 tree-sitter-c==0.23.4 openai
```
#### Run RL Training
```bash
bash verl/SK2DECOMPILE/train/sk2decompile-rl.sh
# Structure Recovery RL
bash verl/SK2DECOMPILE/scripts/run_struct_rl.sh
# Identifier Naming RL (requires embedding server)
bash verl/SK2DECOMPILE/scripts/run_ident_rl.sh
```
**RL Training Data:** `verl/SK2DECOMPILE/data/sk2decompile-rl-examples.parquet`
See [`verl/SK2DECOMPILE/README.md`](verl/SK2DECOMPILE/README.md) for the full reproduction guide, including how to integrate reward functions into VERL and prepare training data.
**RL Training Data:** `verl/SK2DECOMPILE/data/sk2decompile-rl-examples.jsonl`
## Evaluation
```
@ -181,6 +204,12 @@ python gpt_judge.py --json_file your_json_file_path
--api_key your_openai_api_key
```
**BringUpBench Evaluation** (Section A.6 of the paper)
We also evaluate on [BringUpBench](https://github.com/toddmaustin/bringup-bench), a suite of 90 self-contained C programs with 505 functions across O0-O3. SK²Decompile achieves a **42.3% compilation rate** and a **27.0% re-executability rate**, compared with IDA Pro's 23.6% / 21.7%.
See [`evaluation/bringupbench/README.md`](evaluation/bringupbench/README.md) for the full reproduction pipeline, pre-built data, and detailed results.
## 📊 Results
Our approach achieves state-of-the-art performance:


@ -0,0 +1,249 @@
# SK²Decompile — Evaluation on BringUpBench
This directory contains the evaluation pipeline for SK²Decompile on the [BringUpBench](https://github.com/toddmaustin/bringup-bench) benchmark, as described in **Section A.6** of our paper:
> **SK²Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin**
> [[arXiv:2509.22114]](https://arxiv.org/abs/2509.22114)
## Overview
[BringUpBench](https://github.com/toddmaustin/bringup-bench) (Austin, 2024) is a benchmark suite of **90 self-contained C programs** designed for bringing up newly designed CPUs, accelerators, compilers, and operating systems. It has **zero library dependencies** — all programs rely solely on a built-in `libmin` library and only 4 system calls — making it an ideal, standardized test bed for decompilation evaluation on complex, real-world binaries.
We compiled, decompiled, and executed all projects across optimization levels O0-O3, yielding **505 functions** in total. We compared SK²Decompile against the industry-standard rule-based decompiler, **IDA Pro** (Hex-Rays).
## Results
### SK²Decompile vs IDA Pro
| Opt Level | Functions | SK²Decompile Compilable | SK²Decompile Executable | IDA Compilable | IDA Executable |
|:---------:|:---------:|:-----------------------:|:-----------------------:|:--------------:|:--------------:|
| O0 | 382 | **50.26%** | **49.48%** | — | — |
| O1 | 379 | **40.90%** | **39.05%** | — | — |
| O2 | 368 | **37.77%** | **34.24%** | — | — |
| O3 | 359 | **31.75%** | **29.53%** | — | — |
| **Avg** | **1488** | **42.3%** | **27.0%** | **23.6%** | **21.7%** |
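> The Avg row reports the paper's aggregate percentages (Table 8 in Section A.6), so they are not the arithmetic mean of the per-level rows above; its Functions cell (1488) is the sum of the per-level case counts. The paper does not report per-opt-level IDA baselines. Detailed per-benchmark breakdowns are available in `reports/`.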
## Directory Structure
```
bringupbench/
├── README.md # This file
├── config.env # Environment configuration (paths)
├── scripts/
│ ├── build-host-opt-levels.sh # Step 1: Compile benchmarks at O0-O3
│ ├── decompile-all-pseudo.sh # Step 2: IDA Pro batch decompilation
│ ├── dump_pseudo.py # IDA headless decompilation helper
│ ├── disasm-all-objdump.sh # Step 3: objdump batch disassembly
│ ├── build-func-maps.py # Step 3: Build function-level mappings
│ ├── clean-all-benchmarks.sh # Utility: clean all build artifacts
│ └── eval_infer_out.py # Step 5: Automated evaluation
├── data/
│ ├── func_maps/ # Pre-built function mappings (JSONL)
│ │ ├── merged.O0.func_map.jsonl # O0: 493 functions
│ │ ├── merged.O1.func_map.jsonl # O1: 449 functions
│ │ ├── merged.O2.func_map.jsonl # O2: 441 functions
│ │ └── merged.O3.func_map.jsonl # O3: 439 functions
│ └── infer_results/ # SK²Decompile inference results
│ ├── merged.O0.func_map.infer.jsonl # O0: 382 evaluated functions
│ ├── merged.O1.func_map.infer.jsonl # O1: 379 evaluated functions
│ ├── merged.O2.func_map.infer.jsonl # O2: 368 evaluated functions
│ └── merged.O3.func_map.infer.jsonl # O3: 359 evaluated functions
└── reports/ # Evaluation result summaries
├── O0_results.md
├── O1_results.md
├── O2_results.md
└── O3_results.md
```
## Reproduction Pipeline
Our evaluation pipeline consists of five steps, as described in the paper:
```
Source (.c)
▼ Step 1: Compilation
Binary (.host.O0 ~ .host.O3)
├──▶ Step 2: Baseline Extraction (IDA Pro) ──▶ Pseudocode (.pseudo)
├──▶ Step 3: Ground Truth Mapping ──▶ Function Maps (.func_map.jsonl)
▼ Step 4: Decompilation (SK²Decompile)
Inferred C code (.func_map.infer.jsonl)
▼ Step 5: Validation
Evaluation Reports (reports/)
```
### Prerequisites
| Dependency | Purpose | Installation |
|------------|---------|-------------|
| [Bringup-Bench](https://github.com/toddmaustin/bringup-bench) | Upstream benchmark suite (90 C programs) | `git clone https://github.com/toddmaustin/bringup-bench.git` |
| GCC | Compile benchmarks | `apt install gcc` |
| IDA Pro + Hex-Rays | Decompile binaries to pseudocode | Commercial software |
| objdump (binutils) | Disassemble binaries | `apt install binutils` |
| clang-format | Pseudocode normalization | `apt install clang-format` |
| Python >= 3.10 | Run evaluation scripts | `apt install python3` |
### Quick Start (Evaluation Only)
If you only want to reproduce the evaluation step (Step 5), the pre-built data is included in `data/`. You only need the Bringup-Bench source repository:
```bash
# 1. Clone Bringup-Bench
git clone https://github.com/toddmaustin/bringup-bench.git
# 2. Configure paths
cd bringupbench
vim config.env # Set BENCH_REPO_ROOT to your bringup-bench path
# 3. Run evaluation (e.g., O0)
python3 scripts/eval_infer_out.py data/infer_results/merged.O0.func_map.infer.jsonl
# 4. Check results
cat reports/O0_results.md
```
### Full Pipeline (From Scratch)
To reproduce the entire pipeline from compilation to evaluation:
```bash
cd bringupbench
vim config.env # Set BENCH_REPO_ROOT and IDA_BIN
```
**Step 1: Compile benchmarks at O0-O3**
Build all 90 Bringup-Bench programs at four optimization levels, producing `<name>.host.O{0,1,2,3}` binaries.
```bash
scripts/build-host-opt-levels.sh
```
**Step 2: Baseline Extraction (IDA Pro)**
Use IDA Pro in headless mode to decompile all binaries, producing `.pseudo` files with Hex-Rays pseudocode.
```bash
scripts/decompile-all-pseudo.sh
```
Each function is delimited by `/* function_name @ 0xADDRESS */` in the output.
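To make the delimiter convention concrete, here is a minimal sketch of splitting a `.pseudo` file into per-function records. It assumes the delimiter has exactly the shape shown above; the regular expression, the `split_pseudo` helper, and the sample text are illustrative and are not taken from `build-func-maps.py`.

```python
import re

# Assumed delimiter shape, following the README: /* function_name @ 0xADDRESS */
DELIM = re.compile(r"/\*\s*(?P<name>\S+)\s*@\s*(?P<addr>0x[0-9A-Fa-f]+)\s*\*/")


def split_pseudo(text: str) -> list:
    """Return one record per function found in the text of a .pseudo file."""
    matches = list(DELIM.finditer(text))
    records = []
    for i, m in enumerate(matches):
        # A function's body runs until the next delimiter (or end of file).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        records.append({
            "function_name": m.group("name"),
            "address": m.group("addr"),
            "content": text[m.end():end].strip(),
        })
    return records


# Hypothetical sample; real Hex-Rays output will be longer.
sample = """/* ackermann @ 0x11e9 */
__int64 __fastcall ackermann(int m, int n) { return 0; }
/* main @ 0x13b9 */
int main(void) { return 0; }
"""
functions = split_pseudo(sample)
```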
**Step 3: Ground Truth Mapping**
Parse the source code, pseudocode, and assembly; match functions by name across all three representations; and normalize the pseudocode (strip IDA-specific types, convert hex values to decimal, and format with clang-format).
```bash
# Disassemble (optional, for assembly mapping)
scripts/disasm-all-objdump.sh
# Build function-level mappings
python3 scripts/build-func-maps.py
```
Output: per-binary `.func_map.jsonl` files. Merge them per optimization level:
```bash
# BENCH_REPO_ROOT must be set in the current shell (for example: source config.env)
cat "$BENCH_REPO_ROOT"/*/*.host.O0.func_map.jsonl > data/func_maps/merged.O0.func_map.jsonl
cat "$BENCH_REPO_ROOT"/*/*.host.O1.func_map.jsonl > data/func_maps/merged.O1.func_map.jsonl
cat "$BENCH_REPO_ROOT"/*/*.host.O2.func_map.jsonl > data/func_maps/merged.O2.func_map.jsonl
cat "$BENCH_REPO_ROOT"/*/*.host.O3.func_map.jsonl > data/func_maps/merged.O3.func_map.jsonl
```
**Step 4: Decompilation (SK²Decompile Inference)**
Feed the `pseudo_normalize` field from the function maps to SK²Decompile. The two-phase inference pipeline (see `../sk2decompile_inf.py`) produces C code for each function. Write the results back to the JSONL, storing the final decompiled function in the `pseudo.content-fix` field.
```bash
# Example: use the main SK²Decompile inference pipeline
cd ../ # back to sk2decompile/evaluation/
python3 sk2decompile_inf.py \
--dataset_path bringupbench/data/func_maps/merged.O0.func_map.jsonl \
--model_path LLM4Binary/sk2decompile-struct-6.7b \
--recover_model_path LLM4Binary/sk2decompile-ident-6.7b
```
**Step 5: Validation**
For each function, replace the original function in its source file with the decompiled output, rebuild in an isolated workspace, and run the project's test suite.
```bash
python3 scripts/eval_infer_out.py data/infer_results/merged.O0.func_map.infer.jsonl \
--jobs 16 \
--command-timeout 20
```
Common options:
```bash
--jobs N # Parallel workers (default: 96)
--command-timeout S # Timeout per make command in seconds (default: 20)
--limit N # Process only first N cases (for debugging)
--keep-workspaces # Keep temporary build directories
```
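The source-replacement part of Step 5 can be pictured with the following sketch. It assumes the original function text (`source.content`) appears verbatim in the source file and swaps it for the decompiled text (`pseudo.content-fix`); `eval_infer_out.py` may locate functions differently, and the `replace_function` helper and toy inputs are hypothetical.

```python
from typing import Optional


def replace_function(source_text: str, original: str, decompiled: str) -> Optional[str]:
    """Swap `original` for `decompiled` inside `source_text`.

    Returns None when the original function cannot be located, which this
    sketch treats as a replacement failure.
    """
    start = source_text.find(original)
    if start < 0:
        return None
    end = start + len(original)
    return source_text[:start] + decompiled + source_text[end:]


# Toy source file and function bodies, for illustration only.
source_text = (
    "/* toy file */\n\n"
    "int add(int a, int b) { return a + b; }\n\n"
    "int main(void) { return add(1, 2); }\n"
)
original = "int add(int a, int b) { return a + b; }\n"
decompiled = "int add(int x, int y) { int sum = x + y; return sum; }\n"
patched = replace_function(source_text, original, decompiled)
```

The patched text would then be written into a fresh copy of the benchmark directory, so that a failed build cannot affect other cases.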
## Data Format
### func_map.jsonl (Function Mappings)
Each line is a JSON object containing the source, pseudocode, and assembly for one function:
```jsonc
{
"source": {
"path": "ackermann/ackermann.c", // Source file (relative to BENCH_REPO_ROOT)
"function_name": "ackermann", // Function name
"content": "int ackermann(int m, ...) { ... }\n" // Complete function body
},
"pseudo": {
"path": "ackermann/ackermann.host.O0.pseudo",
"function_name": "ackermann",
"address": "0x11e9", // Function address in binary
"label": "ackermann",
"content": "__int64 __fastcall ackermann(...) { ... }\n" // Raw IDA pseudocode
},
"pseudo_normalize": "int ackermann(...) { ... }", // Normalized pseudocode
"binary": "ackermann/ackermann.host.O0", // Binary file path
"assembly": "<ackermann>:\npush %rbp\n..." // Cleaned objdump output
}
```
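As a reading aid, the sketch below iterates over `func_map.jsonl` records and pulls out the fields that Step 4 consumes. The field names follow the layout above; the helper names and the in-memory sample record are assumptions made for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_func_map(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def model_inputs(lines: Iterable[str]) -> list:
    """Collect (function name, address, normalized pseudocode) triples."""
    return [
        (rec["source"]["function_name"], rec["pseudo"]["address"], rec["pseudo_normalize"])
        for rec in iter_func_map(lines)
    ]


# Hypothetical one-record file; real files are read with open(path, encoding="utf-8").
sample_record = {
    "source": {"path": "ackermann/ackermann.c", "function_name": "ackermann",
               "content": "int ackermann(int m, int n) { return 0; }\n"},
    "pseudo": {"path": "ackermann/ackermann.host.O0.pseudo", "function_name": "ackermann",
               "address": "0x11e9", "label": "ackermann",
               "content": "__int64 __fastcall ackermann(int a1, int a2) { return 0; }\n"},
    "pseudo_normalize": "int ackermann(int a1, int a2) { return 0; }",
    "binary": "ackermann/ackermann.host.O0",
    "assembly": "<ackermann>:\npush %rbp\n",
}
sample_lines = [json.dumps(sample_record), ""]
inputs = model_inputs(sample_lines)
```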
### func_map.infer.jsonl (Inference Results)
Extends `func_map.jsonl` with SK²Decompile inference outputs:
```jsonc
{
// ... all fields from func_map.jsonl ...
"pseudo": {
// ... all fields above, plus:
"content-fix": "..." // Final decompiled function (used for source replacement)
},
"infer-out-model1": "...", // Phase 1 (Structure Recovery) raw output
"infer-out-model2": "...", // Phase 2 (Identifier Naming) raw output
"pseudo_normalize-fix": "..." // Corrected normalized pseudocode
}
```
## Evaluation Metrics
| Metric | Definition |
|--------|-----------|
| **Replacement Rate** | Fraction of functions where the decompiled output can be located and substituted into the original source file |
| **Compilable Rate** | Fraction of functions where the modified source compiles successfully (`make build`) |
| **Executable Rate** | Fraction of functions where the compiled program passes its test suite (`make test`, output matches reference) |
The evaluation uses BringUpBench's own build infrastructure (`Makefile`, `libmin`, `libtarg`) to compile and validate. Each function is tested in an isolated workspace to prevent cross-contamination.
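The three rates share one denominator, the total number of evaluated cases. The sketch below shows the arithmetic, using the O0 counts from `reports/O0_results.md` (382 cases, 382 replaced, 192 compilable, 189 executable); the `CaseResult` type and `rates` helper are illustrative names, not part of `eval_infer_out.py`.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    replaced: bool   # decompiled output was substituted into the source
    compiled: bool   # modified source built successfully
    executed: bool   # rebuilt program passed its test suite


def rates(results):
    """Return the three metrics as percentages of all evaluated cases."""
    total = len(results)
    if total == 0:
        return {"replacement": 0.0, "compilable": 0.0, "executable": 0.0}

    def pct(count):
        return round(100.0 * count / total, 2)

    return {
        "replacement": pct(sum(r.replaced for r in results)),
        "compilable": pct(sum(r.compiled for r in results)),
        "executable": pct(sum(r.executed for r in results)),
    }


# Rebuild the O0 summary counts: 382 cases, 192 compilable, 189 executable.
o0_results = [CaseResult(True, i < 192, i < 189) for i in range(382)]
o0_rates = rates(o0_results)
# o0_rates == {"replacement": 100.0, "compilable": 50.26, "executable": 49.48}
```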
## Notes
- BringUpBench programs are self-contained with zero external dependencies, making them ideal for evaluating decompilation without the confounding factor of missing headers or libraries.
- The `func_maps/` data contains more functions than `infer_results/` because some functions are filtered during inference (e.g., exceeding token limits).
- All scripts load paths from `config.env`. You can also override via environment variables or CLI arguments (priority: CLI > env > config.env).
- For the complete SK²Decompile methodology and other benchmark results (HumanEval, MBPP, ExeBench, GitHub2025), see the [main README](../../README.md).
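The path-resolution priority noted above (CLI > env > config.env) can be sketched as follows. This assumes `config.env` holds plain `KEY=VALUE` lines with `#` comments; the helper names are hypothetical and the actual scripts may parse the file differently.

```python
def parse_env_file(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(key, cli_value, environ, file_values):
    """Apply the documented priority: CLI argument, then environment, then config.env."""
    if cli_value is not None:
        return cli_value
    if key in environ:
        return environ[key]
    return file_values.get(key)


config_text = """# Absolute path to the Bringup-Bench repository
BENCH_REPO_ROOT=/path/to/bringup-bench
IDA_BIN=/path/to/idat
DEFAULT_TARGET=host
"""
file_values = parse_env_file(config_text)

from_file = resolve("BENCH_REPO_ROOT", None, {}, file_values)
from_env = resolve("BENCH_REPO_ROOT", None, {"BENCH_REPO_ROOT": "/env/bench"}, file_values)
from_cli = resolve("BENCH_REPO_ROOT", "/cli/bench", {"BENCH_REPO_ROOT": "/env/bench"}, file_values)
```

In a real script, `environ` would be `os.environ` and `cli_value` would come from an argument parser.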


@ -0,0 +1,14 @@
# BringUpBench Evaluation — Environment Configuration
# All scripts resolve paths from this file.
# Values can be overridden by same-named environment variables or CLI arguments.
# Priority: CLI args > environment variables > config.env
# Absolute path to the Bringup-Bench repository
# Clone from: https://github.com/toddmaustin/bringup-bench.git
BENCH_REPO_ROOT=/path/to/bringup-bench
# IDA Pro command-line executable (required for Step 2: decompilation)
IDA_BIN=/path/to/idat
# Default build target (host = native x86-64 Linux)
DEFAULT_TARGET=host

File diff suppressed because one or more lines are too long



@ -0,0 +1,296 @@
# Infer-Out Model 2 Evaluation (merged.O0.func_map.infer-host)
- Timestamp: 20251119-171008
- Source JSONL: merged.O0.func_map.infer.jsonl
- Target: host
- Total cases: 382
- Replacement success: 382 (100.00%)
- Compilable: 192 (50.26%)
- Executable: 189 (49.48%)
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 9 | 100.00% | 33.33% | 33.33% |
| anagram | 12 | 100.00% | 58.33% | 58.33% |
| audio-codec | 4 | 100.00% | 50.00% | 50.00% |
| avl-tree | 14 | 100.00% | 35.71% | 35.71% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 5 | 100.00% | 100.00% | 100.00% |
| blake2b | 6 | 100.00% | 16.67% | 16.67% |
| bloom-filter | 3 | 100.00% | 33.33% | 33.33% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 2 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 70.00% | 70.00% |
| ccmac | 2 | 100.00% | 50.00% | 50.00% |
| checkers | 15 | 100.00% | 80.00% | 80.00% |
| cipher | 3 | 100.00% | 33.33% | 33.33% |
| congrad | 6 | 100.00% | 66.67% | 66.67% |
| connect4-minimax | 13 | 100.00% | 61.54% | 61.54% |
| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
| dhrystone | 5 | 100.00% | 60.00% | 60.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 50.00% | 50.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 10 | 100.00% | 60.00% | 60.00% |
| fuzzy-match | 4 | 100.00% | 25.00% | 25.00% |
| fy-shuffle | 4 | 100.00% | 50.00% | 50.00% |
| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
| grad-descent | 4 | 100.00% | 75.00% | 75.00% |
| graph-tests | 19 | 100.00% | 21.05% | 21.05% |
| hanoi | 2 | 100.00% | 50.00% | 50.00% |
| heapsort | 2 | 100.00% | 50.00% | 50.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 12 | 100.00% | 91.67% | 91.67% |
| idct-alg | 4 | 100.00% | 50.00% | 50.00% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 6 | 100.00% | 100.00% | 100.00% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 28.57% | 28.57% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 66.67% | 66.67% |
| life | 14 | 100.00% | 78.57% | 71.43% |
| longdiv | 7 | 100.00% | 71.43% | 71.43% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| matmult | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 3 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 62.50% | 62.50% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 3 | 100.00% | 33.33% | 33.33% |
| parrondo | 3 | 100.00% | 33.33% | 33.33% |
| pascal | 3 | 100.00% | 100.00% | 100.00% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 0.00% | 0.00% |
| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
| qsort-demo | 5 | 100.00% | 0.00% | 0.00% |
| qsort-test | 3 | 100.00% | 66.67% | 66.67% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 50.00% | 50.00% |
| rand-test | 2 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 50.00% | 50.00% |
| regex-parser | 11 | 100.00% | 72.73% | 63.64% |
| rho-factor | 4 | 100.00% | 75.00% | 75.00% |
| rle-compress | 2 | 100.00% | 50.00% | 50.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
| sieve | 2 | 100.00% | 50.00% | 50.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 50.00% |
| sudoku-solver | 4 | 100.00% | 75.00% | 75.00% |
| tetris-sim | 12 | 100.00% | 75.00% | 75.00% |
| tiny-NN | 2 | 100.00% | 50.00% | 50.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 4 | 100.00% | 75.00% | 75.00% |
| transcend | 3 | 100.00% | 66.67% | 66.67% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 12.50% |
| verlet | 4 | 100.00% | 25.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |
## Compilation Failures
- ackermann/ackermann.c::main@0x13b9
- aes/aes.c::aes_decrypt@0x1a65
- aes/aes.c::aes_encrypt@0x1943
- aes/aes.c::inv_shift_rows@0x1396
- aes/aes.c::key_expansion@0x179a
- aes/aes.c::main@0x1b87
- aes/aes.c::shift_rows@0x12e5
- anagram/anagram.c::BuildMask@0x13e7
- anagram/anagram.c::BuildWord@0x17e5
- anagram/anagram.c::FindAnagram@0x1ba6
- anagram/anagram.c::ReadDict@0x121f
- anagram/anagram.c::main@0x1f71
- audio-codec/audio-codec.c::decode@0x12f5
- audio-codec/audio-codec.c::main@0x14b3
- avl-tree/avlcore.c::DeleteByElement@0x240f
- avl-tree/avlcore.c::DeleteByElementRecursive@0x21af
- avl-tree/avlcore.c::DeleteLeftMost@0x2086
- avl-tree/avlcore.c::FindByElement@0x1a46
- avl-tree/avlcore.c::Height@0x2475
- avl-tree/avlcore.c::Insert@0x1fc4
- avl-tree/avlcore.c::SingleLeftRotation@0x1b3a
- avl-tree/avl-tree.c::main@0x1399
- avl-tree/avl-tree.c::printTree@0x11e9
- banner/banner.c::main@0x11e9
- blake2b/blake2b.c::BLAKE2B@0x1a9b
- blake2b/blake2b.c::F@0x1502
- blake2b/blake2b.c::G@0x1258
- blake2b/blake2b.c::blake2b@0x1cd3
- blake2b/blake2b.c::test@0x2071
- bloom-filter/bloom-filter.c::bad_search@0x11e9
- bloom-filter/bloom-filter.c::main@0x123d
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x11e9
- boyer-moore-search/boyer-moore-search.c::main@0x146d
- boyer-moore-search/boyer-moore-search.c::search@0x126d
- c-interp/c-interp.c::eval@0x457c
- c-interp/c-interp.c::main@0x4e03
- c-interp/c-interp.c::next@0x11e9
- ccmac/ccmac.c::main@0x127e
- checkers/functions.c::fill_print_initial@0x1793
- checkers/functions.c::generate_node_children@0x29ff
- checkers/checkers.c::main@0x11e9
- cipher/cipher.c::encipher@0x11e9
- cipher/cipher.c::main@0x13cd
- congrad/congrad.c::cg_solve@0x1643
- congrad/congrad.c::main@0x199b
- connect4-minimax/connect4-minimax.c::init_board@0x11e9
- connect4-minimax/connect4-minimax.c::main@0x2299
- connect4-minimax/connect4-minimax.c::minimax@0x1d07
- connect4-minimax/connect4-minimax.c::play_game@0x20d1
- connect4-minimax/connect4-minimax.c::score_position@0x1a02
- convex-hull/convex-hull.c::main@0x13e7
- dhrystone/dhrystone.c::Proc_1@0x199f
- dhrystone/dhrystone.c::main@0x11e9
- distinctness/distinctness.c::isDistinct@0x11e9
- distinctness/distinctness.c::main@0x15d8
- fft-int/fft-int.c::db_from_ampl@0x1807
- fft-int/fft-int.c::fix_fft@0x11e9
- flood-fill/flood-fill.c::main@0x144d
- frac-calc/frac-calc.c::copyr@0x14d4
- frac-calc/frac-calc.c::divtokens@0x15b8
- frac-calc/frac-calc.c::help@0x13d9
- frac-calc/frac-calc.c::main@0x11e9
- fuzzy-match/fuzzy-match.c::compute_score@0x2379
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x2283
- fuzzy-match/fuzzy-match.c::main@0x24b3
- fy-shuffle/fy-shuffle.c::main@0x1378
- fy-shuffle/fy-shuffle.c::rand_int@0x11e9
- gcd-list/gcd-list.c::gcd@0x11e9
- gcd-list/gcd-list.c::main@0x125e
- grad-descent/grad-descent.c::main@0x1413
- graph-tests/graph-tests.c::addEdge@0x12c9
- graph-tests/graph-tests.c::addVertex@0x19f6
- graph-tests/graph-tests.c::bfs@0x15ce
- graph-tests/graph-tests.c::bfs_test@0x16e9
- graph-tests/graph-tests.c::bubbleSort@0x1829
- graph-tests/graph-tests.c::createGraph@0x1221
- graph-tests/graph-tests.c::createNode@0x11e9
- graph-tests/graph-tests.c::createQueue@0x1372
- graph-tests/graph-tests.c::dequeue@0x145d
- graph-tests/graph-tests.c::enqueue@0x13d7
- graph-tests/graph-tests.c::insertAtTheBegin@0x17b1
- graph-tests/graph-tests.c::link_list@0x18b8
- graph-tests/graph-tests.c::main@0x1d6c
- graph-tests/graph-tests.c::printQueue@0x151b
- graph-tests/graph-tests.c::swap@0x17f8
- hanoi/hanoi.c::main@0x12d4
- heapsort/heapsort.c::main@0x155f
- heat-calc/heat-calc.c::main@0x11e9
- huff-encode/huff-encode.c::main@0x192d
- idct-alg/idct-alg.c::C@0x11e9
- idct-alg/idct-alg.c::main@0x1472
- indirect-test/indirect-test.c::main@0x12c9
- kadane/kadane.c::main@0x1276
- kepler/kepler.c::bin_fact@0x1b3e
- kepler/kepler.c::binary@0x12c6
- kepler/kepler.c::e_series@0x1389
- kepler/kepler.c::j_series@0x1501
- kepler/kepler.c::main@0x1608
- knapsack/knapsack.c::main@0x138e
- knapsack/knapsack.c::max@0x11e9
- knights-tour/knights-tour.c::solveKT@0x12d6
- life/life.c::getNumNeigbors@0x156f
- life/life.c::main@0x11e9
- life/life.c::process@0x1426
- longdiv/longdiv.c::main@0x18fd
- longdiv/longdiv.c::sub@0x11e9
- lu-decomp/lu-decomp.c::main@0x1520
- lu-decomp/lu-decomp.c::print_matrix@0x11e9
- mandelbrot/mandelbrot.c::main@0x1220
- matmult/matmult.c::main@0x11e9
- max-subseq/max-subseq.c::lcsAlgo@0x11e9
- max-subseq/max-subseq.c::main@0x171a
- mersenne/mersenne.c::genrand@0x12ee
- mersenne/mersenne.c::main@0x153a
- mersenne/mersenne.c::sgenrand@0x11e9
- minspan/minspan.c::displayPath@0x1af2
- minspan/minspan.c::main@0x1d8f
- minspan/minspan.c::minSpanTree@0x1297
- monte-carlo/monte-carlo.c::main@0x11e9
- murmur-hash/murmur-hash.c::main@0x13a9
- murmur-hash/murmur-hash.c::murmurhash@0x11e9
- n-queens/n-queens.c::main@0x12ec
- natlog/natlog.c::main@0x11e9
- nbody-sim/nbody-sim.c::main@0x11e9
- packet-filter/packet-filter.c::generate_packet@0x11e9
- packet-filter/packet-filter.c::main@0x14c3
- parrondo/parrondo.c::cointoss@0x11e9
- parrondo/parrondo.c::main@0x12cb
- pi-calc/pi-calc.c::main@0x11e9
- primal-test/primal-test.c::main@0x1459
- primal-test/primal-test.c::miller_rabin_int@0x12fd
- primal-test/primal-test.c::powm@0x11e9
- priority-queue/priority-queue.c::main@0x13ee
- qsort-demo/qsort-demo.c::main@0x17bf
- qsort-demo/qsort-demo.c::print_struct_array@0x155e
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x1401
- qsort-demo/qsort-demo.c::sort_integers_example@0x1280
- qsort-demo/qsort-demo.c::sort_structs_example@0x1603
- qsort-test/qsort-test.c::main@0x1415
- quaternions/quaternions.c::euler_from_quat@0x1447
- quaternions/quaternions.c::quat_from_euler@0x11e9
- quaternions/quaternions.c::quaternion_multiply@0x1655
- quaternions/quaternions.c::test@0x18b2
- rabinkarp-search/rabinkarp-search.c::main@0x1341
- rand-test/rand-test.c::main@0x1913
- rand-test/rand-test.c::run_tests@0x1258
- ransac/ransac.c::main@0x1466
- regex-parser/regex-parser.c::main@0x32b9
- regex-parser/regex-parser.c::re_compile@0x22e1
- regex-parser/regex-parser.c::re_print@0x278f
- rho-factor/rho-factor.c::main@0x5c7d
- rle-compress/rle-compress.c::run_length_encode@0x11e9
- rsa-cipher/rsa-cipher.c::main@0x1634
- rsa-cipher/rsa-cipher.c::mod_inverse@0x1363
- rsa-cipher/rsa-cipher.c::mod_pow@0x11e9
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x14ef
- sat-solver/sat-solver.c::main@0x1518
- sat-solver/sat-solver.c::printFormula@0x1391
- shortest-path/shortest-path.c::main@0x1469
- sieve/sieve.c::main@0x1300
- simple-grep/simple-grep.c::main@0x11e9
- spelt2num/spelt2num.c::main@0x11e9
- spirograph/spirograph.c::spirograph@0x11e9
- sudoku-solver/sudoku-solver.c::main@0x1532
- tetris-sim/tetris-sim.c::best_move@0x1810
- tetris-sim/tetris-sim.c::evaluate_board@0x1686
- tetris-sim/tetris-sim.c::main@0x1ba5
- tiny-NN/tiny-NN.c::train@0x1485
- topo-sort/topo-sort.c::addEdge@0x12cf
- topo-sort/topo-sort.c::createGraph@0x1259
- topo-sort/topo-sort.c::createListNode@0x1221
- topo-sort/topo-sort.c::createStackNode@0x11e9
- topo-sort/topo-sort.c::main@0x153d
- topo-sort/topo-sort.c::topologicalSort@0x13fd
- topo-sort/topo-sort.c::topologicalSortUtil@0x1332
- totient/totient.c::my_gcd@0x11e9
- transcend/transcend.c::init_inputs_f64@0x1235
- uniquify/uniquify.c::main@0x1228
- vectors-3d/vectors-3d.c::get_cross_matrix@0x1601
- vectors-3d/vectors-3d.c::print_vector@0x144f
- vectors-3d/vectors-3d.c::test@0x17fb
- vectors-3d/vectors-3d.c::unit_vec@0x1510
- vectors-3d/vectors-3d.c::vector_add@0x126d
- vectors-3d/vectors-3d.c::vector_prod@0x1373
- vectors-3d/vectors-3d.c::vector_sub@0x11e9
- verlet/verlet.c::main@0x170b
- verlet/verlet.c::vb_init@0x1271
- verlet/verlet.c::vb_step_avg@0x13aa
- weekday/weekday.c::dayOfWeek@0x11e9
- weekday/weekday.c::main@0x130d
## Execution Failures
- life/life.c::init@0x1237
- regex-parser/regex-parser.c::matchpattern@0x313f
- verlet/verlet.c::vb_checksum@0x160b


@ -0,0 +1,334 @@
# Infer-Out Model 2 Evaluation (merged.O1.func_map.infer-host)
- Timestamp: 20251119-171212
- Source JSONL: merged.O1.func_map.infer.jsonl
- Target: host
- Total cases: 379
- Replacement success: 379 (100.00%)
- Compilable: 155 (40.90%)
- Executable: 148 (39.05%)
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 9 | 100.00% | 33.33% | 33.33% |
| anagram | 13 | 100.00% | 53.85% | 53.85% |
| audio-codec | 3 | 100.00% | 0.00% | 0.00% |
| avl-tree | 17 | 100.00% | 29.41% | 29.41% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 5 | 100.00% | 80.00% | 80.00% |
| blake2b | 5 | 100.00% | 20.00% | 20.00% |
| bloom-filter | 4 | 100.00% | 50.00% | 50.00% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 60.00% | 60.00% |
| ccmac | 1 | 100.00% | 0.00% | 0.00% |
| checkers | 16 | 100.00% | 81.25% | 81.25% |
| cipher | 3 | 100.00% | 33.33% | 0.00% |
| congrad | 2 | 100.00% | 0.00% | 0.00% |
| connect4-minimax | 13 | 100.00% | 61.54% | 61.54% |
| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
| dhrystone | 5 | 100.00% | 40.00% | 40.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 75.00% | 75.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 10 | 100.00% | 40.00% | 40.00% |
| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
| grad-descent | 4 | 100.00% | 0.00% | 0.00% |
| graph-tests | 19 | 100.00% | 21.05% | 21.05% |
| hanoi | 2 | 100.00% | 50.00% | 50.00% |
| heapsort | 2 | 100.00% | 50.00% | 50.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 13 | 100.00% | 92.31% | 92.31% |
| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 6 | 100.00% | 50.00% | 50.00% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 14.29% | 14.29% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 66.67% | 66.67% |
| life | 14 | 100.00% | 21.43% | 14.29% |
| longdiv | 7 | 100.00% | 71.43% | 71.43% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| matmult | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 3 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 37.50% | 25.00% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 4 | 100.00% | 25.00% | 25.00% |
| parrondo | 2 | 100.00% | 0.00% | 0.00% |
| pascal | 3 | 100.00% | 33.33% | 33.33% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 33.33% | 33.33% |
| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
| rand-test | 3 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 0.00% | 0.00% |
| regex-parser | 8 | 100.00% | 25.00% | 12.50% |
| rho-factor | 4 | 100.00% | 75.00% | 75.00% |
| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
| sieve | 1 | 100.00% | 0.00% | 0.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 50.00% |
| sudoku-solver | 4 | 100.00% | 50.00% | 50.00% |
| tetris-sim | 12 | 100.00% | 75.00% | 66.67% |
| tiny-NN | 5 | 100.00% | 40.00% | 40.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 4 | 100.00% | 50.00% | 50.00% |
| transcend | 1 | 100.00% | 0.00% | 0.00% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
| verlet | 1 | 100.00% | 0.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |
## Compilation Failures
- ackermann/ackermann.c::main@0x131c
- aes/aes.c::aes_decrypt@0x161b
- aes/aes.c::aes_encrypt@0x1560
- aes/aes.c::inv_shift_rows@0x12cd
- aes/aes.c::key_expansion@0x14c3
- aes/aes.c::main@0x16d1
- aes/aes.c::shift_rows@0x1248
- anagram/anagram.c::BuildMask@0x1372
- anagram/anagram.c::BuildWord@0x15cd
- anagram/anagram.c::DumpWords@0x17e8
- anagram/anagram.c::FindAnagram@0x1839
- anagram/anagram.c::ReadDict@0x1233
- anagram/anagram.c::main@0x1a93
- audio-codec/audio-codec.c::decode@0x1271
- audio-codec/audio-codec.c::encode@0x11e9
- audio-codec/audio-codec.c::main@0x12d7
- avl-tree/avlcore.c::CheckTreeNodeRotation@0x186a
- avl-tree/element.c::Compare@0x1764
- avl-tree/avlcore.c::DeleteByElement@0x1d2b
- avl-tree/avlcore.c::DeleteByElementRecursive@0x1b8b
- avl-tree/avlcore.c::DoubleLeftRotation@0x1845
- avl-tree/avlcore.c::DoubleRightRotation@0x1821
- avl-tree/avlcore.c::FindByElement@0x1790
- avl-tree/avlcore.c::Height@0x1d6e
- avl-tree/avlcore.c::Insert@0x1a73
- avl-tree/avlcore.c::InsertNode@0x199b
- avl-tree/avl-tree.c::main@0x1380
- avl-tree/avl-tree.c::printTree@0x11e9
- banner/banner.c::main@0x11e9
- bit-kernels/bit-kernels.c::main@0x12e8
- blake2b/blake2b.c::F@0x1258
- blake2b/blake2b.c::G@0x11e9
- blake2b/blake2b.c::blake2b@0x1616
- blake2b/blake2b.c::test@0x1982
- bloom-filter/bloom-filter.c::bad_search@0x11e9
- bloom-filter/bloom-filter.c::main@0x1217
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x11e9
- boyer-moore-search/boyer-moore-search.c::main@0x1329
- boyer-moore-search/boyer-moore-search.c::search@0x1223
- c-interp/c-interp.c::eval@0x35d3
- c-interp/c-interp.c::function_body@0x310b
- c-interp/c-interp.c::main@0x3c45
- c-interp/c-interp.c::next@0x11e9
- ccmac/ccmac.c::main@0x11e9
- checkers/functions.c::fill_print_initial@0x15dd
- checkers/functions.c::link_new_node@0x204d
- checkers/checkers.c::main@0x11e9
- cipher/cipher.c::encipher@0x11e9
- cipher/cipher.c::main@0x12b3
- congrad/congrad.c::cg_spmv@0x11e9
- congrad/congrad.c::main@0x125a
- connect4-minimax/connect4-minimax.c::init_board@0x11e9
- connect4-minimax/connect4-minimax.c::main@0x1c5d
- connect4-minimax/connect4-minimax.c::minimax@0x17ed
- connect4-minimax/connect4-minimax.c::play_game@0x1b13
- connect4-minimax/connect4-minimax.c::score_position@0x158e
- convex-hull/convex-hull.c::main@0x130d
- dhrystone/dhrystone.c::PFunc_1@0x12ab
- dhrystone/dhrystone.c::PFunc_2@0x12c8
- dhrystone/dhrystone.c::main@0x1311
- distinctness/distinctness.c::isDistinct@0x11e9
- distinctness/distinctness.c::main@0x1342
- fft-int/fft-int.c::db_from_ampl@0x1513
- flood-fill/flood-fill.c::main@0x130f
- frac-calc/frac-calc.c::avaliatokens@0x1421
- frac-calc/frac-calc.c::calcula@0x172a
- frac-calc/frac-calc.c::copyr@0x12b5
- frac-calc/frac-calc.c::divtokens@0x1636
- frac-calc/frac-calc.c::help@0x11e9
- frac-calc/frac-calc.c::main@0x18c1
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x21e9
- fuzzy-match/fuzzy-match.c::main@0x2391
- fy-shuffle/fy-shuffle.c::fy_shuffle@0x11e9
- fy-shuffle/fy-shuffle.c::main@0x12de
- gcd-list/gcd-list.c::gcd@0x11e9
- gcd-list/gcd-list.c::main@0x121c
- grad-descent/grad-descent.c::derivateWRTBias@0x1247
- grad-descent/grad-descent.c::derivateWRTWeight@0x11e9
- grad-descent/grad-descent.c::gradientDescent@0x129d
- grad-descent/grad-descent.c::main@0x1312
- graph-tests/graph-tests.c::addEdge@0x127b
- graph-tests/graph-tests.c::addVertex@0x1743
- graph-tests/graph-tests.c::bfs@0x144f
- graph-tests/graph-tests.c::bfs_test@0x150f
- graph-tests/graph-tests.c::bubbleSort@0x15e7
- graph-tests/graph-tests.c::createGraph@0x1206
- graph-tests/graph-tests.c::createNode@0x11e9
- graph-tests/graph-tests.c::createQueue@0x12cd
- graph-tests/graph-tests.c::dequeue@0x1357
- graph-tests/graph-tests.c::enqueue@0x130a
- graph-tests/graph-tests.c::insertAtTheBegin@0x15ae
- graph-tests/graph-tests.c::link_list@0x163c
- graph-tests/graph-tests.c::main@0x1a0e
- graph-tests/graph-tests.c::printQueue@0x13cc
- graph-tests/graph-tests.c::swap@0x15da
- hanoi/hanoi.c::main@0x1261
- heapsort/heapsort.c::main@0x13d4
- heat-calc/heat-calc.c::main@0x11e9
- huff-encode/huff-encode.c::main@0x15ef
- idct-alg/idct-alg.c::main@0x140e
- indirect-test/indirect-test.c::main@0x1257
- k-means/k-means.c::calculateNearst@0x11e9
- k-means/k-means.c::main@0x1922
- k-means/k-means.c::printEPS@0x1546
- kadane/kadane.c::main@0x123b
- kepler/kepler.c::J@0x18c0
- kepler/kepler.c::bin_fact@0x1718
- kepler/kepler.c::binary@0x121d
- kepler/kepler.c::e_series@0x17a2
- kepler/kepler.c::j_series@0x19bb
- kepler/kepler.c::main@0x131f
- knapsack/knapsack.c::main@0x128b
- knapsack/knapsack.c::max@0x11e9
- knights-tour/knights-tour.c::solveKT@0x1341
- life/life.c::getDown@0x1406
- life/life.c::getDownLeft@0x1487
- life/life.c::getDownRight@0x14b4
- life/life.c::getLeft@0x1390
- life/life.c::getNumNeigbors@0x14e2
- life/life.c::getRight@0x13b7
- life/life.c::getUp@0x13df
- life/life.c::getUpLeft@0x142e
- life/life.c::getUpRight@0x145a
- life/life.c::main@0x1664
- life/life.c::process@0x15a3
- longdiv/longdiv.c::main@0x1691
- longdiv/longdiv.c::sub@0x11e9
- lu-decomp/lu-decomp.c::main@0x13ad
- lu-decomp/lu-decomp.c::print_matrix@0x11e9
- mandelbrot/mandelbrot.c::main@0x120d
- matmult/matmult.c::main@0x11e9
- max-subseq/max-subseq.c::lcsAlgo@0x11e9
- max-subseq/max-subseq.c::main@0x14c4
- mersenne/mersenne.c::genrand@0x125b
- mersenne/mersenne.c::main@0x1398
- mersenne/mersenne.c::sgenrand@0x11e9
- minspan/minspan.c::displayGraph@0x13f5
- minspan/minspan.c::displayGraph1@0x14f3
- minspan/minspan.c::displayPath@0x15fa
- minspan/minspan.c::main@0x175b
- minspan/minspan.c::minSpanTree@0x1231
- monte-carlo/monte-carlo.c::main@0x11e9
- murmur-hash/murmur-hash.c::main@0x12a3
- murmur-hash/murmur-hash.c::murmurhash@0x11e9
- n-queens/n-queens.c::main@0x12b1
- natlog/natlog.c::main@0x11e9
- nbody-sim/nbody-sim.c::main@0x11e9
- packet-filter/packet-filter.c::check_packet_filter@0x133d
- packet-filter/packet-filter.c::generate_packet@0x11e9
- packet-filter/packet-filter.c::main@0x145c
- parrondo/parrondo.c::main@0x127d
- parrondo/parrondo.c::play_c@0x1238
- pascal/pascal.c::main@0x12d1
- pascal/pascal.c::print_centered@0x122b
- pi-calc/pi-calc.c::main@0x11e9
- primal-test/primal-test.c::main@0x13ea
- primal-test/primal-test.c::miller_rabin_int@0x1243
- priority-queue/priority-queue.c::main@0x130a
- qsort-demo/qsort-demo.c::main@0x163f
- qsort-demo/qsort-demo.c::print_struct_array@0x1470
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x13b3
- qsort-demo/qsort-demo.c::sort_integers_example@0x1292
- qsort-demo/qsort-demo.c::sort_structs_example@0x14d2
- qsort-test/qsort-test.c::main@0x133f
- quaternions/quaternions.c::euler_from_quat@0x136c
- quaternions/quaternions.c::main@0x15bf
- quaternions/quaternions.c::quat_from_euler@0x11e9
- quaternions/quaternions.c::quaternion_multiply@0x1487
- rabinkarp-search/rabinkarp-search.c::main@0x1366
- rabinkarp-search/rabinkarp-search.c::search@0x11e9
- rand-test/rand-test.c::bad_rand@0x11e9
- rand-test/rand-test.c::main@0x1514
- rand-test/rand-test.c::run_tests@0x1220
- ransac/ransac.c::main@0x13cf
- ransac/ransac.c::ransac_line_fitting@0x1238
- regex-parser/regex-parser.c::main@0x2b4b
- regex-parser/regex-parser.c::matchalphanum@0x21fc
- regex-parser/regex-parser.c::matchcharclass@0x222a
- regex-parser/regex-parser.c::matchone@0x23e1
- regex-parser/regex-parser.c::re_compile@0x270b
- regex-parser/regex-parser.c::re_print@0x2964
- rho-factor/rho-factor.c::main@0x3ef0
- rle-compress/rle-compress.c::main@0x1318
- rle-compress/rle-compress.c::run_length_encode@0x11e9
- rsa-cipher/rsa-cipher.c::main@0x1527
- rsa-cipher/rsa-cipher.c::mod_inverse@0x12f3
- rsa-cipher/rsa-cipher.c::mod_pow@0x11e9
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x1444
- sat-solver/sat-solver.c::main@0x141e
- sat-solver/sat-solver.c::printFormula@0x12ff
- shortest-path/shortest-path.c::main@0x1333
- sieve/sieve.c::main@0x11e9
- simple-grep/simple-grep.c::main@0x11e9
- spelt2num/spelt2num.c::main@0x11e9
- spirograph/spirograph.c::spirograph@0x11e9
- sudoku-solver/sudoku-solver.c::isSafe@0x11e9
- sudoku-solver/sudoku-solver.c::main@0x13e5
- tetris-sim/tetris-sim.c::best_move@0x157c
- tetris-sim/tetris-sim.c::evaluate_board@0x144b
- tetris-sim/tetris-sim.c::main@0x180d
- tiny-NN/tiny-NN.c::main@0x16a4
- tiny-NN/tiny-NN.c::sampleSine@0x1251
- tiny-NN/tiny-NN.c::train@0x133c
- topo-sort/topo-sort.c::addEdge@0x127d
- topo-sort/topo-sort.c::createGraph@0x1223
- topo-sort/topo-sort.c::createListNode@0x1206
- topo-sort/topo-sort.c::createStackNode@0x11e9
- topo-sort/topo-sort.c::main@0x1424
- topo-sort/topo-sort.c::topologicalSort@0x132c
- topo-sort/topo-sort.c::topologicalSortUtil@0x12b7
- totient/totient.c::main@0x12bf
- totient/totient.c::my_gcd@0x11e9
- transcend/transcend.c::main@0x11e9
- uniquify/uniquify.c::main@0x1201
- vectors-3d/vectors-3d.c::get_cross_matrix@0x13c2
- vectors-3d/vectors-3d.c::main@0x14cb
- vectors-3d/vectors-3d.c::print_vector@0x12dc
- vectors-3d/vectors-3d.c::unit_vec@0x1331
- vectors-3d/vectors-3d.c::vector_add@0x121f
- vectors-3d/vectors-3d.c::vector_prod@0x127e
- vectors-3d/vectors-3d.c::vector_sub@0x11e9
- verlet/verlet.c::main@0x11e9
- weekday/weekday.c::dayOfWeek@0x11e9
- weekday/weekday.c::main@0x12ea
## Execution Failures
- cipher/cipher.c::decipher@0x1251
- idct-alg/idct-alg.c::idct_2d@0x1216
- life/life.c::init@0x11e9
- minspan/minspan.c::displayTree@0x16b7
- regex-parser/regex-parser.c::matchpattern@0x2491
- tetris-sim/tetris-sim.c::clear_lines@0x12b6
- vectors-3d/vectors-3d.c::get_angle@0x1429

@ -0,0 +1,345 @@
# Infer-Out Model 2 Evaluation (merged.O2.func_map.infer-host)
- Timestamp: 20251119-170633
- Source JSONL: merged.O2.func_map.infer.jsonl
- Target: host
- Total cases: 368
- Replacement success: 368 (100.00%)
- Compilable: 139 (37.77%)
- Executable: 126 (34.24%)
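
Each percentage in the summary is the corresponding count divided by the total number of cases. A minimal sketch of that arithmetic, using the figures reported above; the helper name `rate` is illustrative and is not taken from the evaluation scripts:

```python
def rate(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals."""
    return f"{100.0 * count / total:.2f}%"


total_cases = 368  # "Total cases" from the summary above

print("Replacement:", rate(368, total_cases))
print("Compilable: ", rate(139, total_cases))
print("Executable: ", rate(126, total_cases))
```

The same calculation, applied per benchmark, would give the Build% and Exec% columns of the breakdown table, assuming those columns use the benchmark's own case count as the denominator.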
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 10 | 100.00% | 20.00% | 20.00% |
| anagram | 13 | 100.00% | 46.15% | 46.15% |
| audio-codec | 3 | 100.00% | 33.33% | 33.33% |
| avl-tree | 15 | 100.00% | 20.00% | 20.00% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 3 | 100.00% | 66.67% | 66.67% |
| blake2b | 4 | 100.00% | 0.00% | 0.00% |
| bloom-filter | 4 | 100.00% | 50.00% | 50.00% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 50.00% | 50.00% |
| ccmac | 1 | 100.00% | 0.00% | 0.00% |
| checkers | 16 | 100.00% | 68.75% | 62.50% |
| cipher | 3 | 100.00% | 66.67% | 0.00% |
| congrad | 1 | 100.00% | 0.00% | 0.00% |
| connect4-minimax | 13 | 100.00% | 61.54% | 53.85% |
| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
| dhrystone | 5 | 100.00% | 20.00% | 20.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 50.00% | 50.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 10 | 100.00% | 50.00% | 50.00% |
| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
| gcd-list | 2 | 100.00% | 50.00% | 0.00% |
| grad-descent | 4 | 100.00% | 25.00% | 25.00% |
| graph-tests | 20 | 100.00% | 10.00% | 10.00% |
| hanoi | 1 | 100.00% | 0.00% | 0.00% |
| heapsort | 2 | 100.00% | 50.00% | 50.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 13 | 100.00% | 92.31% | 92.31% |
| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 6 | 100.00% | 33.33% | 33.33% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 14.29% | 14.29% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 33.33% | 33.33% |
| life | 14 | 100.00% | 21.43% | 14.29% |
| longdiv | 6 | 100.00% | 50.00% | 50.00% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| matmult | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 3 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 25.00% | 25.00% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 4 | 100.00% | 0.00% | 0.00% |
| parrondo | 2 | 100.00% | 50.00% | 50.00% |
| pascal | 3 | 100.00% | 66.67% | 66.67% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 33.33% | 33.33% |
| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
| rand-test | 3 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 50.00% | 0.00% |
| regex-parser | 7 | 100.00% | 28.57% | 14.29% |
| rho-factor | 3 | 100.00% | 66.67% | 66.67% |
| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
| sieve | 1 | 100.00% | 0.00% | 0.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 0.00% |
| sudoku-solver | 4 | 100.00% | 50.00% | 50.00% |
| tetris-sim | 12 | 100.00% | 75.00% | 58.33% |
| tiny-NN | 4 | 100.00% | 25.00% | 25.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 2 | 100.00% | 50.00% | 50.00% |
| transcend | 1 | 100.00% | 0.00% | 0.00% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
| verlet | 1 | 100.00% | 0.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |
## Compilation Failures
- ackermann/ackermann.c::main@0x1100
- aes/aes.c::aes_decrypt@0x18c0
- aes/aes.c::aes_encrypt@0x1780
- aes/aes.c::inv_mix_columns@0x1640
- aes/aes.c::inv_shift_rows@0x14f0
- aes/aes.c::key_expansion@0x16d0
- aes/aes.c::main@0x1100
- aes/aes.c::mix_columns@0x1580
- aes/aes.c::shift_rows@0x1480
- anagram/anagram.c::BuildMask@0x14c0
- anagram/anagram.c::BuildWord@0x17d0
- anagram/anagram.c::DumpCandidates@0x19a0
- anagram/anagram.c::DumpWords@0x1a30
- anagram/anagram.c::FindAnagram@0x1a90
- anagram/anagram.c::ReadDict@0x1360
- anagram/anagram.c::main@0x1120
- audio-codec/audio-codec.c::decode@0x1440
- audio-codec/audio-codec.c::main@0x1100
- avl-tree/avlcore.c::CheckTreeNodeRotation@0x1c30
- avl-tree/element.c::Compare@0x1ad0
- avl-tree/avlcore.c::DeleteByElement@0x2860
- avl-tree/avlcore.c::DeleteByElementRecursive@0x26d0
- avl-tree/avlcore.c::DeleteLeftMost@0x2610
- avl-tree/avlcore.c::DoubleLeftRotation@0x1c00
- avl-tree/avlcore.c::DoubleRightRotation@0x1bd0
- avl-tree/avlcore.c::FindByElement@0x1b00
- avl-tree/avlcore.c::Insert@0x1f30
- avl-tree/avlcore.c::MakeEmpty@0x1f80
- avl-tree/avl-tree.c::breadth@0x1760
- avl-tree/avl-tree.c::main@0x1120
- banner/banner.c::main@0x1120
- bit-kernels/bit-kernels.c::main@0x1120
- blake2b/blake2b.c::F@0x12a0
- blake2b/blake2b.c::G@0x1230
- blake2b/blake2b.c::blake2b@0x1620
- blake2b/blake2b.c::test@0x19d0
- bloom-filter/bloom-filter.c::bad_search@0x1430
- bloom-filter/bloom-filter.c::main@0x1120
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x15d0
- boyer-moore-search/boyer-moore-search.c::main@0x1140
- boyer-moore-search/boyer-moore-search.c::search@0x1630
- c-interp/c-interp.c::eval@0x3e90
- c-interp/c-interp.c::function_body@0x37f0
- c-interp/c-interp.c::function_declaration@0x3a10
- c-interp/c-interp.c::main@0x1120
- c-interp/c-interp.c::next@0x1580
- ccmac/ccmac.c::main@0x1120
- checkers/functions.c::fill_print_initial@0x1630
- checkers/functions.c::free_tree@0x2460
- checkers/functions.c::generate_node_children@0x21c0
- checkers/functions.c::link_new_node@0x20e0
- checkers/checkers.c::main@0x1150
- cipher/cipher.c::main@0x1100
- congrad/congrad.c::main@0x1100
- connect4-minimax/connect4-minimax.c::init_board@0x1230
- connect4-minimax/connect4-minimax.c::main@0x1100
- connect4-minimax/connect4-minimax.c::minimax@0x1840
- connect4-minimax/connect4-minimax.c::play_game@0x1c90
- connect4-minimax/connect4-minimax.c::score_position@0x1620
- convex-hull/convex-hull.c::main@0x1100
- dhrystone/dhrystone.c::PFunc_1@0x1970
- dhrystone/dhrystone.c::PFunc_2@0x1990
- dhrystone/dhrystone.c::PProc_8@0x1900
- dhrystone/dhrystone.c::main@0x1100
- distinctness/distinctness.c::isDistinct@0x12a0
- distinctness/distinctness.c::main@0x1100
- fft-int/fft-int.c::db_from_ampl@0x1670
- fft-int/fft-int.c::fix_fft@0x1320
- flood-fill/flood-fill.c::main@0x1100
- frac-calc/frac-calc.c::avaliatokens@0x15f0
- frac-calc/frac-calc.c::copyr@0x1460
- frac-calc/frac-calc.c::divtokens@0x1840
- frac-calc/frac-calc.c::help@0x13b0
- frac-calc/frac-calc.c::main@0x1120
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x2360
- fuzzy-match/fuzzy-match.c::main@0x2100
- fy-shuffle/fy-shuffle.c::fy_shuffle@0x1440
- fy-shuffle/fy-shuffle.c::main@0x1100
- gcd-list/gcd-list.c::main@0x1120
- grad-descent/grad-descent.c::derivateWRTBias@0x12d0
- grad-descent/grad-descent.c::derivateWRTWeight@0x1270
- grad-descent/grad-descent.c::main@0x1100
- graph-tests/graph-tests.c::DFS_test@0x1c20
- graph-tests/graph-tests.c::addEdge@0x1320
- graph-tests/graph-tests.c::addVertex@0x1a50
- graph-tests/graph-tests.c::bfs@0x1540
- graph-tests/graph-tests.c::bfs_test@0x1720
- graph-tests/graph-tests.c::bubbleSort@0x1880
- graph-tests/graph-tests.c::createGraph@0x1260
- graph-tests/graph-tests.c::createNode@0x1240
- graph-tests/graph-tests.c::createQueue@0x1390
- graph-tests/graph-tests.c::depthFirstSearch@0x1b20
- graph-tests/graph-tests.c::dequeue@0x1430
- graph-tests/graph-tests.c::enqueue@0x13e0
- graph-tests/graph-tests.c::getAdjUnvisitedVertex@0x1ac0
- graph-tests/graph-tests.c::insertAtTheBegin@0x1840
- graph-tests/graph-tests.c::link_list@0x18e0
- graph-tests/graph-tests.c::main@0x1120
- graph-tests/graph-tests.c::printQueue@0x14c0
- graph-tests/graph-tests.c::swap@0x1870
- hanoi/hanoi.c::main@0x1100
- heapsort/heapsort.c::main@0x1100
- heat-calc/heat-calc.c::main@0x1100
- huff-encode/huff-encode.c::main@0x1120
- idct-alg/idct-alg.c::main@0x1100
- indirect-test/indirect-test.c::main@0x1100
- k-means/k-means.c::calculateNearst@0x1310
- k-means/k-means.c::kMeans@0x1420
- k-means/k-means.c::main@0x1120
- k-means/k-means.c::printEPS@0x16b0
- kadane/kadane.c::main@0x1100
- kepler/kepler.c::J@0x1920
- kepler/kepler.c::bin_fact@0x1740
- kepler/kepler.c::binary@0x16a0
- kepler/kepler.c::e_series@0x17e0
- kepler/kepler.c::j_series@0x1a20
- kepler/kepler.c::main@0x1100
- knapsack/knapsack.c::main@0x1100
- knapsack/knapsack.c::max@0x1310
- knights-tour/knights-tour.c::solveKT@0x1390
- knights-tour/knights-tour.c::solveKTUtil@0x14f0
- life/life.c::getDown@0x16e0
- life/life.c::getDownLeft@0x1770
- life/life.c::getDownRight@0x17a0
- life/life.c::getLeft@0x1650
- life/life.c::getNumNeigbors@0x1390
- life/life.c::getRight@0x1680
- life/life.c::getUp@0x16b0
- life/life.c::getUpLeft@0x1710
- life/life.c::getUpRight@0x1740
- life/life.c::main@0x1100
- life/life.c::process@0x1550
- longdiv/longdiv.c::main@0x1120
- longdiv/longdiv.c::sbc@0x1a20
- longdiv/longdiv.c::sub@0x19c0
- lu-decomp/lu-decomp.c::main@0x1100
- lu-decomp/lu-decomp.c::print_matrix@0x13a0
- mandelbrot/mandelbrot.c::main@0x1100
- matmult/matmult.c::main@0x1100
- max-subseq/max-subseq.c::lcsAlgo@0x1290
- max-subseq/max-subseq.c::main@0x1120
- mersenne/mersenne.c::genrand@0x1310
- mersenne/mersenne.c::main@0x1100
- mersenne/mersenne.c::sgenrand@0x1290
- minspan/minspan.c::displayGraph@0x14f0
- minspan/minspan.c::displayGraph1@0x15f0
- minspan/minspan.c::displayPath@0x1700
- minspan/minspan.c::displayTree@0x17a0
- minspan/minspan.c::main@0x1100
- minspan/minspan.c::minSpanTree@0x12f0
- monte-carlo/monte-carlo.c::main@0x1100
- murmur-hash/murmur-hash.c::main@0x1100
- murmur-hash/murmur-hash.c::murmurhash@0x1290
- n-queens/n-queens.c::main@0x1120
- natlog/natlog.c::main@0x1100
- nbody-sim/nbody-sim.c::main@0x1100
- packet-filter/packet-filter.c::check_packet_filter@0x1430
- packet-filter/packet-filter.c::generate_packet@0x12d0
- packet-filter/packet-filter.c::main@0x1100
- packet-filter/packet-filter.c::print_packet@0x1490
- parrondo/parrondo.c::main@0x1100
- pascal/pascal.c::main@0x1100
- pi-calc/pi-calc.c::main@0x1100
- primal-test/primal-test.c::main@0x1100
- primal-test/primal-test.c::miller_rabin_int@0x1510
- priority-queue/priority-queue.c::main@0x1120
- qsort-demo/qsort-demo.c::main@0x1120
- qsort-demo/qsort-demo.c::print_struct_array@0x15c0
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x14a0
- qsort-demo/qsort-demo.c::sort_integers_example@0x1310
- qsort-demo/qsort-demo.c::sort_structs_example@0x1640
- qsort-test/qsort-test.c::main@0x1120
- quaternions/quaternions.c::euler_from_quat@0x1580
- quaternions/quaternions.c::main@0x1100
- quaternions/quaternions.c::quat_from_euler@0x13f0
- quaternions/quaternions.c::quaternion_multiply@0x16b0
- rabinkarp-search/rabinkarp-search.c::main@0x1120
- rabinkarp-search/rabinkarp-search.c::search@0x13a0
- rand-test/rand-test.c::bad_rand@0x1240
- rand-test/rand-test.c::main@0x1100
- rand-test/rand-test.c::run_tests@0x1280
- ransac/ransac.c::main@0x1100
- regex-parser/regex-parser.c::main@0x2100
- regex-parser/regex-parser.c::matchcharclass@0x23b0
- regex-parser/regex-parser.c::matchone@0x2560
- regex-parser/regex-parser.c::re_compile@0x2930
- regex-parser/regex-parser.c::re_print@0x2bf0
- rho-factor/rho-factor.c::main@0x1120
- rle-compress/rle-compress.c::main@0x1120
- rle-compress/rle-compress.c::run_length_encode@0x1330
- rsa-cipher/rsa-cipher.c::main@0x1100
- rsa-cipher/rsa-cipher.c::mod_inverse@0x1670
- rsa-cipher/rsa-cipher.c::mod_pow@0x1580
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x1790
- sat-solver/sat-solver.c::main@0x1100
- sat-solver/sat-solver.c::printFormula@0x1390
- shortest-path/shortest-path.c::main@0x1100
- sieve/sieve.c::main@0x1100
- simple-grep/simple-grep.c::main@0x1120
- spelt2num/spelt2num.c::main@0x1100
- spirograph/spirograph.c::spirograph@0x1230
- sudoku-solver/sudoku-solver.c::isSafe@0x1250
- sudoku-solver/sudoku-solver.c::main@0x1100
- tetris-sim/tetris-sim.c::best_move@0x1860
- tetris-sim/tetris-sim.c::evaluate_board@0x1640
- tetris-sim/tetris-sim.c::main@0x1120
- tiny-NN/tiny-NN.c::main@0x1120
- tiny-NN/tiny-NN.c::sampleSine@0x12d0
- tiny-NN/tiny-NN.c::train@0x13e0
- topo-sort/topo-sort.c::addEdge@0x1370
- topo-sort/topo-sort.c::createGraph@0x1300
- topo-sort/topo-sort.c::createListNode@0x12e0
- topo-sort/topo-sort.c::createStackNode@0x12c0
- topo-sort/topo-sort.c::main@0x1120
- topo-sort/topo-sort.c::topologicalSort@0x1450
- topo-sort/topo-sort.c::topologicalSortUtil@0x13c0
- totient/totient.c::main@0x1100
- transcend/transcend.c::main@0x1120
- uniquify/uniquify.c::main@0x1120
- vectors-3d/vectors-3d.c::get_cross_matrix@0x1760
- vectors-3d/vectors-3d.c::main@0x1100
- vectors-3d/vectors-3d.c::print_vector@0x1620
- vectors-3d/vectors-3d.c::unit_vec@0x1690
- vectors-3d/vectors-3d.c::vector_add@0x1550
- vectors-3d/vectors-3d.c::vector_prod@0x15c0
- vectors-3d/vectors-3d.c::vector_sub@0x1510
- verlet/verlet.c::main@0x1100
- weekday/weekday.c::dayOfWeek@0x1350
- weekday/weekday.c::main@0x1100
## Execution Failures
- checkers/functions.c::all_possible_moves@0x1a60
- cipher/cipher.c::decipher@0x1360
- cipher/cipher.c::encipher@0x12f0
- connect4-minimax/connect4-minimax.c::terminal_score@0x1800
- gcd-list/gcd-list.c::gcd@0x1310
- idct-alg/idct-alg.c::idct_2d@0x12f0
- life/life.c::init@0x1220
- ransac/ransac.c::ransac_line_fitting@0x1410
- regex-parser/regex-parser.c::matchpattern@0x2670
- spirograph/spirograph.c::test@0x1390
- tetris-sim/tetris-sim.c::clear_lines@0x1480
- tetris-sim/tetris-sim.c::simulate_board@0x17c0
- vectors-3d/vectors-3d.c::get_angle@0x17d0
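
The failure entries above follow the pattern `benchmark/source.c::function@0xADDR`. The sketch below shows one way to parse such lines and count failures per benchmark. The regular expression, the field names, and the reading of the trailing hex value as a function address are assumptions made for illustration, not part of the evaluation pipeline:

```python
import re
from collections import Counter

# Assumed entry layout: "- <benchmark>/<source file>::<function>@0x<hex>"
ENTRY = re.compile(
    r"^- (?P<bench>[^/]+)/(?P<src>[^:]+)::(?P<func>\w+)@(?P<addr>0x[0-9a-f]+)$"
)


def parse_entry(line):
    """Return the fields of one failure entry, or None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["addr"] = int(fields["addr"], 16)  # hex string to integer
    return fields


# A few entries copied from the execution-failure list above.
lines = [
    "- cipher/cipher.c::decipher@0x1360",
    "- cipher/cipher.c::encipher@0x12f0",
    "- tetris-sim/tetris-sim.c::clear_lines@0x1480",
    "- tetris-sim/tetris-sim.c::simulate_board@0x17c0",
    "- vectors-3d/vectors-3d.c::get_angle@0x17d0",
]

per_benchmark = Counter(parse_entry(line)["bench"] for line in lines)
for bench, count in sorted(per_benchmark.items()):
    print(bench, count)
```

Grouping by benchmark this way makes it easy to cross-check the lists against the Build% and Exec% columns in the breakdown table.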

@ -0,0 +1,355 @@
# Infer-Out Model 2 Evaluation (merged.O3.func_map.infer-host)
- Timestamp: 20251119-171533
- Source JSONL: merged.O3.func_map.infer.jsonl
- Target: host
- Total cases: 359
- Replacement success: 359 (100.00%)
- Compilable: 114 (31.75%)
- Executable: 106 (29.53%)
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 11 | 100.00% | 27.27% | 27.27% |
| anagram | 13 | 100.00% | 38.46% | 38.46% |
| audio-codec | 3 | 100.00% | 33.33% | 33.33% |
| avl-tree | 15 | 100.00% | 13.33% | 13.33% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 3 | 100.00% | 66.67% | 66.67% |
| blake2b | 3 | 100.00% | 0.00% | 0.00% |
| bloom-filter | 4 | 100.00% | 25.00% | 25.00% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 40.00% | 40.00% |
| ccmac | 1 | 100.00% | 0.00% | 0.00% |
| checkers | 13 | 100.00% | 61.54% | 61.54% |
| cipher | 3 | 100.00% | 33.33% | 0.00% |
| congrad | 1 | 100.00% | 0.00% | 0.00% |
| connect4-minimax | 11 | 100.00% | 45.45% | 45.45% |
| convex-hull | 4 | 100.00% | 50.00% | 50.00% |
| dhrystone | 5 | 100.00% | 40.00% | 40.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 0.00% | 0.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 9 | 100.00% | 22.22% | 22.22% |
| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
| grad-descent | 4 | 100.00% | 0.00% | 0.00% |
| graph-tests | 19 | 100.00% | 5.26% | 5.26% |
| hanoi | 1 | 100.00% | 0.00% | 0.00% |
| heapsort | 2 | 100.00% | 0.00% | 0.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 12 | 100.00% | 83.33% | 83.33% |
| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 5 | 100.00% | 0.00% | 0.00% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 14.29% | 14.29% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 33.33% | 33.33% |
| life | 14 | 100.00% | 21.43% | 14.29% |
| longdiv | 7 | 100.00% | 71.43% | 71.43% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 4 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 25.00% | 25.00% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 4 | 100.00% | 0.00% | 0.00% |
| parrondo | 2 | 100.00% | 50.00% | 50.00% |
| pascal | 3 | 100.00% | 66.67% | 66.67% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 66.67% | 66.67% |
| priority-queue | 5 | 100.00% | 40.00% | 40.00% |
| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
| rand-test | 3 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 50.00% | 0.00% |
| regex-parser | 8 | 100.00% | 25.00% | 25.00% |
| rho-factor | 1 | 100.00% | 100.00% | 100.00% |
| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 40.00% |
| shortest-path | 3 | 100.00% | 33.33% | 33.33% |
| sieve | 1 | 100.00% | 0.00% | 0.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 0.00% |
| sudoku-solver | 4 | 100.00% | 75.00% | 75.00% |
| tetris-sim | 12 | 100.00% | 58.33% | 50.00% |
| tiny-NN | 4 | 100.00% | 25.00% | 25.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 2 | 100.00% | 50.00% | 50.00% |
| transcend | 1 | 100.00% | 0.00% | 0.00% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
| verlet | 1 | 100.00% | 0.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |
## Compilation Failures
- ackermann/ackermann.c::main@0x1100
- aes/aes.c::add_round_key@0x1810
- aes/aes.c::aes_decrypt@0x2760
- aes/aes.c::aes_encrypt@0x2200
- aes/aes.c::inv_shift_rows@0x1af0
- aes/aes.c::key_expansion@0x1ff0
- aes/aes.c::main@0x1100
- aes/aes.c::mix_columns@0x1bd0
- aes/aes.c::shift_rows@0x1a30
- anagram/anagram.c::BuildMask@0x1620
- anagram/anagram.c::BuildWord@0x1940
- anagram/anagram.c::DumpCandidates@0x1c10
- anagram/anagram.c::DumpWords@0x1ca0
- anagram/anagram.c::FindAnagram@0x1d00
- anagram/anagram.c::ReadDict@0x14c0
- anagram/anagram.c::SortCandidates@0x1f10
- anagram/anagram.c::main@0x1120
- audio-codec/audio-codec.c::decode@0x1590
- audio-codec/audio-codec.c::main@0x1100
- avl-tree/avlcore.c::CheckTreeNodeRotation@0x1c50
- avl-tree/element.c::Compare@0x1af0
- avl-tree/avlcore.c::DeleteByElement@0x2e50
- avl-tree/avlcore.c::DeleteByElementRecursive@0x2bf0
- avl-tree/avlcore.c::DeleteLeftMost@0x2720
- avl-tree/avlcore.c::DoubleLeftRotation@0x1c20
- avl-tree/avlcore.c::DoubleRightRotation@0x1bf0
- avl-tree/avlcore.c::FindByElement@0x1b20
- avl-tree/avlcore.c::Insert@0x1f40
- avl-tree/avlcore.c::InsertNode@0x1e10
- avl-tree/avlcore.c::MakeEmpty@0x2090
- avl-tree/avl-tree.c::breadth@0x1780
- avl-tree/avl-tree.c::main@0x1120
- banner/banner.c::main@0x1120
- bit-kernels/bit-kernels.c::main@0x1120
- blake2b/blake2b.c::F@0x12e0
- blake2b/blake2b.c::blake2b@0x17b0
- blake2b/blake2b.c::test@0x1b50
- bloom-filter/bloom-filter.c::bad_search@0x1450
- bloom-filter/tinybloom.c::bfilter_intersect@0x1570
- bloom-filter/bloom-filter.c::main@0x1120
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x15d0
- boyer-moore-search/boyer-moore-search.c::main@0x1140
- boyer-moore-search/boyer-moore-search.c::search@0x1630
- c-interp/c-interp.c::enum_declaration@0x34f0
- c-interp/c-interp.c::eval@0x3ea0
- c-interp/c-interp.c::function_body@0x37f0
- c-interp/c-interp.c::function_declaration@0x3a10
- c-interp/c-interp.c::main@0x1120
- c-interp/c-interp.c::next@0x15a0
- ccmac/ccmac.c::main@0x1120
- checkers/functions.c::fill_print_initial@0x18e0
- checkers/functions.c::free_tree@0x6210
- checkers/functions.c::generate_node_children@0x35d0
- checkers/functions.c::link_new_node@0x34c0
- checkers/checkers.c::main@0x1130
- cipher/cipher.c::encipher@0x12f0
- cipher/cipher.c::main@0x1100
- congrad/congrad.c::main@0x1100
- connect4-minimax/connect4-minimax.c::board_full@0x1500
- connect4-minimax/connect4-minimax.c::evaluate_window@0x2380
- connect4-minimax/connect4-minimax.c::init_board@0x1230
- connect4-minimax/connect4-minimax.c::main@0x1100
- connect4-minimax/connect4-minimax.c::minimax@0x3c30
- connect4-minimax/connect4-minimax.c::play_game@0x4260
- convex-hull/convex-hull.c::main@0x1100
- convex-hull/convex-hull.c::sortPoints@0x1740
- dhrystone/dhrystone.c::PFunc_1@0x1980
- dhrystone/dhrystone.c::PProc_8@0x1910
- dhrystone/dhrystone.c::main@0x1100
- distinctness/distinctness.c::isDistinct@0x12a0
- distinctness/distinctness.c::main@0x1100
- fft-int/fft-int.c::db_from_ampl@0x1c50
- fft-int/fft-int.c::fix_fft@0x1370
- fft-int/fft-int.c::fix_loud@0x1a90
- fft-int/fft-int.c::window@0x1650
- flood-fill/flood-fill.c::main@0x1100
- frac-calc/frac-calc.c::avaliatokens@0x1730
- frac-calc/frac-calc.c::copyr@0x1550
- frac-calc/frac-calc.c::divtokens@0x1980
- frac-calc/frac-calc.c::help@0x14a0
- frac-calc/frac-calc.c::main@0x1120
- frac-calc/frac-calc.c::misto@0x1610
- frac-calc/frac-calc.c::simplifica@0x28f0
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x23e0
- fuzzy-match/fuzzy-match.c::main@0x2100
- fy-shuffle/fy-shuffle.c::fy_shuffle@0x1440
- fy-shuffle/fy-shuffle.c::main@0x1100
- gcd-list/gcd-list.c::gcd@0x1310
- gcd-list/gcd-list.c::main@0x1120
- grad-descent/grad-descent.c::derivateWRTBias@0x12e0
- grad-descent/grad-descent.c::derivateWRTWeight@0x1270
- grad-descent/grad-descent.c::gradientDescent@0x1350
- grad-descent/grad-descent.c::main@0x1100
- graph-tests/graph-tests.c::DFS_test@0x2340
- graph-tests/graph-tests.c::addEdge@0x1610
- graph-tests/graph-tests.c::addVertex@0x1f80
- graph-tests/graph-tests.c::bfs@0x1830
- graph-tests/graph-tests.c::bfs_test@0x1a70
- graph-tests/graph-tests.c::bubbleSort@0x1db0
- graph-tests/graph-tests.c::createGraph@0x1550
- graph-tests/graph-tests.c::createNode@0x1530
- graph-tests/graph-tests.c::createQueue@0x1680
- graph-tests/graph-tests.c::depthFirstSearch@0x2110
- graph-tests/graph-tests.c::dequeue@0x1720
- graph-tests/graph-tests.c::enqueue@0x16d0
- graph-tests/graph-tests.c::insertAtTheBegin@0x1d70
- graph-tests/graph-tests.c::link_list@0x1e20
- graph-tests/graph-tests.c::main@0x1180
- graph-tests/graph-tests.c::printQueue@0x17b0
- graph-tests/graph-tests.c::swap@0x1da0
- graph-tests/graph-tests.c::towers@0x2490
- hanoi/hanoi.c::main@0x1100
- heapsort/heapsort.c::HSORT@0x12f0
- heapsort/heapsort.c::main@0x11a0
- heat-calc/heat-calc.c::main@0x1100
- huff-encode/huff-encode.c::buildHuffmanTree@0x18b0
- huff-encode/huff-encode.c::main@0x1120
- idct-alg/idct-alg.c::main@0x1100
- indirect-test/indirect-test.c::main@0x1100
- k-means/k-means.c::calculateCentroid@0x1390
- k-means/k-means.c::calculateNearst@0x1310
- k-means/k-means.c::kMeans@0x1400
- k-means/k-means.c::main@0x1120
- k-means/k-means.c::printEPS@0x16c0
- kadane/kadane.c::main@0x1100
- kepler/kepler.c::J@0x1b80
- kepler/kepler.c::bin_fact@0x1ad0
- kepler/kepler.c::binary@0x16a0
- kepler/kepler.c::e_series@0x1740
- kepler/kepler.c::j_series@0x1920
- kepler/kepler.c::main@0x1100
- knapsack/knapsack.c::main@0x1100
- knapsack/knapsack.c::max@0x1310
- knights-tour/knights-tour.c::solveKT@0x1830
- knights-tour/knights-tour.c::solveKTUtil@0x1980
- life/life.c::getDown@0x1960
- life/life.c::getDownLeft@0x19f0
- life/life.c::getDownRight@0x1a20
- life/life.c::getLeft@0x18d0
- life/life.c::getNumNeigbors@0x16d0
- life/life.c::getRight@0x1900
- life/life.c::getUp@0x1930
- life/life.c::getUpLeft@0x1990
- life/life.c::getUpRight@0x19c0
- life/life.c::main@0x1100
- life/life.c::process@0x1430
- longdiv/longdiv.c::main@0x1120
- longdiv/longdiv.c::sub@0x1a80
- lu-decomp/lu-decomp.c::main@0x1100
- lu-decomp/lu-decomp.c::print_matrix@0x1320
- mandelbrot/mandelbrot.c::main@0x1100
- max-subseq/max-subseq.c::lcsAlgo@0x1290
- max-subseq/max-subseq.c::main@0x1120
- mersenne/mersenne.c::genrand@0x1380
- mersenne/mersenne.c::lsgenrand@0x1320
- mersenne/mersenne.c::main@0x1100
- mersenne/mersenne.c::sgenrand@0x12d0
- minspan/minspan.c::displayGraph@0x1db0
- minspan/minspan.c::displayGraph1@0x1ee0
- minspan/minspan.c::displayPath@0x2020
- minspan/minspan.c::displayTree@0x20c0
- minspan/minspan.c::main@0x1100
- minspan/minspan.c::minSpanTree@0x1400
- monte-carlo/monte-carlo.c::main@0x1100
- murmur-hash/murmur-hash.c::main@0x1100
- murmur-hash/murmur-hash.c::murmurhash@0x1290
- n-queens/n-queens.c::main@0x1120
- natlog/natlog.c::main@0x1100
- nbody-sim/nbody-sim.c::main@0x1100
- packet-filter/packet-filter.c::check_packet_filter@0x1520
- packet-filter/packet-filter.c::generate_packet@0x13d0
- packet-filter/packet-filter.c::main@0x1100
- packet-filter/packet-filter.c::print_packet@0x1580
- parrondo/parrondo.c::main@0x1100
- pascal/pascal.c::main@0x1100
- pi-calc/pi-calc.c::main@0x1100
- primal-test/primal-test.c::main@0x1100
- priority-queue/priority-queue.c::main@0x1120
- priority-queue/priority-queue.c::newNode@0x13a0
- priority-queue/priority-queue.c::push@0x1420
- qsort-demo/qsort-demo.c::main@0x1120
- qsort-demo/qsort-demo.c::print_struct_array@0x15b0
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x1480
- qsort-demo/qsort-demo.c::sort_integers_example@0x1310
- qsort-demo/qsort-demo.c::sort_structs_example@0x1630
- qsort-test/qsort-test.c::main@0x1120
- quaternions/quaternions.c::euler_from_quat@0x1550
- quaternions/quaternions.c::main@0x1100
- quaternions/quaternions.c::quat_from_euler@0x13e0
- quaternions/quaternions.c::quaternion_multiply@0x1670
- rabinkarp-search/rabinkarp-search.c::main@0x1120
- rabinkarp-search/rabinkarp-search.c::search@0x15a0
- rand-test/rand-test.c::bad_rand@0x1240
- rand-test/rand-test.c::main@0x1100
- rand-test/rand-test.c::run_tests@0x1280
- ransac/ransac.c::main@0x1100
- regex-parser/regex-parser.c::main@0x2100
- regex-parser/regex-parser.c::matchcharclass@0x2420
- regex-parser/regex-parser.c::matchone@0x25c0
- regex-parser/regex-parser.c::matchpattern@0x26d0
- regex-parser/regex-parser.c::re_compile@0x2ac0
- regex-parser/regex-parser.c::re_print@0x2e30
- rle-compress/rle-compress.c::main@0x1120
- rle-compress/rle-compress.c::run_length_encode@0x1330
- rsa-cipher/rsa-cipher.c::main@0x1100
- rsa-cipher/rsa-cipher.c::mod_inverse@0x15a0
- rsa-cipher/rsa-cipher.c::mod_pow@0x14b0
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x16c0
- sat-solver/sat-solver.c::main@0x1100
- sat-solver/sat-solver.c::printFormula@0x1680
- shortest-path/shortest-path.c::floydWarshall@0x1330
- shortest-path/shortest-path.c::main@0x1100
- sieve/sieve.c::main@0x1100
- simple-grep/simple-grep.c::main@0x1120
- spelt2num/spelt2num.c::main@0x1100
- spirograph/spirograph.c::spirograph@0x1230
- sudoku-solver/sudoku-solver.c::main@0x1100
- tetris-sim/tetris-sim.c::aggregate_height@0x1b20
- tetris-sim/tetris-sim.c::best_move@0x21d0
- tetris-sim/tetris-sim.c::count_holes@0x1b70
- tetris-sim/tetris-sim.c::evaluate_board@0x1ca0
- tetris-sim/tetris-sim.c::main@0x1100
- tiny-NN/tiny-NN.c::main@0x1120
- tiny-NN/tiny-NN.c::sampleSine@0x12d0
- tiny-NN/tiny-NN.c::train@0x13e0
- topo-sort/topo-sort.c::addEdge@0x13f0
- topo-sort/topo-sort.c::createGraph@0x1380
- topo-sort/topo-sort.c::createListNode@0x1360
- topo-sort/topo-sort.c::createStackNode@0x1340
- topo-sort/topo-sort.c::main@0x1120
- topo-sort/topo-sort.c::topologicalSort@0x18b0
- topo-sort/topo-sort.c::topologicalSortUtil@0x1440
- totient/totient.c::main@0x1100
- transcend/transcend.c::main@0x1120
- uniquify/uniquify.c::main@0x1120
- vectors-3d/vectors-3d.c::get_cross_matrix@0x1850
- vectors-3d/vectors-3d.c::main@0x1100
- vectors-3d/vectors-3d.c::print_vector@0x1730
- vectors-3d/vectors-3d.c::unit_vec@0x17a0
- vectors-3d/vectors-3d.c::vector_add@0x1650
- vectors-3d/vectors-3d.c::vector_prod@0x16b0
- vectors-3d/vectors-3d.c::vector_sub@0x1620
- verlet/verlet.c::main@0x1100
- weekday/weekday.c::dayOfWeek@0x1290
- weekday/weekday.c::main@0x1100
## Execution Failures
- cipher/cipher.c::decipher@0x1360
- idct-alg/idct-alg.c::idct_2d@0x12f0
- life/life.c::init@0x12c0
- ransac/ransac.c::ransac_line_fitting@0x1410
- sat-solver/sat-solver.c::solveSAT@0x13a0
- spirograph/spirograph.c::test@0x1390
- tetris-sim/tetris-sim.c::clear_lines@0x19a0
- vectors-3d/vectors-3d.c::get_angle@0x18c0
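
Each entry in the lists above follows the `path::function@address` shape that `compose_case_id` builds in the evaluation script. As a minimal sketch, assuming one wants to split an entry back into its parts (`CaseId` and `parse_case_id` are hypothetical names, not part of the pipeline):

```python
import re
from typing import NamedTuple


class CaseId(NamedTuple):
    """Hypothetical container for the three parts of a report entry."""

    source_path: str
    function_name: str
    address: int


# Assumed shape: <source path>::<function name>@0x<hex address>
_CASE_ID = re.compile(
    r"^(?P<path>.+?)::(?P<name>[A-Za-z_]\w*)@(?P<addr>0x[0-9a-fA-F]+)$"
)


def parse_case_id(entry: str) -> CaseId:
    """Split a report bullet such as '- a/b.c::func@0x1234' into its parts."""
    text = entry.strip()
    if text.startswith("- "):
        text = text[2:]
    match = _CASE_ID.match(text)
    if match is None:
        raise ValueError(f"not a case id: {entry!r}")
    return CaseId(match["path"], match["name"], int(match["addr"], 16))


example = parse_case_id("- cipher/cipher.c::decipher@0x1360")
print(example)
```

Grouping the parsed entries by `source_path` would give a per-benchmark view of the failures.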


@ -0,0 +1,493 @@
#!/usr/bin/env python3
"""Generate function-level mappings across source, pseudo, and assembly outputs."""

from __future__ import annotations

import argparse
import json
import os
import re
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Optional

# C keywords that look like function calls followed by a block, e.g. `if (...) {`.
FUNC_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof", "do", "case", "else"}

# Decompiler and libc type aliases mapped to plain C types for normalization.
TYPEDEF_MAP = {
    "cpu_set_t": "int",
    "nl_item": "int",
    "__time_t": "int",
    "__mode_t": "unsigned short",
    "__off64_t": "long long",
    "__blksize_t": "long",
    "__ino_t": "unsigned long",
    "__blkcnt_t": "unsigned long long",
    "__syscall_slong_t": "long",
    "__ssize_t": "long int",
    "wchar_t": "unsigned short int",
    "wctype_t": "unsigned short int",
    "__int64": "long long",
    "__int32": "int",
    "__int16": "short",
    "__int8": "char",
    "_QWORD": "uint64_t",
    "_OWORD": "long double",
    "_DWORD": "uint32_t",
    "size_t": "unsigned int",
    "_BYTE": "uint8_t",
    "_TBYTE": "uint16_t",
    "_BOOL8": "uint8_t",
    "gcc_va_list": "va_list",
    "_WORD": "unsigned short",
    "_BOOL4": "int",
    "__va_list_tag": "va_list",
    "_IO_FILE": "FILE",
    "DIR": "int",
    "__fsword_t": "long",
    "__kernel_ulong_t": "int",
    "cc_t": "int",
    "speed_t": "int",
    "fd_set": "int",
    "__suseconds_t": "int",
    "_UNKNOWN": "void",
    "__sighandler_t": "void (*)(int)",
    "__compar_fn_t": "int (*)(const void *, const void *)",
}

def _load_config_env() -> dict:
    """Load config.env from the eval project root."""
    eval_root = Path(__file__).resolve().parents[1]
    config_path = eval_root / "config.env"
    config = {}
    if config_path.exists():
        for line in config_path.read_text().splitlines():
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            if "=" in line:
                key, _, value = line.partition("=")
                config[key.strip()] = value.strip()
    return config

def _get_bench_root(cli_value: str | None = None) -> Path:
    """Resolve the benchmark repo root from CLI arg, env var, or config.env."""
    if cli_value:
        return Path(cli_value).resolve()
    env_val = os.environ.get("BENCH_REPO_ROOT")
    if env_val:
        return Path(env_val).resolve()
    config = _load_config_env()
    if "BENCH_REPO_ROOT" in config:
        return Path(config["BENCH_REPO_ROOT"]).resolve()
    sys.exit("error: BENCH_REPO_ROOT not set. Use --bench-root, set the env var, or configure config.env")

def _read_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


def _strip_empty(code: str) -> str:
    return "\n".join(line for line in code.splitlines() if line.strip())


def _good_func(func: str) -> bool:
    # Count only the lines after the first "{" (the function body).
    body = func.split("{", 1)[1] if "{" in func else func
    total = 0
    for line in body.splitlines():
        if len(line.strip()) >= 3:
            total += 1
    # Keep functions with more than 3 and fewer than 300 non-trivial lines.
    return 3 < total < 300

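def _format_with_clang(func: str, style: str = "Google") -> Optional[str]:
    """Format code with clang-format; return None if formatting fails."""
    if not func:
        return None
    cmd = ["clang-format", f"--style={style}"]
    try:
        proc = subprocess.run(
            cmd,
            input=func,
            text=True,
            capture_output=True,
            check=True,
            timeout=15,
        )
        return proc.stdout
    except Exception as e:
        # Report on stderr so diagnostics do not mix with the [ok]/[warn] status lines.
        print(f"[clang-format] {e}", file=sys.stderr)
        return None
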
def _hex_to_dec(text: str) -> str:
    """Rewrite hex literals as decimal, keeping any integer suffix (u, l, ll, ...)."""
    pattern = re.compile(r"\b(0x[0-9a-fA-F]+)([uUlL]{1,3})?\b")

    def convert(match: re.Match[str]) -> str:
        hex_part = match.group(1)
        suffix = match.group(2) or ""
        return str(int(hex_part, 16)) + suffix

    return pattern.sub(convert, text)


def _remove_keywords(text: str) -> str:
    """Drop decompiler-specific calling-convention and attribute keywords."""
    patterns = [
        r"\b__fastcall\b",
        r"\b__cdecl\b",
        r"\b__ptr32\b",
        r"\b__noreturn\s+noreturn\b",
    ]
    combined = re.compile("|".join(patterns))
    return combined.sub("", text)


def _replace_typedefs(text: str) -> str:
    for alias, original in TYPEDEF_MAP.items():
        pattern = re.compile(rf"\b{re.escape(alias)}\b")
        text = pattern.sub(original, text)
    return text


def _remove_comments(text: str) -> str:
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    text = re.sub(r"//.*?$", "", text, flags=re.MULTILINE)
    return text

def _process_code(code_str: str) -> str:
    code_str = _remove_comments(code_str)
    code_str = _hex_to_dec(code_str)
    code_str = _remove_keywords(code_str)
    code_str = _replace_typedefs(code_str)
    return code_str


def _normalize_pseudo(text: str) -> str:
    """Return normalized pseudocode, or "" if it is empty, unformattable, or out of size range."""
    processed = _process_code(text)
    if not processed.strip():
        return ""
    formatted = _format_with_clang(processed)
    if formatted is None:
        return ""
    cleaned = _strip_empty(formatted)
    if not cleaned or not _good_func(cleaned):
        return ""
    return cleaned

def _strip_comments_and_strings(text: str) -> str:
    """Blank out comments and string/char literals while preserving character offsets."""
    result = list(text)
    i = 0
    length = len(text)
    while i < length:
        nxt = text[i : i + 2]
        ch = text[i]
        if nxt == "//":
            # Line comment: blank up to (not including) the newline.
            end = text.find("\n", i)
            if end == -1:
                end = length
            for j in range(i, end):
                result[j] = " "
            i = end
            continue
        if nxt == "/*":
            # Block comment: blank through the closing "*/" (or to end of text).
            end = text.find("*/", i + 2)
            if end == -1:
                end = length - 2
            for j in range(i, end + 2):
                result[j] = " "
            i = end + 2
            continue
        if ch in {'"', "'"}:
            # String or char literal: blank until the matching unescaped quote.
            quote = ch
            result[i] = " "
            i += 1
            while i < length:
                c = text[i]
                result[i] = " "
                if c == "\\":
                    i += 2
                    continue
                if c == quote:
                    i += 1
                    break
                i += 1
            continue
        i += 1
    return "".join(result)

def _find_matching_brace(text: str, start_idx: int) -> int:
    """Return the index of the "}" matching the "{" at start_idx, skipping comments and literals."""
    depth = 0
    i = start_idx
    length = len(text)
    while i < length:
        nxt = text[i : i + 2]
        ch = text[i]
        if nxt == "//":
            i = text.find("\n", i)
            if i == -1:
                return length - 1
            continue
        if nxt == "/*":
            i = text.find("*/", i + 2)
            if i == -1:
                return length - 1
            i += 2
            continue
        if ch in {'"', "'"}:
            quote = ch
            i += 1
            while i < length:
                c = text[i]
                if c == "\\":
                    i += 2
                    continue
                if c == quote:
                    i += 1
                    break
                i += 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    # Unbalanced input: fall back to the last character.
    return length - 1

def _extract_source_functions(path: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
    text = _read_text(path)
    # Match signatures on a copy with comments and literals blanked out;
    # offsets stay valid for slicing the original text.
    sanitized = _strip_comments_and_strings(text)
    pattern = re.compile(
        r"(?P<prefix>^|[;\n}])(?P<signature>[^{;}]*?)\b(?P<name>[A-Za-z_][\w]*)\s*\([^;{}]*\)\s*\{",
        re.MULTILINE,
    )
    funcs: Dict[str, Dict[str, str]] = {}
    for match in pattern.finditer(sanitized):
        name = match.group("name")
        if name in FUNC_KEYWORDS:
            continue
        brace_idx = sanitized.find("{", match.start("signature"))
        if brace_idx == -1:
            continue
        end_idx = _find_matching_brace(text, brace_idx)
        if end_idx <= brace_idx:
            continue
        start_idx = match.start("signature")
        content = text[start_idx : end_idx + 1].strip("\n") + "\n"
        # The first definition of a name wins.
        funcs.setdefault(
            name,
            {
                "path": str(path.relative_to(repo_root)),
                "function_name": name,
                "content": content,
            },
        )
    return funcs

def _parse_makefile(makefile: Path) -> List[Path]:
    """Return the C sources implied by PROG and LOCAL_OBJS in a benchmark Makefile."""
    text = _read_text(makefile)
    prog_match = re.search(r"^PROG\s*=\s*(\S+)", text, flags=re.MULTILINE)
    if not prog_match:
        raise RuntimeError(f"PROG not found in {makefile}")
    prog = prog_match.group(1).strip()
    objs_match = re.search(r"^LOCAL_OBJS\s*=\s*(.*)$", text, flags=re.MULTILINE)
    obj_tokens: List[str] = []
    if objs_match:
        obj_tokens = [token for token in objs_match.group(1).split() if token]
    if not obj_tokens:
        obj_tokens = [f"{prog}.o"]
    src_paths: List[Path] = []
    for token in obj_tokens:
        if not token.endswith(".o"):
            continue
        # Swap only the trailing ".o"; str.replace would also rewrite any
        # ".o" that appears earlier in the file name.
        candidate = makefile.parent / (token[: -len(".o")] + ".c")
        if candidate.exists():
            src_paths.append(candidate)
    if not src_paths:
        fallback = makefile.parent / f"{prog}.c"
        if fallback.exists():
            src_paths.append(fallback)
    return src_paths

def _collect_source_functions(bench_dir: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
    makefile = bench_dir / "Makefile"
    srcs = _parse_makefile(makefile)
    func_map: Dict[str, Dict[str, str]] = {}
    for src in srcs:
        func_map.update(_extract_source_functions(src, repo_root))
    return func_map

def _parse_pseudo(pseudo_path: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
    """Split a .pseudo dump into per-function entries keyed by function name."""
    text = _read_text(pseudo_path)
    lines = text.splitlines()
    # Marker lines look like: /* name @ 0x1234 */
    pattern = re.compile(r"^/\*\s*(?P<name>[^@]+?)\s*@\s*(?P<addr>0x[0-9a-fA-F]+)\s*\*/$")
    current: Optional[str] = None
    current_addr: Optional[str] = None
    buffer: List[str] = []
    out: Dict[str, Dict[str, str]] = {}
    for raw_line in lines:
        line = raw_line.strip()
        match = pattern.match(line)
        if match:
            # A new marker closes the previous function, if any.
            if current and buffer:
                content = "\n".join(buffer).strip("\n") + "\n"
                out.setdefault(
                    current,
                    {
                        "path": str(pseudo_path.relative_to(repo_root)),
                        "function_name": current,
                        "address": current_addr,
                        "label": current,
                        "content": content,
                    },
                )
            current = match.group("name").strip()
            current_addr = match.group("addr")
            buffer = []
        elif current is not None:
            buffer.append(raw_line)
    # Flush the last function in the file.
    if current and buffer:
        content = "\n".join(buffer).strip("\n") + "\n"
        out.setdefault(
            current,
            {
                "path": str(pseudo_path.relative_to(repo_root)),
                "function_name": current,
                "address": current_addr,
                "label": current,
                "content": content,
            },
        )
    return out

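def _clean_instruction(raw: str) -> Optional[str]:
    """Extract the instruction text from one objdump line, or None if there is none."""
    stripped = raw.strip()
    if not stripped:
        return None
    # objdump lines are tab-separated: address, raw bytes, instruction.
    parts = raw.split("\t")
    if len(parts) >= 3:
        relevant = parts[2:]
    elif len(parts) == 2:
        relevant = parts[1:]
    else:
        relevant = [stripped]
    instr = "\t".join(relevant)
    # Drop trailing "# ..." annotations.
    instr = instr.split("#")[0].strip()
    if not instr:
        return None
    # Lines holding only hex bytes are continuations of the previous instruction.
    if all(c in "0123456789abcdefABCDEF" for c in instr.replace(" ", "")):
        return None
    return instr


def _clean_asm_block(name: str, lines: List[str]) -> str:
    cleaned = [f"<{name}>:"]
    # lines[0] is the original header line; it is replaced by the label above.
    for raw in lines[1:]:
        instr = _clean_instruction(raw)
        if instr:
            cleaned.append(instr)
    return "\n".join(cleaned) + "\n"
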
def _parse_assembly(asm_path: Path) -> Dict[str, str]:
    lines = _read_text(asm_path).splitlines()
    # Function headers look like: 0000000000001290 <name>:
    header = re.compile(r"^\s*([0-9a-fA-F]+)\s+<([^>]+)>:\s*$")
    current: Optional[str] = None
    buffer: List[str] = []
    result: Dict[str, str] = {}
    for line in lines:
        match = header.match(line)
        if match:
            if current and buffer:
                result.setdefault(current, _clean_asm_block(current, buffer))
            current = match.group(2)
            buffer = [line]
        elif current is not None:
            buffer.append(line)
    if current and buffer:
        result.setdefault(current, _clean_asm_block(current, buffer))
    return result

def _discover_binaries(explicit: Optional[List[str]], repo_root: Path) -> List[Path]:
    if explicit:
        binaries: List[Path] = []
        for entry in explicit:
            candidate = Path(entry)
            if not candidate.is_absolute():
                candidate = repo_root / candidate
            if candidate.exists():
                binaries.append(candidate)
        return binaries
    matches = []
    for path in repo_root.rglob("*.O*"):
        suffix = path.suffix.lower()
        if suffix in {".o0", ".o1", ".o2", ".o3"}:
            matches.append(path)
    return sorted(matches)

def _build_map(binary: Path, repo_root: Path) -> None:
    pseudo_path = Path(str(binary) + ".pseudo")
    asm_path = Path(str(binary) + ".s")
    if not pseudo_path.exists() or not asm_path.exists():
        print(f"[skip] Missing pseudo or assembly for {binary.relative_to(repo_root)}")
        return
    bench_dir = binary.parent
    source_funcs = _collect_source_functions(bench_dir, repo_root)
    pseudo_funcs = _parse_pseudo(pseudo_path, repo_root)
    asm_funcs = _parse_assembly(asm_path)
    # Keep only functions present in all three views.
    common = sorted(set(source_funcs) & set(pseudo_funcs) & set(asm_funcs))
    if not common:
        print(f"[warn] No overlapping functions for {binary.relative_to(repo_root)}")
        return
    output_path = Path(str(binary) + ".func_map.jsonl")
    rel_binary = str(binary.relative_to(repo_root))
    with output_path.open("w", encoding="utf-8") as handle:
        for name in common:
            pseudo_entry = pseudo_funcs[name]
            pseudo_norm = _normalize_pseudo(pseudo_entry.get("content", ""))
            record = {
                "source": source_funcs[name],
                "pseudo": pseudo_entry,
                "pseudo_normalize": pseudo_norm,
                "binary": rel_binary,
                "assembly": asm_funcs[name],
            }
            handle.write(json.dumps(record, ensure_ascii=False))
            handle.write("\n")
    print(f"[ok] {output_path.relative_to(repo_root)} -> {len(common)} functions")

def main(argv: List[str]) -> int:
    parser = argparse.ArgumentParser(description="Map source/pseudo/assembly per function")
    parser.add_argument(
        "--binary",
        action="append",
        help="Specific binary path (relative to repo) to process; can be repeated.",
    )
    parser.add_argument(
        "--bench-root",
        default=None,
        help="Path to the Bringup-Bench repository root (default: from config.env).",
    )
    args = parser.parse_args(argv)
    repo_root = _get_bench_root(args.bench_root)
    binaries = _discover_binaries(args.binary, repo_root)
    if not binaries:
        print("No binaries found", file=sys.stderr)
        return 1
    for binary in binaries:
        _build_map(binary, repo_root)
    return 0


if __name__ == "__main__":
    raise SystemExit(main(sys.argv[1:]))
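
The normalization chain in `_process_code` (comments, hex literals, calling-convention keywords, typedef aliases) can be illustrated on a single line of Hex-Rays-style output. The following is a condensed, self-contained sketch rather than the script itself: `normalize` is a hypothetical name, and the alias table is a three-entry subset of `TYPEDEF_MAP` chosen for the example.

```python
import re

# Same hex-literal pattern as _hex_to_dec: value plus optional integer suffix.
HEX_LITERAL = re.compile(r"\b(0x[0-9a-fA-F]+)([uUlL]{1,3})?\b")
# Two of the keywords stripped by _remove_keywords.
CALL_CONV = re.compile(r"\b__fastcall\b|\b__cdecl\b")
# Illustrative subset of TYPEDEF_MAP.
ALIASES = {"__int64": "long long", "_DWORD": "uint32_t", "_BYTE": "uint8_t"}


def normalize(snippet: str) -> str:
    """Apply the four normalization steps in the same order as _process_code."""
    snippet = re.sub(r"/\*.*?\*/", "", snippet, flags=re.DOTALL)
    snippet = re.sub(r"//.*?$", "", snippet, flags=re.MULTILINE)
    snippet = HEX_LITERAL.sub(
        lambda m: str(int(m.group(1), 16)) + (m.group(2) or ""), snippet
    )
    snippet = CALL_CONV.sub("", snippet)
    for alias, plain in ALIASES.items():
        snippet = re.sub(rf"\b{re.escape(alias)}\b", plain, snippet)
    return snippet


raw = "__int64 __fastcall f(_DWORD *a1) { return a1[0] + 0x10LL; } // tail"
normalized = normalize(raw)
print(normalized)
```

Removing a keyword leaves its surrounding whitespace behind, which is why the real pipeline runs `clang-format` afterwards.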


@ -0,0 +1,24 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
set -a
source "${EVAL_ROOT}/config.env"
set +a
fi
BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
cd "${BENCH_REPO_ROOT}"
for opt in 0 1 2 3; do
echo "==> Building host binaries with -O${opt}"
make TARGET=host OPT_CFLAGS="-O${opt} -g" run-tests
find . -maxdepth 2 -type f -name '*.host' -execdir mv {} {}.O${opt} \;
done
echo "All host optimization builds complete."
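
The rename step above leaves one binary per optimization level next to each benchmark, and the mapping script later finds them by suffix. A minimal sketch of that discovery rule, assuming file names such as `sieve.host.O0` (the directory layout and `discover_binaries` name here are illustrative, built in a temporary directory):

```python
import tempfile
from pathlib import Path

OPT_SUFFIXES = {".o0", ".o1", ".o2", ".o3"}


def discover_binaries(repo_root: Path) -> list:
    """Return files whose final suffix is .O0 to .O3 (case-insensitive), sorted."""
    return sorted(
        path
        for path in repo_root.rglob("*.O*")
        if path.is_file() and path.suffix.lower() in OPT_SUFFIXES
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sieve").mkdir()
    # Two renamed binaries, one derived listing, and one unrelated object file.
    for name in ("sieve.host.O0", "sieve.host.O3", "sieve.host.O0.s", "sieve.o"):
        (root / "sieve" / name).touch()
    found = [path.name for path in discover_binaries(root)]

print(found)
```

Checking the final suffix rather than the glob alone keeps derived files such as `*.O0.s` and `*.O0.pseudo` out of the result.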


@ -0,0 +1,21 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
set -a
source "${EVAL_ROOT}/config.env"
set +a
fi
BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
cd "${BENCH_REPO_ROOT}"
echo "==> Running make all-clean"
make all-clean
echo "All benchmarks cleaned."


@ -0,0 +1,50 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
set -a
source "${EVAL_ROOT}/config.env"
set +a
fi
BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
IDA_BIN="${IDA_BIN:-/home/bairidreamer/software/IDA-Pro/idat}"
DUMP_SCRIPT="${EVAL_ROOT}/scripts/dump_pseudo.py"
if [[ ! -x "${IDA_BIN}" ]]; then
echo "error: IDA binary not found or not executable at ${IDA_BIN}" >&2
exit 1
fi
if [[ ! -f "${DUMP_SCRIPT}" ]]; then
echo "error: dump script not found at ${DUMP_SCRIPT}" >&2
exit 1
fi
readarray -t BINARIES < <(
find "${BENCH_REPO_ROOT}" -mindepth 2 -maxdepth 2 -type f \
\( -iname '*.o0' -o -iname '*.o1' -o -iname '*.o2' -o -iname '*.o3' \) \
! -path "${BENCH_REPO_ROOT}/scripts/*" \
! -path "${BENCH_REPO_ROOT}/target/*" \
! -path "${BENCH_REPO_ROOT}/common/*" \
! -path "${BENCH_REPO_ROOT}/.git/*" \
| sort
)
if [[ ${#BINARIES[@]} -eq 0 ]]; then
echo "error: no O0/O1/O2/O3 binaries found under ${BENCH_REPO_ROOT}" >&2
exit 1
fi
for binary_path in "${BINARIES[@]}"; do
output_path="${binary_path}.pseudo"
echo "==> Decompiling ${binary_path#${BENCH_REPO_ROOT}/} -> ${output_path#${BENCH_REPO_ROOT}/}"
"${IDA_BIN}" -A "-S${DUMP_SCRIPT} ${output_path}" "${binary_path}"
done
echo "All pseudocode dumps are located alongside their binaries."


@ -0,0 +1,66 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
set -a
source "${EVAL_ROOT}/config.env"
set +a
fi
BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
OBJDUMP_BIN="${OBJDUMP:-objdump}"
NUM_JOBS="${JOBS:-}"
if ! command -v "${OBJDUMP_BIN}" >/dev/null 2>&1; then
echo "error: objdump binary '${OBJDUMP_BIN}' not found" >&2
exit 1
fi
if [[ -z "${NUM_JOBS}" ]]; then
if command -v nproc >/dev/null 2>&1; then
NUM_JOBS="$(nproc)"
elif [[ "$(uname)" == "Darwin" ]]; then
NUM_JOBS="$(sysctl -n hw.ncpu)"
else
NUM_JOBS=4
fi
fi
if ! [[ "${NUM_JOBS}" =~ ^[0-9]+$ ]] || (( NUM_JOBS <= 0 )); then
echo "error: invalid JOBS value '${NUM_JOBS}'" >&2
exit 1
fi
readarray -t BINARIES < <(
find "${BENCH_REPO_ROOT}" -mindepth 2 -maxdepth 2 -type f \
\( -iname '*.o0' -o -iname '*.o1' -o -iname '*.o2' -o -iname '*.o3' \) \
! -path "${BENCH_REPO_ROOT}/scripts/*" \
! -path "${BENCH_REPO_ROOT}/target/*" \
! -path "${BENCH_REPO_ROOT}/common/*" \
! -path "${BENCH_REPO_ROOT}/.git/*" \
| sort
)
if [[ ${#BINARIES[@]} -eq 0 ]]; then
echo "error: no O0/O1/O2/O3 binaries found under ${BENCH_REPO_ROOT}" >&2
exit 1
fi
export OBJDUMP_BIN BENCH_REPO_ROOT
printf '%s\0' "${BINARIES[@]}" | xargs -0 -n1 -P "${NUM_JOBS}" bash -c '
binary_path="$1"
bench_repo_root="${BENCH_REPO_ROOT}"
output_path="${binary_path}.s"
rel_in="${binary_path#"${bench_repo_root}/"}"
rel_out="${output_path#"${bench_repo_root}/"}"
echo "==> Disassembling ${rel_in} -> ${rel_out}"
"${OBJDUMP_BIN}" -d "${binary_path}" > "${output_path}"
' _
echo "Assembly listings written alongside each binary (extension .s)."
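
The `.s` listings written by this script are later split per function by matching `objdump -d` header lines. A simplified sketch of that step follows; it assumes the usual tab-separated `address:`, `bytes`, `instruction` layout, the sample listing is hand-written for illustration, and `split_functions` is a hypothetical name. Unlike the mapping script, it ignores byte-only continuation lines entirely.

```python
import re

# Function header emitted by objdump -d: "<hex address> <name>:"
HEADER = re.compile(r"^\s*([0-9a-fA-F]+)\s+<([^>]+)>:\s*$")


def split_functions(listing: str) -> dict:
    """Map each function name to its list of cleaned instruction strings."""
    blocks = {}
    current = None
    for line in listing.splitlines():
        match = HEADER.match(line)
        if match:
            current = match.group(2)
            blocks.setdefault(current, [])
            continue
        if current is None or not line.strip():
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # no instruction column on this line
        # Drop the trailing "# ..." annotation that objdump adds.
        instr = parts[2].split("#")[0].strip()
        if instr:
            blocks[current].append(instr)
    return blocks


SAMPLE = (
    "0000000000001290 <dayOfWeek>:\n"
    "    1290:\tf3 0f 1e fa          \tendbr64\n"
    "    1294:\t55                   \tpush   %rbp\n"
    "    1295:\t48 8b 05 00 00 00 00 \tmov    0x0(%rip),%rax        # 129c <dayOfWeek+0xc>\n"
    "\n"
    "00000000000012a0 <main>:\n"
    "    12a0:\tc3                   \tret\n"
)
blocks = split_functions(SAMPLE)
print(blocks)
```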


@ -0,0 +1,62 @@
"""
Headless IDA/Hex-Rays helper to dump pseudocode for every discovered function.

Usage (from shell):
    idat -A -S"scripts/dump_pseudo.py /path/to/output" /path/to/binary
"""

from __future__ import annotations

import os
import sys

import ida_auto
import ida_funcs
import ida_hexrays
import ida_pro
import idautils
import idc


def _get_output_path() -> str:
    # IDA populates idc.ARGV with the script path at index 0 and the
    # user-provided arguments afterwards.
    if len(idc.ARGV) < 2:
        raise RuntimeError("output path argument missing")
    return os.path.abspath(idc.ARGV[1])


def main() -> None:
    try:
        output_path = _get_output_path()
    except Exception as exc:  # pragma: no cover - defensive
        print(f"[dump_pseudo] {exc}", file=sys.stderr)
        ida_pro.qexit(1)
        return
    # Wait for auto-analysis to finish before decompiling.
    ida_auto.auto_wait()
    if not ida_hexrays.init_hexrays_plugin():
        print("[dump_pseudo] Hex-Rays decompiler is unavailable", file=sys.stderr)
        ida_pro.qexit(1)
        return
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w", encoding="utf-8") as handle:
        for ea in idautils.Functions():
            name = ida_funcs.get_func_name(ea)
            # Marker line consumed by the mapping script: /* name @ 0xaddr */
            handle.write(f"/* {name} @ 0x{ea:x} */\n")
            try:
                cfunc = ida_hexrays.decompile(ea)
            except ida_hexrays.DecompilationFailure as exc:
                handle.write(f"// decompilation failed: {exc}\n\n")
                continue
            handle.write(str(cfunc))
            handle.write("\n\n")
    ida_pro.qexit(0)


if __name__ == "__main__":
    main()
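
The `/* name @ 0xaddr */` marker written before each function is what lets the mapping script split a dump back into functions. The sketch below shows that round trip outside IDA; the dump text is hand-written for illustration and `split_pseudo` is a hypothetical, simplified stand-in for `_parse_pseudo`.

```python
import re

# Same marker shape that dump_pseudo.py writes before each function.
MARKER = re.compile(
    r"^/\*\s*(?P<name>[^@]+?)\s*@\s*(?P<addr>0x[0-9a-fA-F]+)\s*\*/$"
)


def split_pseudo(dump: str) -> dict:
    """Map function name -> {'address', 'content'} from a marker-delimited dump."""
    collected = {}
    current = None
    for raw_line in dump.splitlines():
        match = MARKER.match(raw_line.strip())
        if match:
            current = {"address": match["addr"], "lines": []}
            # Keep the first entry if a name repeats.
            collected.setdefault(match["name"].strip(), current)
        elif current is not None:
            current["lines"].append(raw_line)
    return {
        name: {
            "address": entry["address"],
            "content": "\n".join(entry["lines"]).strip("\n") + "\n",
        }
        for name, entry in collected.items()
    }


DUMP = (
    "/* main @ 0x1100 */\n"
    "int __fastcall main(int argc, const char **argv, const char **envp)\n"
    "{\n"
    "  return 0;\n"
    "}\n"
    "\n"
    "/* gcd @ 0x1310 */\n"
    "__int64 __fastcall gcd(int a1, int a2)\n"
    "{\n"
    "  return a1;\n"
    "}\n"
    "\n"
)
functions = split_pseudo(DUMP)
print(sorted(functions))
```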


@ -0,0 +1,682 @@
#!/usr/bin/env python3
"""
Evaluate infer-out-model2 functions by patching benchmark sources inside an
isolated workspace, rebuilding, executing, and collecting structured logs for
every case listed in a JSONL file.
"""

from __future__ import annotations

import argparse
import json
import os
import re
import shutil
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import asdict, dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple

def _load_config_env() -> dict:
    """Load config.env from the eval project root."""
    eval_root = Path(__file__).resolve().parents[1]
    config_path = eval_root / "config.env"
    config = {}
    if config_path.exists():
        for line in config_path.read_text().splitlines():
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            if "=" in line:
                key, _, value = line.partition("=")
                config[key.strip()] = value.strip()
    return config


def _get_bench_root(cli_value: str | None = None) -> Path:
    """Resolve the benchmark repo root from CLI arg, env var, or config.env."""
    if cli_value:
        return Path(cli_value).resolve()
    env_val = os.environ.get("BENCH_REPO_ROOT")
    if env_val:
        return Path(env_val).resolve()
    config = _load_config_env()
    if "BENCH_REPO_ROOT" in config:
        return Path(config["BENCH_REPO_ROOT"]).resolve()
    sys.exit("error: BENCH_REPO_ROOT not set. Use --bench-root, set the env var, or configure config.env")

@dataclass
class CaseResult:
    """Container for the outcome of processing a single case."""

    case_id: str
    source_path: str
    benchmark_dir: str
    output_dir: str
    workspace_dir: str = ""
    artifact_dir: str = ""
    replacement_applied: bool = False
    build_status: str = "skipped"  # succeeded | failed | skipped
    test_status: str = "skipped"
    notes: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    log_files: Dict[str, str] = field(default_factory=dict)

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Replace functions with infer-out-model2 bodies, build, "
        "execute, and record results without modifying the original benchmarks."
    )
    parser.add_argument(
        "jsonl",
        help="Path to the merged.*.jsonl file containing cases to evaluate.",
    )
    parser.add_argument(
        "--bench-root",
        default=None,
        help="Path to the Bringup-Bench repository root (default: from config.env).",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        help="Optional limit on the number of cases to process.",
    )
    parser.add_argument(
        "--target",
        default="host",
        help="Benchmark build target passed as TARGET=<target> (default: host).",
    )
    parser.add_argument(
        "--report-dir",
        default="reports/infer_out_eval",
        help="Directory (relative to eval root) where aggregated reports are written.",
    )
    parser.add_argument(
        "--workspace-root",
        default="reports/infer_out_eval/workspaces",
        help="Directory (relative to eval root) to host temporary build workspaces.",
    )
    parser.add_argument(
        "--skip-clean",
        action="store_true",
        help="Skip running 'make clean' inside the workspace (useful when iterating).",
    )
    parser.add_argument(
        "--keep-workspaces",
        action="store_true",
        help="Keep temporary workspaces after each case finishes (default removes them).",
    )
    parser.add_argument(
        "--command-timeout",
        type=int,
        default=20,
        help="Timeout (in seconds) for each make invocation; 0 disables the timeout.",
    )
    parser.add_argument(
        "--jobs",
        type=int,
        default=96,
        help="Number of cases to process in parallel (default: 96).",
    )
    return parser.parse_args()

def canonicalize(text: str) -> str:
    """Normalize newlines for reliable substring matching."""
    return text.replace("\r\n", "\n")


def replace_function_body(
    full_source: str, reference_function: str, inferred_function: str
) -> Tuple[str, bool]:
    """
    Replace the exact reference_function text with inferred_function.

    Returns the updated source and a boolean indicating if replacement happened.
    """
    source_norm = canonicalize(full_source)
    reference_norm = canonicalize(reference_function)
    inferred_norm = canonicalize(inferred_function).rstrip() + "\n"
    # Try the reference text as given, then with progressively looser whitespace.
    candidates = (
        reference_norm,
        reference_norm.rstrip() + "\n",
        reference_norm.strip(),
    )
    for snippet in candidates:
        start_idx = source_norm.find(snippet)
        if start_idx == -1:
            continue
        end_idx = start_idx + len(snippet)
        updated = source_norm[:start_idx] + inferred_norm + source_norm[end_idx:]
        return updated, True
    return full_source, False


def compose_case_id(case: Dict) -> str:
    """Build a stable identifier for a case."""
    return (
        f"{case['source']['path']}::{case['source']['function_name']}"
        f"@{case['pseudo']['address']}"
    )

def ensure_case_output_dir(
    output_root: Path, pseudo_path_str: str, pseudo_address: str, result: CaseResult
) -> Path:
    """Create the per-case output directory, handling file path collisions."""
    pseudo_rel = Path(pseudo_path_str)
    base_dir = output_root / pseudo_rel
    if base_dir.exists() and base_dir.is_file():
        fallback = base_dir.parent / f"{base_dir.name}.infer_eval"
        fallback.mkdir(parents=True, exist_ok=True)
        result.notes.append(
            f"pseudo.path '{pseudo_path_str}' is a file; using '{fallback.relative_to(output_root)}' for logs."
        )
        base_dir = fallback
    else:
        base_dir.mkdir(parents=True, exist_ok=True)
    case_dir = base_dir / pseudo_address
    # Start from a clean directory on every run.
    if case_dir.exists():
        shutil.rmtree(case_dir)
    case_dir.mkdir(parents=True, exist_ok=True)
    return case_dir

def run_command(
    command: List[str],
    cwd: Path,
    log_handle,
    step_name: str,
    timeout: Optional[int],
) -> Optional[int]:
    """Run a command, capture stdout/stderr, and write everything to log_handle.

    Returns the exit code, or None if the command timed out.
    """
    log_handle.write(f"\n[{step_name}] $ {' '.join(command)}\n")
    log_handle.flush()
    try:
        completed = subprocess.run(
            command,
            cwd=str(cwd),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            encoding="utf-8",
            errors="replace",
            timeout=timeout if timeout and timeout > 0 else None,
        )
        log_handle.write(completed.stdout)
        log_handle.write(f"[{step_name}] exit code: {completed.returncode}\n")
        log_handle.flush()
        return completed.returncode
    except subprocess.TimeoutExpired as exc:
        output = exc.output or exc.stdout
        if output:
            if isinstance(output, bytes):
                log_handle.write(output.decode("utf-8", "replace"))
            else:
                log_handle.write(output)
        log_handle.write(
            f"[{step_name}] timed out after {timeout} seconds; terminating process.\n"
        )
        log_handle.flush()
        return None

def write_case_artifacts(
    case_dir: Path,
    case: Dict,
    modified_source: str,
    original_source: str,
) -> None:
    """Persist reusable artifacts for a case."""
    (case_dir / "case.json").write_text(json.dumps(case, indent=2), encoding="utf-8")
    (case_dir / "modified_source.c").write_text(modified_source, encoding="utf-8")
    (case_dir / "original_source.c").write_text(original_source, encoding="utf-8")
    (case_dir / "original_function.c").write_text(
        canonicalize(case["source"]["content"]), encoding="utf-8"
    )
    (case_dir / "infer_function.c").write_text(
        canonicalize(case["pseudo"]["content-fix"]), encoding="utf-8"
    )


def sanitize_case_id(case_id: str) -> str:
    """Generate filesystem-safe case identifier."""
    sanitized = re.sub(r"[^A-Za-z0-9._-]+", "_", case_id)
    return sanitized.strip("_") or "case"


def copy_ignore_eval_dirs(_src: str, names: List[str]) -> List[str]:
    """Ignore helper to skip evaluation artifacts when copying benchmark dirs."""
    ignored: List[str] = []
    for name in names:
        if name.endswith(".infer_eval"):
            ignored.append(name)
    return ignored

def prepare_workspace(
    repo_root: Path,
    benchmark_dir: Path,
    workspace_root: Path,
    case_id: str,
) -> Tuple[Path, Path]:
    """Clone the necessary subset of the repo into a temporary workspace."""
    workspace_case_root = workspace_root / sanitize_case_id(case_id)
    if workspace_case_root.exists():
        shutil.rmtree(workspace_case_root)
    workspace_repo_root = workspace_case_root / "repo"
    workspace_repo_root.mkdir(parents=True, exist_ok=True)
    # Copy only what a single benchmark build needs.
    shutil.copy2(repo_root / "Makefile", workspace_repo_root / "Makefile")
    shutil.copytree(repo_root / "common", workspace_repo_root / "common", dirs_exist_ok=True)
    shutil.copytree(repo_root / "target", workspace_repo_root / "target", dirs_exist_ok=True)
    shutil.copytree(
        benchmark_dir,
        workspace_repo_root / benchmark_dir.name,
        dirs_exist_ok=True,
        ignore=copy_ignore_eval_dirs,
    )
    return workspace_case_root, workspace_repo_root


def relative_to_repo(path: Path, repo_root: Path) -> str:
    """Return a path relative to repo_root when possible."""
    try:
        return str(path.relative_to(repo_root))
    except ValueError:
        return str(path)


def init_case_result(case: Dict, repo_root: Path) -> CaseResult:
    """Create a CaseResult with basic metadata for the given case."""
    source_rel = Path(case["source"]["path"])
    benchmark_dir_path = (repo_root / source_rel).parent
    try:
        benchmark_rel = str(benchmark_dir_path.relative_to(repo_root))
    except ValueError:
        benchmark_rel = str(benchmark_dir_path)
    return CaseResult(
        case_id=compose_case_id(case),
        source_path=str(source_rel),
        benchmark_dir=benchmark_rel,
        output_dir="",
    )



def snapshot_artifacts(
    case_dir: Path,
    workspace_benchmark_dir: Path,
    eval_root: Path,
    result: CaseResult,
) -> None:
    """Copy the workspace benchmark directory into the case directory."""
    artifacts_dir = case_dir / "artifacts"
    if artifacts_dir.exists():
        shutil.rmtree(artifacts_dir)
    try:
        shutil.copytree(workspace_benchmark_dir, artifacts_dir)
        result.artifact_dir = relative_to_repo(artifacts_dir, eval_root)
    except Exception as exc:  # pragma: no cover - defensive
        result.notes.append(f"Failed to copy artifacts: {exc}")


def process_case(
    case: Dict,
    args: argparse.Namespace,
    repo_root: Path,
    eval_root: Path,
) -> CaseResult:
    """Process a single JSONL entry."""
    case_id = compose_case_id(case)
    source_rel = Path(case["source"]["path"])
    source_path = repo_root / source_rel
    benchmark_dir = source_path.parent
    result = init_case_result(case, repo_root)

    if not source_path.exists():
        result.errors.append(f"Source file '{source_rel}' does not exist.")
        return result

    try:
        case_dir = ensure_case_output_dir(
            eval_root, case["pseudo"]["path"], case["pseudo"]["address"], result
        )
    except Exception as exc:  # pragma: no cover - defensive
        result.errors.append(f"Failed to prepare case directory: {exc}")
        return result

    result.output_dir = str(case_dir.relative_to(eval_root))
    full_source_text = source_path.read_text(encoding="utf-8")
    updated_source, replaced = replace_function_body(
        full_source_text,
        case["source"]["content"],
        case["pseudo"]["content-fix"],
    )
    if not replaced:
        result.errors.append(
            "Could not locate the original function snippet in source file."
        )
        return result

    result.replacement_applied = True
    write_case_artifacts(case_dir, case, updated_source, full_source_text)

    workspace_root = Path(args.workspace_root)
    if not workspace_root.is_absolute():
        workspace_root = eval_root / workspace_root
    workspace_root.mkdir(parents=True, exist_ok=True)

    workspace_case_root: Optional[Path] = None
    try:
        workspace_case_root, workspace_repo_root = prepare_workspace(
            repo_root, benchmark_dir, workspace_root, case_id
        )
        workspace_benchmark_dir = workspace_repo_root / benchmark_dir.name
        artifacts_captured = False

        def capture_artifacts() -> None:
            # Snapshot the workspace at most once per case.
            nonlocal artifacts_captured
            if artifacts_captured:
                return
            snapshot_artifacts(case_dir, workspace_benchmark_dir, eval_root, result)
            artifacts_captured = True

        workspace_source_path = workspace_repo_root / source_rel
        workspace_source_path.write_text(updated_source, encoding="utf-8")
        result.workspace_dir = relative_to_repo(workspace_case_root, eval_root)

        log_path = case_dir / "case.log"
        with log_path.open("w", encoding="utf-8") as log_handle:
            log_handle.write(f"Case: {case_id}\n")
            log_handle.write(f"Workspace: {workspace_case_root}\n")
            log_handle.write(f"Benchmark copy: {workspace_benchmark_dir}\n")
            log_handle.write(f"Target: {args.target}\n")
            log_handle.flush()

            if not args.skip_clean:
                clean_rc = run_command(
                    ["make", f"TARGET={args.target}", "clean"],
                    workspace_benchmark_dir,
                    log_handle,
                    "clean",
                    args.command_timeout,
                )
                if clean_rc is None:
                    result.errors.append(
                        f"'make clean' timed out after {args.command_timeout} seconds."
                    )
                    capture_artifacts()
                    result.log_files["case"] = relative_to_repo(log_path, eval_root)
                    return result
                if clean_rc != 0:
                    result.build_status = "failed"
                    result.errors.append("make clean failed.")
                    capture_artifacts()
                    result.log_files["case"] = relative_to_repo(log_path, eval_root)
                    return result
            else:
                log_handle.write("Skipping 'make clean' per --skip-clean flag.\n")

            build_rc = run_command(
                ["make", f"TARGET={args.target}", "build"],
                workspace_benchmark_dir,
                log_handle,
                "build",
                args.command_timeout,
            )
            result.log_files["case"] = relative_to_repo(log_path, eval_root)
            if build_rc is None:
                result.build_status = "failed"
                result.errors.append(
                    f"'make build' timed out after {args.command_timeout} seconds."
                )
                capture_artifacts()
                log_handle.write("Skipping test because build timed out.\n")
                return result
            if build_rc == 0:
                result.build_status = "succeeded"
            else:
                result.build_status = "failed"
                result.errors.append("make build failed.")
                log_handle.write("Skipping test because build failed.\n")
                capture_artifacts()
                return result

            test_rc = run_command(
                ["make", f"TARGET={args.target}", "test"],
                workspace_benchmark_dir,
                log_handle,
                "test",
                args.command_timeout,
            )
            if test_rc is None:
                result.test_status = "failed"
                result.errors.append(
                    f"'make test' timed out after {args.command_timeout} seconds."
                )
            elif test_rc == 0:
                result.test_status = "succeeded"
            else:
                result.test_status = "failed"
                result.errors.append("make test failed.")
            capture_artifacts()
    finally:
        # Remove the per-case workspace unless the caller asked to keep it.
        if (
            workspace_case_root
            and workspace_case_root.exists()
            and not args.keep_workspaces
        ):
            shutil.rmtree(workspace_case_root, ignore_errors=True)
    return result


def collect_cases(jsonl_path: Path, limit: Optional[int]) -> Iterable[Dict]:
    """Yield cases from jsonl file respecting the optional limit."""
    processed = 0
    with jsonl_path.open("r", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue
            yield json.loads(stripped)
            processed += 1
            if limit is not None and processed >= limit:
                break


def compute_summary(results: List[CaseResult]) -> Dict:
    """Aggregate statistics over all case results."""
    total = len(results)
    replacements = sum(1 for r in results if r.replacement_applied)
    build_success = sum(1 for r in results if r.build_status == "succeeded")
    test_success = sum(1 for r in results if r.test_status == "succeeded")

    def frac(passed: int, denom: int) -> float:
        return round(passed / denom, 4) if denom else 0.0

    per_benchmark: Dict[str, Dict[str, float]] = {}
    for r in results:
        stats = per_benchmark.setdefault(
            r.benchmark_dir,
            {
                "cases": 0,
                "replacements": 0,
                "build_success": 0,
                "test_success": 0,
            },
        )
        stats["cases"] += 1
        if r.replacement_applied:
            stats["replacements"] += 1
        if r.build_status == "succeeded":
            stats["build_success"] += 1
        if r.test_status == "succeeded":
            stats["test_success"] += 1

    for stats in per_benchmark.values():
        stats["replacement_rate"] = frac(stats["replacements"], stats["cases"])
        stats["build_rate"] = frac(stats["build_success"], stats["cases"])
        stats["test_rate"] = frac(stats["test_success"], stats["cases"])

    summary = {
        "total_cases": total,
        "replacement_success_count": replacements,
        "replacement_success_rate": frac(replacements, total),
        "compilable_count": build_success,
        "compilable_rate": frac(build_success, total),
        "executable_count": test_success,
        "executable_rate": frac(test_success, total),
        "compilation_failures": [
            r.case_id for r in results if r.build_status == "failed"
        ],
        "execution_failures": [
            r.case_id
            for r in results
            if r.build_status == "succeeded" and r.test_status == "failed"
        ],
        "cases": [asdict(r) for r in results],
        "by_benchmark": per_benchmark,
    }
    return summary


def write_summary(
    eval_root: Path,
    args: argparse.Namespace,
    jsonl_path: Path,
    summary: Dict,
) -> Tuple[Path, Path]:
    """Write JSON and Markdown summary reports."""
    report_root = eval_root / args.report_dir
    report_root.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = f"{jsonl_path.stem}-{args.target}"
    json_report = report_root / f"{base_name}-{timestamp}.json"
    markdown_report = report_root / f"{base_name}-{timestamp}.md"
    json_report.write_text(json.dumps(summary, indent=2), encoding="utf-8")

    benchmark_lines = [
        "| Benchmark | Cases | Replacement% | Build% | Exec% |",
        "| --- | --- | --- | --- | --- |",
    ]
    for bench, stats in sorted(summary["by_benchmark"].items()):
        benchmark_lines.append(
            f"| {bench} | {stats['cases']} | "
            f"{stats['replacement_rate']*100:.2f}% | "
            f"{stats['build_rate']*100:.2f}% | "
            f"{stats['test_rate']*100:.2f}% |"
        )
    if len(benchmark_lines) == 2:
        benchmark_lines.append("| (none) | 0 | 0.00% | 0.00% | 0.00% |")

    compilation_items = summary["compilation_failures"] or ["None"]
    execution_items = summary["execution_failures"] or ["None"]
    relative_jsonl = relative_to_repo(jsonl_path, eval_root)
    lines = [
        f"# Infer-Out Model 2 Evaluation ({base_name})",
        "",
        f"- Timestamp: {timestamp}",
        f"- Source JSONL: {relative_jsonl}",
        f"- Target: {args.target}",
        f"- Total cases: {summary['total_cases']}",
        f"- Replacement success: {summary['replacement_success_count']} "
        f"({summary['replacement_success_rate']*100:.2f}%)",
        f"- Compilable: {summary['compilable_count']} "
        f"({summary['compilable_rate']*100:.2f}%)",
        f"- Executable: {summary['executable_count']} "
        f"({summary['executable_rate']*100:.2f}%)",
        "",
        "## Benchmark Breakdown",
        *benchmark_lines,
        "",
        "## Compilation Failures",
    ]
    lines.extend(f"- {cid}" for cid in compilation_items)
    lines.append("")
    lines.append("## Execution Failures")
    lines.extend(f"- {cid}" for cid in execution_items)
    markdown_report.write_text("\n".join(lines), encoding="utf-8")
    return json_report, markdown_report


def main() -> int:
    args = parse_args()
    eval_root = Path(__file__).resolve().parents[1]
    repo_root = _get_bench_root(args.bench_root)
    jsonl_path = Path(args.jsonl)
    if not jsonl_path.is_absolute():
        jsonl_path = eval_root / jsonl_path
    if not jsonl_path.exists():
        print(f"JSONL file '{jsonl_path}' not found.", file=sys.stderr)
        return 1

    cases = list(collect_cases(jsonl_path, args.limit))
    if not cases:
        print("No cases to process.")
        return 0

    # Results are stored by input index so report order matches the JSONL order.
    results: List[Optional[CaseResult]] = [None] * len(cases)

    def record_result(idx: int, case_result: CaseResult) -> None:
        results[idx] = case_result
        status = (
            f"build={case_result.build_status}, test={case_result.test_status}"
            if case_result.replacement_applied
            else "replacement_failed"
        )
        print(f"[{idx + 1}] {case_result.case_id}: {status}")

    if args.jobs <= 1:
        for idx, case in enumerate(cases):
            case_result = process_case(case, args, repo_root, eval_root)
            record_result(idx, case_result)
    else:
        with ThreadPoolExecutor(max_workers=args.jobs) as executor:
            future_to_idx = {
                executor.submit(process_case, case, args, repo_root, eval_root): idx
                for idx, case in enumerate(cases)
            }
            for future in as_completed(future_to_idx):
                idx = future_to_idx[future]
                try:
                    case_result = future.result()
                except Exception as exc:  # pragma: no cover - defensive
                    case_result = init_case_result(cases[idx], repo_root)
                    case_result.errors.append(f"Unhandled exception: {exc}")
                record_result(idx, case_result)

    final_results = [res for res in results if res is not None]
    summary = compute_summary(final_results)
    json_report, markdown_report = write_summary(eval_root, args, jsonl_path, summary)
    print(f"Wrote summary reports:\n - {json_report}\n - {markdown_report}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@ -0,0 +1,180 @@
# SK²Decompile — Reinforcement Learning with VERL
This directory contains the RL (Reinforcement Learning) training pipeline for SK²Decompile, built on top of the [VERL](https://github.com/volcengine/verl) framework (Sheng et al., 2024).
For the full methodology and experimental details, please refer to our paper:
> **SK²Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin**
> [[arXiv:2509.22114]](https://arxiv.org/abs/2509.22114)
## Overview
After supervised fine-tuning (SFT), SK²Decompile applies reinforcement learning to further align each phase's model with task-specific objectives. We adopt the **GRPO** (Group Relative Policy Optimization) algorithm (DeepSeek-AI et al., 2025) to train both models with their respective reward signals:
- **Structure Recovery** (Skeleton): The reward is based on compiler feedback — a positive reward is granted only if the generated IR successfully compiles, with an additional component reflecting the correctness of placeholder recovery (Equation 3 in the paper).
- **Identifier Naming** (Skin): The reward is the cosine similarity between the embeddings of the generated code and the reference source code, encouraging semantically aligned identifier predictions rather than exact lexical matches (Equation 4 in the paper).
The reward functions and training scripts provided here are **reference implementations** for reproducing the RL training pipeline. For the precise reward formulations and design rationale, please refer to Section 3.5 of the paper.
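
To make the "group relative" part of GRPO concrete, the following is a minimal sketch of how the rewards of the responses sampled for one prompt can be turned into advantages by normalizing within the group. It is an illustration only, not VERL's implementation: the function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` value are assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize the rewards of one prompt's sampled responses within the group.

    Hypothetical helper: each advantage is (reward - group mean) divided by
    (group standard deviation + eps). Population std is an assumption here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses for a single prompt with made-up rewards.
    print(group_relative_advantages([0.0, 1.6, 2.0, 0.0]))
```

Responses that score above the group mean receive a positive advantage and the others a negative one; when every response in the group gets the same reward, all advantages are zero and the group provides no learning signal.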
## Directory Structure
```
SK2DECOMPILE/
├── README.md                            # This file
├── data/
│   └── sk2decompile-rl-examples.jsonl   # Example RL training data
├── reward_functions/                    # Reference reward implementations
│   ├── __init__.py
│   ├── exe_type.py                      # Example: compilability + placeholder Jaccard
│   ├── sim_exe.py                       # Example: compilability + word-level similarity
│   ├── embedding_gte.py                 # Example: embedding-based identifier similarity (GTE)
│   └── embedding_qwen3.py               # Example: embedding-based identifier similarity (Qwen3)
└── scripts/
    ├── run_struct_rl.sh                 # Reference script: Structure Recovery RL
    └── run_ident_rl.sh                  # Reference script: Identifier Naming RL
```
## Reward Formulations (from the Paper)
### Structure Recovery Reward (Eq. 3)
The Structure Recovery reward consists of two components:
1. **Compilability**: The generated IR is compiled together with the ground-truth header (headers are generated with [Psyche-C](https://github.com/ltcmelo/psychec.git)). A reward of 1.0 is granted only upon successful compilation.
2. **Placeholder Recovery**: The Jaccard similarity between the generated placeholder set (I_gen) and the ground-truth set (I_IR).
```
r_placeholder = |I_gen ∩ I_IR| / |I_gen ∪ I_IR|

r_structure = { 0.0,                  if the IR cannot be compiled
              { 1.0 + r_placeholder,  if the IR can be compiled
```
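
As a worked illustration of Eq. 3, the sketch below combines a compile flag with the Jaccard similarity of two placeholder sets. The helper names `placeholder_jaccard` and `structure_reward` are hypothetical and exist only for this example; returning 0.0 for an empty union mirrors what `reward_functions/exe_type.py` does.

```python
from typing import Set


def placeholder_jaccard(generated: Set[str], reference: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = generated | reference
    return len(generated & reference) / len(union) if union else 0.0


def structure_reward(compiles: bool, generated: Set[str], reference: Set[str]) -> float:
    """Eq. 3: 0.0 if the IR does not compile, else 1.0 + r_placeholder."""
    if not compiles:
        return 0.0
    return 1.0 + placeholder_jaccard(generated, reference)


if __name__ == "__main__":
    gen = {"func1", "var1"}
    ref = {"func1", "var1", "var2"}
    # Two shared placeholders out of three in the union: 1.0 + 2/3.
    print(structure_reward(True, gen, ref))
    print(structure_reward(False, gen, ref))
```

Because the placeholder term is added only after a successful compile, any compilable output scores at least 1.0 and every non-compilable output scores exactly 0.0.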
### Identifier Naming Reward (Eq. 4)
The Identifier Naming reward measures the semantic similarity between the generated code and the reference source code using embedding cosine similarity:
```
r_identifier = cos(e_gen, e_src) = (e_gen · e_src) / (||e_gen|| · ||e_src||)
```
where `e_gen` and `e_src` are the embeddings of the generated and reference code, respectively. In our experiments, we use Qwen3-Embedding-0.6B (Zhang et al., 2025) as the embedding model.
> **Note**: The reward functions in `reward_functions/` are reference implementations that demonstrate the reward design. Please refer to Section 3.5 of the paper for the complete formulation and design rationale.
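
The cosine similarity in Eq. 4 can be checked by hand on toy vectors. The sketch below is a plain-Python illustration with made-up two-dimensional "embeddings"; real embeddings come from the embedding server. Note that the reference implementations in `reward_functions/` additionally square the similarity to sharpen the signal.

```python
import math
from typing import Sequence


def cosine_similarity(vec_a: Sequence[float], vec_b: Sequence[float]) -> float:
    """Return (a · b) / (||a|| · ||b||), or 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    e_gen = [1.0, 0.0]  # toy vector standing in for the generated code
    e_src = [1.0, 1.0]  # toy vector standing in for the reference code
    r_identifier = cosine_similarity(e_gen, e_src)  # 1 / sqrt(2)
    print(r_identifier, r_identifier ** 2)
```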
## Reproduction Guide
### Step 1: Install VERL
Our RL training is based on **VERL v0.4.1** ([HybridFlow](https://github.com/volcengine/verl), Sheng et al., 2024). We recommend using the same version for reproducibility.
```bash
git clone https://github.com/volcengine/verl.git
cd verl
git checkout v0.4.1 # or the commit closest to v0.4.1
pip install -e .
```
### Step 2: Integrate Reward Functions
Copy the reward functions into VERL's reward module and register them in the routing dispatcher:
```bash
# Copy reward functions
cp reward_functions/exe_type.py <VERL_DIR>/verl/utils/reward_score/sk2d_exe_type.py
cp reward_functions/sim_exe.py <VERL_DIR>/verl/utils/reward_score/sk2d_sim_exe.py
cp reward_functions/embedding_gte.py <VERL_DIR>/verl/utils/reward_score/sk2d_embedding_gte.py
cp reward_functions/embedding_qwen3.py <VERL_DIR>/verl/utils/reward_score/sk2d_embedding_qwen3.py
```
Then add routing branches to `<VERL_DIR>/verl/utils/reward_score/__init__.py` in the `default_compute_score()` function:
```python
# Structure Recovery reward (example)
elif data_source == "sk2decompile_structure":
    from . import sk2d_exe_type
    res = sk2d_exe_type.compute_score(solution_str, ground_truth, extra_info)
# Identifier Naming reward (example)
elif data_source == "sk2decompile_identifier":
    from . import sk2d_embedding_qwen3
    res = sk2d_embedding_qwen3.compute_score(solution_str, ground_truth, extra_info)
```
The `data_source` field in your training Parquet files determines which reward function is dispatched for each sample.
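
The routing idea can be illustrated outside VERL with a small self-contained sketch: a table maps each `data_source` key to a scoring callable with the `compute_score(solution_str, ground_truth, extra_info)` signature. Everything here is hypothetical (the `_ROUTES` table, `route_score`, and the two stub scorers); VERL itself uses the `elif` chain shown above.

```python
from typing import Callable, Dict, Optional

ScoreFn = Callable[[str, str, Optional[dict]], float]


def _structure_stub(solution_str: str, ground_truth: str, extra_info: Optional[dict] = None) -> float:
    """Stand-in for the Structure Recovery reward: 1.0 for any non-empty output."""
    return 1.0 if solution_str.strip() else 0.0


def _identifier_stub(solution_str: str, ground_truth: str, extra_info: Optional[dict] = None) -> float:
    """Stand-in for the Identifier Naming reward: 1.0 on exact match."""
    return 1.0 if solution_str == ground_truth else 0.0


# Hypothetical routing table keyed by the sample's data_source field.
_ROUTES: Dict[str, ScoreFn] = {
    "sk2decompile_structure": _structure_stub,
    "sk2decompile_identifier": _identifier_stub,
}


def route_score(data_source: str, solution_str: str, ground_truth: str,
                extra_info: Optional[dict] = None) -> float:
    """Dispatch one sample to the scorer registered for its data_source."""
    if data_source not in _ROUTES:
        raise NotImplementedError(f"No reward function registered for {data_source!r}")
    return float(_ROUTES[data_source](solution_str, ground_truth, extra_info))
```

A sample whose `data_source` has no registered branch fails loudly here, which is the behavior to aim for: a typo in the key should not silently fall through to an unrelated reward.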
### Step 3: Prepare Training Data
Training data should be in Parquet format. Each row contains:
| Field | Description |
|-------|-------------|
| `prompt` | Chat-format messages, e.g., `[{"role": "user", "content": "<pseudocode>... What is the source code?"}]` |
| `data_source` | Reward function routing key (must match the branch registered in Step 2) |
| `reward_model.ground_truth` | Expected output (IR for Structure Recovery, source code for Identifier Naming) |
| `reward_model.style` | `"rule"` (rule-based reward) |
| `extra_info.header` | C header declarations for compilability checking (Structure Recovery only) |
See `data/sk2decompile-rl-examples.jsonl` for example data format. Convert JSONL to Parquet before training.
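
A minimal conversion sketch follows. It assumes that each JSONL line already holds one record with the fields from the table above, and it relies on pandas with a Parquet engine such as pyarrow being installed. The helpers `build_row` and `jsonl_to_parquet` are hypothetical names for this example, not part of the repository.

```python
import json
from typing import Dict


def build_row(pseudocode: str, ground_truth: str, data_source: str, header: str = "") -> Dict:
    """Assemble one training record with the fields listed in the table above."""
    return {
        "prompt": [{"role": "user", "content": f"{pseudocode}\nWhat is the source code?"}],
        "data_source": data_source,
        "reward_model": {"style": "rule", "ground_truth": ground_truth},
        "extra_info": {"header": header},
    }


def jsonl_to_parquet(jsonl_path: str, parquet_path: str) -> int:
    """Write the records of a JSONL file to Parquet and return the row count."""
    import pandas as pd  # assumption: pandas plus pyarrow (or fastparquet) is available

    with open(jsonl_path, encoding="utf-8") as handle:
        rows = [json.loads(line) for line in handle if line.strip()]
    pd.DataFrame(rows).to_parquet(parquet_path, index=False)
    return len(rows)
```

For example, `jsonl_to_parquet("data/sk2decompile-rl-examples.jsonl", "train.parquet")` would produce a file to pass as `TRAIN_DATA`, provided the JSONL records follow the schema above.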
### Step 4: Launch Training
The reference training scripts are in `scripts/`. Edit the configuration variables at the top of each script before launching.
**Structure Recovery RL:**
```bash
# Edit scripts/run_struct_rl.sh to set:
# VERL_DIR, VENV_PATH, MODEL_PATH, TRAIN_DATA, VAL_DATA, WANDB_*
bash scripts/run_struct_rl.sh
```
**Identifier Naming RL** (requires a running embedding server):
```bash
# 1. Start the embedding server
python -m vllm.entrypoints.openai.api_server \
--model Qwen3-Embedding-0.6B --port 8000 --dtype float16
# 2. Edit scripts/run_ident_rl.sh to set:
# VERL_DIR, VENV_PATH, MODEL_PATH, TRAIN_DATA, VAL_DATA, WANDB_*
bash scripts/run_ident_rl.sh
```
### Step 5: Install Additional Dependencies
```bash
# For compiler-based rewards (Structure Recovery)
apt install gcc
pip install psychec # or build from https://github.com/ltcmelo/psychec.git
# For embedding-based rewards (Identifier Naming)
pip install tree-sitter==0.24.0 tree-sitter-c==0.23.4 openai
```
## Configurations
Reference hyperparameters used in the training scripts:
| Parameter | Structure Recovery | Identifier Naming |
|-----------|:-:|:-:|
| `train_batch_size` | 128 | 128 |
| `max_prompt_length` | 1024 | 1024 |
| `max_response_length` | 2048 | 2048 |
| `lr` | 1e-6 | 1e-6 |
| `kl_loss_coef` | 0.01 | 0.02 |
| `kl_loss_type` | low_var_kl | low_var_kl |
| `rollout.n` (GRPO samples) | 16 | 16 |
| `total_epochs` | 2 | 2 |
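
For sizing GPU memory and throughput, it helps to translate the table into per-step workload. The arithmetic below assumes that `train_batch_size` counts prompts per step and that `rollout.n` responses are sampled for each prompt; the variable names are local to this example.

```python
# Values taken from the table above (identical for both phases).
train_batch_size = 128      # prompts per training step (assumed meaning)
rollout_n = 16              # sampled responses per prompt
max_prompt_length = 1024    # tokens
max_response_length = 2048  # tokens

responses_per_step = train_batch_size * rollout_n                  # 128 * 16 = 2048
max_tokens_per_sequence = max_prompt_length + max_response_length  # 1024 + 2048 = 3072
# Upper bound if every sequence reached the maximum length.
max_tokens_per_step = responses_per_step * max_tokens_per_sequence  # 6,291,456

print(responses_per_step, max_tokens_per_sequence, max_tokens_per_step)
```

Each step therefore scores up to 2048 responses, which is also the number of `gcc` invocations or embedding requests the reward function issues per step.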
## Troubleshooting
**OOM (Out of Memory)**:
- Reduce `ppo_micro_batch_size_per_gpu` (default: 4)
- Enable `actor.fsdp_config.param_offload=True`
- Reduce `rollout.gpu_memory_utilization` (default: 0.80)
**Embedding server connection error** (Identifier Naming only):
- Ensure the vLLM embedding server is running on port 8000
- Check environment variables: `QWEN3_EMBEDDING_API_BASE` (default: `http://127.0.0.1:8000/v1`)
**Compilation timeout in reward** (Structure Recovery only):
- The `gcc -c` call has a 5-second timeout per sample
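- If many samples time out, check for unusually large generated outputs or an overloaded machine; `gcc -c` only compiles the code and never executes it, so infinite loops in the generated code do not cause this timeout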


@ -0,0 +1,23 @@
"""
SK2Decompile Reference Reward Functions for GRPO Training.
This module provides reference implementations of reward functions used in the
SK2Decompile RL training pipeline. These are example implementations that
demonstrate the reward design described in Section 3.5 of the paper:
SK2Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin
(arXiv:2509.22114)
Reference implementations:
- exe_type: Compilability + placeholder identifier Jaccard similarity
- sim_exe: Compilability + word-level Jaccard similarity
- embedding_gte: Tree-sitter identifier extraction + GTE embedding cosine similarity
- embedding_qwen3: Tree-sitter identifier extraction + Qwen3 embedding cosine similarity
To integrate into VERL, copy these files into verl/utils/reward_score/ and
register routing branches in __init__.py. See README.md for details.
"""
from . import exe_type, sim_exe, embedding_gte, embedding_qwen3
__all__ = ["exe_type", "sim_exe", "embedding_gte", "embedding_qwen3"]


@ -0,0 +1,189 @@
"""
Reference reward function: GTE Embedding-based Identifier Similarity.
This is a reference implementation of the Identifier Naming reward (Eq. 4)
described in the SK2Decompile paper (arXiv:2509.22114, Section 3.5).
Evaluates decompiled C code by:
1. Using tree-sitter to parse C code and extract identifiers (func/var/type/field)
2. Building a naming summary string per code sample
3. Computing cosine similarity between GTE embeddings of the two summaries
4. Squaring the similarity score to sharpen the reward signal
Final score = cosine_similarity^2
Requires:
- A running OpenAI-compatible embedding server (e.g., vLLM serving gte-large-en-v1.5)
- tree-sitter and tree-sitter-c packages
Environment variables:
- GTE_EMBEDDING_MODEL_PATH: Model name/path (default: "gte-large-en-v1.5")
- GTE_EMBEDDING_API_KEY or OPENAI_API_KEY: API key (default: "none")
- GTE_EMBEDDING_API_BASE: API base URL (default: "http://127.0.0.1:8000/v1")
"""
import math
import os
from typing import Dict, List, Optional, Sequence, Tuple

from openai import OpenAI
from tree_sitter import Language, Parser
import tree_sitter_c as tsc

# ---- OpenAI Embedding Client ----
_MODEL_NAME = os.getenv("GTE_EMBEDDING_MODEL_PATH", "gte-large-en-v1.5")
_API_KEY = os.getenv("GTE_EMBEDDING_API_KEY") or os.getenv("OPENAI_API_KEY") or "none"
_API_BASE = os.getenv("GTE_EMBEDDING_API_BASE", "http://127.0.0.1:8000/v1")
_client: Optional[OpenAI] = None


def _get_client() -> OpenAI:
    """Create the embedding client lazily and reuse it across calls."""
    global _client
    if _client is None:
        if _API_BASE:
            _client = OpenAI(api_key=_API_KEY, base_url=_API_BASE)
        elif _API_KEY:
            _client = OpenAI(api_key=_API_KEY)
        else:
            _client = OpenAI()
    return _client


def _embed_two(text_a: str, text_b: str) -> Tuple[List[float], List[float]]:
    """Embed two texts in a single API call, return their embedding vectors."""
    client = _get_client()
    resp = client.embeddings.create(model=_MODEL_NAME, input=[text_a, text_b])
    emb_a = [float(x) for x in resp.data[0].embedding]
    emb_b = [float(x) for x in resp.data[1].embedding]
    return emb_a, emb_b


def _cosine_similarity(vec_a: Sequence[float], vec_b: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# ---- Tree-sitter C: Identifier Extraction ----
C_LANG = Language(tsc.language())
_TS_PARSER = Parser(C_LANG)


def _classify_node(node):
    """
    Classify a tree-sitter node into identifier categories:
    - func: function names (definitions + calls)
    - var: variable names (parameters / local / global)
    - type: type names
    - field: struct field names
    """
    node_type = node.type
    name = node.text.decode("utf8")
    if node_type == "type_identifier":
        return "type", name
    if node_type == "field_identifier":
        return "field", name
    if node_type != "identifier":
        return None, None
    parent = node.parent
    if parent:
        parent_type = parent.type
        if parent_type == "function_declarator" and parent.child_by_field_name("declarator") == node:
            return "func", name
        if parent_type == "call_expression" and parent.child_by_field_name("function") == node:
            return "func", name
        if parent_type in ("init_declarator", "parameter_declaration", "declaration", "pointer_declarator"):
            return "var", name
    # Any other plain identifier is treated as a variable.
    return "var", name


def _extract_identifiers_ts(code: str, max_per_type: int = 64) -> Dict[str, List[str]]:
    """Extract identifiers from C code using tree-sitter, classified by type."""
    tree = _TS_PARSER.parse(code.encode("utf8"))
    result: Dict[str, List[str]] = {"func": [], "var": [], "type": [], "field": []}
    stack = [tree.root_node]
    while stack:
        node = stack.pop()
        id_type, name = _classify_node(node)
        if id_type in result and len(result[id_type]) < max_per_type:
            result[id_type].append(name)
        stack.extend(node.children)
    return result


# ---- Summary Construction & Similarity ----
def _build_summary_text(identifiers: Dict[str, List[str]], max_per_type: int = 64) -> str:
    """
    Build a naming summary string from classified identifiers.
    Example: "func: foo bar || type: my_type || field: field1 field2 || var: i j k"
    """
    parts: List[str] = []
    for kind in ("func", "type", "field", "var"):
        names = identifiers.get(kind, [])
        if not names:
            continue
        segment = f"{kind}: " + " ".join(names[:max_per_type])
        parts.append(segment)
    return " || ".join(parts)


def _identifier_similarity_ts(candidate_text: str, reference_text: str) -> float:
    """
    Compute identifier-level similarity using embedding cosine similarity.
    Steps:
    1. Extract identifiers from both texts using tree-sitter
    2. Build naming summary strings
    3. Embed both summaries in a single API call
    4. Return cosine similarity as name_score
    Returns:
        name_score: cosine similarity, a float in [-1, 1]
        (0.0 if either summary is empty)
    """
    cand_ids = _extract_identifiers_ts(candidate_text)
    ref_ids = _extract_identifiers_ts(reference_text)
    cand_summary = _build_summary_text(cand_ids)
    ref_summary = _build_summary_text(ref_ids)
    if not cand_summary or not ref_summary:
        return 0.0
    emb_cand, emb_ref = _embed_two(cand_summary, ref_summary)
    return _cosine_similarity(emb_cand, emb_ref)


# ---- Main Reward Function ----
def compute_score(solution_str, ground_truth, extra_info=None):
    """
    Compute reward based on identifier naming similarity using GTE embeddings.
    Returns score^2 to sharpen the reward signal.
    """
    if not isinstance(solution_str, str):
        solution_str = "" if solution_str is None else str(solution_str)
    if not isinstance(ground_truth, str):
        ground_truth = "" if ground_truth is None else str(ground_truth)
    candidate_text = solution_str.strip()
    reference_text = ground_truth.strip()
    if not candidate_text or not reference_text:
        return 0.0
    name_score = _identifier_similarity_ts(candidate_text, reference_text)
    # Cosine similarity can be negative; clamp at 0 so that squaring does not
    # turn a dissimilar pair into a positive reward.
    name_score = max(0.0, name_score)
    return name_score * name_score


@ -0,0 +1,189 @@
"""
Reference reward function: Qwen3 Embedding-based Identifier Similarity.
This is a reference implementation of the Identifier Naming reward (Eq. 4)
described in the SK2Decompile paper (arXiv:2509.22114, Section 3.5).
Evaluates decompiled C code by:
1. Using tree-sitter to parse C code and extract identifiers (func/var/type/field)
2. Building a naming summary string per code sample
3. Computing cosine similarity between Qwen3 embeddings of the two summaries
4. Squaring the similarity score to sharpen the reward signal
Final score = cosine_similarity^2
Requires:
- A running OpenAI-compatible embedding server (e.g., vLLM serving Qwen3-Embedding-0.6B)
- tree-sitter and tree-sitter-c packages
Environment variables:
- QWEN3_EMBEDDING_MODEL_PATH: Model name/path (default: "Qwen3-Embedding-0.6B")
- QWEN3_EMBEDDING_API_KEY or OPENAI_API_KEY: API key (default: "none")
- QWEN3_EMBEDDING_API_BASE: API base URL (default: "http://127.0.0.1:8000/v1")
"""
import math
import os
from typing import Dict, List, Optional, Sequence, Tuple

from openai import OpenAI
from tree_sitter import Language, Parser
import tree_sitter_c as tsc

# ---- OpenAI Embedding Client ----
_MODEL_NAME = os.getenv("QWEN3_EMBEDDING_MODEL_PATH", "Qwen3-Embedding-0.6B")
_API_KEY = os.getenv("QWEN3_EMBEDDING_API_KEY") or os.getenv("OPENAI_API_KEY") or "none"
_API_BASE = os.getenv("QWEN3_EMBEDDING_API_BASE", "http://127.0.0.1:8000/v1")
_client: Optional[OpenAI] = None


def _get_client() -> OpenAI:
    """Create the embedding client lazily and reuse it across calls."""
    global _client
    if _client is None:
        if _API_BASE:
            _client = OpenAI(api_key=_API_KEY, base_url=_API_BASE)
        elif _API_KEY:
            _client = OpenAI(api_key=_API_KEY)
        else:
            _client = OpenAI()
    return _client


def _embed_two(text_a: str, text_b: str) -> Tuple[List[float], List[float]]:
    """Embed two texts in a single API call, return their embedding vectors."""
    client = _get_client()
    resp = client.embeddings.create(model=_MODEL_NAME, input=[text_a, text_b])
    emb_a = [float(x) for x in resp.data[0].embedding]
    emb_b = [float(x) for x in resp.data[1].embedding]
    return emb_a, emb_b


def _cosine_similarity(vec_a: Sequence[float], vec_b: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# ---- Tree-sitter C: Identifier Extraction ----
C_LANG = Language(tsc.language())
_TS_PARSER = Parser(C_LANG)


def _classify_node(node):
    """
    Classify a tree-sitter node into identifier categories:
    - func: function names (definitions + calls)
    - var: variable names (parameters / local / global)
    - type: type names
    - field: struct field names
    """
    node_type = node.type
    name = node.text.decode("utf8")
    if node_type == "type_identifier":
        return "type", name
    if node_type == "field_identifier":
        return "field", name
    if node_type != "identifier":
        return None, None
    parent = node.parent
    if parent:
        parent_type = parent.type
        if parent_type == "function_declarator" and parent.child_by_field_name("declarator") == node:
            return "func", name
        if parent_type == "call_expression" and parent.child_by_field_name("function") == node:
            return "func", name
        if parent_type in ("init_declarator", "parameter_declaration", "declaration", "pointer_declarator"):
            return "var", name
    # Any other plain identifier is treated as a variable.
    return "var", name


def _extract_identifiers_ts(code: str, max_per_type: int = 64) -> Dict[str, List[str]]:
    """Extract identifiers from C code using tree-sitter, classified by type."""
    tree = _TS_PARSER.parse(code.encode("utf8"))
    result: Dict[str, List[str]] = {"func": [], "var": [], "type": [], "field": []}
    stack = [tree.root_node]
    while stack:
        node = stack.pop()
        id_type, name = _classify_node(node)
        if id_type in result and len(result[id_type]) < max_per_type:
            result[id_type].append(name)
        stack.extend(node.children)
    return result


# ---- Summary Construction & Similarity ----
def _build_summary_text(identifiers: Dict[str, List[str]], max_per_type: int = 64) -> str:
    """
    Build a naming summary string from classified identifiers.
    Example: "func: foo bar || type: my_type || field: field1 field2 || var: i j k"
    """
    parts: List[str] = []
    for kind in ("func", "type", "field", "var"):
        names = identifiers.get(kind, [])
        if not names:
            continue
        segment = f"{kind}: " + " ".join(names[:max_per_type])
        parts.append(segment)
    return " || ".join(parts)


def _identifier_similarity_ts(candidate_text: str, reference_text: str) -> float:
    """
    Compute identifier-level similarity using embedding cosine similarity.
    Steps:
    1. Extract identifiers from both texts using tree-sitter
    2. Build naming summary strings
    3. Embed both summaries in a single API call
    4. Return cosine similarity as name_score
    Returns:
        name_score: cosine similarity, a float in [-1, 1]
        (0.0 if either summary is empty)
    """
    cand_ids = _extract_identifiers_ts(candidate_text)
    ref_ids = _extract_identifiers_ts(reference_text)
    cand_summary = _build_summary_text(cand_ids)
    ref_summary = _build_summary_text(ref_ids)
    if not cand_summary or not ref_summary:
        return 0.0
    emb_cand, emb_ref = _embed_two(cand_summary, ref_summary)
    return _cosine_similarity(emb_cand, emb_ref)


# ---- Main Reward Function ----
def compute_score(solution_str, ground_truth, extra_info=None):
    """
    Compute reward based on identifier naming similarity using Qwen3 embeddings.
    Returns score^2 to sharpen the reward signal.
    """
    if not isinstance(solution_str, str):
        solution_str = "" if solution_str is None else str(solution_str)
    if not isinstance(ground_truth, str):
        ground_truth = "" if ground_truth is None else str(ground_truth)
    candidate_text = solution_str.strip()
    reference_text = ground_truth.strip()
    if not candidate_text or not reference_text:
        return 0.0
    name_score = _identifier_similarity_ts(candidate_text, reference_text)
    # Cosine similarity can be negative; clamp at 0 so that squaring does not
    # turn a dissimilar pair into a positive reward.
    name_score = max(0.0, name_score)
    return name_score * name_score


@ -0,0 +1,85 @@
"""
Reference reward function: Compilability + Placeholder Identifier Matching.

This is a reference implementation of the Structure Recovery reward (Eq. 3)
described in the SK2Decompile paper (arXiv:2509.22114, Section 3.5).

Evaluates decompiled C code by:
    1. Checking if the code compiles with gcc (compilability score: 0 or 1)
    2. Extracting placeholder identifier patterns (func*, type*, var*, field*) from
       both candidate and ground truth, computing Jaccard similarity

Final score = type_score + compilability_score if compilable, else 0.
"""

import os
import re
import subprocess
import tempfile


def compute_score(solution_str, ground_truth, extra_info=None):
    type_score_value, _ = type_score(solution_str, ground_truth, extra_info)
    compileable_score_value = compileable_score(solution_str, ground_truth, extra_info)
    if compileable_score_value == 0.0:
        return 0.0
    return type_score_value + compileable_score_value


def type_score(solution_str, ground_truth, extra_info=None):
    """
    Compute Jaccard similarity over identifier patterns (func*, type*, var*, field*)
    between candidate and ground truth code.

    Returns:
        (jaccard_similarity, total_term_count)
    """
    patterns = [r'\bfunc\w*\b', r'\btype\w*\b', r'\bvar\w*\b', r'\bfield\w*\b']

    def extract_terms(text):
        terms = set()
        for pattern in patterns:
            terms.update(re.findall(pattern, text))
        return terms

    solution_terms = extract_terms(solution_str)
    ground_truth_terms = extract_terms(ground_truth)

    intersection = solution_terms.intersection(ground_truth_terms)
    union = solution_terms.union(ground_truth_terms)

    jaccard_similarity = len(intersection) / len(union) if union else 0.0
    return jaccard_similarity, len(solution_terms) + len(ground_truth_terms)


def compileable_score(solution_str, ground_truth, extra_info=None):
    """
    Check if the candidate C code compiles with gcc.

    Args:
        extra_info: Optional dict with 'header' key containing C header declarations.

    Returns:
        1.0 if compilable, 0.0 otherwise.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            source_file = os.path.join(tmpdir, "temp.c")
            object_file = os.path.join(tmpdir, "temp.o")
            header = extra_info.get('header', '') if extra_info else ''
            with open(source_file, 'w') as f:
                f.write(f'{header}\n\n{solution_str}')
            # No check=True: a failed compile is reported through the return
            # code rather than through an exception.
            proc = subprocess.run(
                ['gcc', '-c', source_file, '-o', object_file],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=5,
            )
            return 1.0 if proc.returncode == 0 else 0.0
        except Exception:
            # Timeout, missing gcc, or I/O failure all count as not compilable.
            return 0.0
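
As a worked example of the placeholder matching described in this file, the sketch below applies the same four regular expressions to two toy snippets. The helper names `placeholder_terms` and `placeholder_jaccard` and both C snippets are hypothetical and are not part of the reward module.

```python
import re

# Same patterns as in type_score above.
PLACEHOLDER_PATTERNS = [r'\bfunc\w*\b', r'\btype\w*\b', r'\bvar\w*\b', r'\bfield\w*\b']


def placeholder_terms(code: str) -> set:
    """Collect the distinct placeholder-style identifiers found in `code`."""
    terms = set()
    for pattern in PLACEHOLDER_PATTERNS:
        terms.update(re.findall(pattern, code))
    return terms


def placeholder_jaccard(candidate: str, reference: str) -> float:
    """Jaccard similarity between the two placeholder sets (0.0 if both are empty)."""
    cand, ref = placeholder_terms(candidate), placeholder_terms(reference)
    union = cand | ref
    return len(cand & ref) / len(union) if union else 0.0


# Hypothetical inputs: the candidate misses one parameter and picks the wrong field.
candidate = "type1 func1(type2 var1) { return var1->field1; }"
reference = "type1 func1(type2 var1, type3 var2) { return var1->field2; }"

# Shared: type1, func1, type2, var1 (4 terms); distinct overall: 8 terms.
score = placeholder_jaccard(candidate, reference)
```

With 4 shared terms out of 8 distinct ones the similarity is 0.5, so under the rule in this file a compilable candidate would score 0.5 + 1.0 = 1.5 and a non-compilable one 0.0. Note that the patterns are prefix matches, so an ordinary word such as `function` also counts as a term.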


@ -0,0 +1,69 @@
"""
Reference reward function: Compilability + Word-level Jaccard Similarity.

This is a reference implementation of an alternative Structure Recovery reward
for the SK2Decompile RL training pipeline (arXiv:2509.22114, Section 3.5).

Evaluates decompiled C code by:
    1. Computing word-level Jaccard similarity between candidate and ground truth
    2. Checking if the code compiles with gcc (compilability score: 0 or 1)

Final score = jaccard_similarity + compilability_score if jaccard > 0.5, else 0.
"""

import os
import subprocess
import tempfile


def compute_score(solution_str, ground_truth, extra_info=None):
    sim_score = jaccard_similarity(solution_str, ground_truth)
    compile_score = compileable_score(solution_str, ground_truth, extra_info)
    if sim_score > 0.5:
        return sim_score + compile_score
    # Return a float so the reward type is the same on both branches.
    return 0.0


def jaccard_similarity(str1, str2):
    """Compute word-level Jaccard similarity between two strings."""
    set1 = set(str1.lower().split())
    set2 = set(str2.lower().split())
    intersection = len(set1.intersection(set2))
    union = len(set1.union(set2))
    if union == 0:
        return 0.0
    return intersection / union


def compileable_score(solution_str, ground_truth, extra_info=None):
    """
    Check if the candidate C code compiles with gcc.

    Args:
        extra_info: Optional dict with 'header' key containing C header declarations.

    Returns:
        1.0 if compilable, 0.0 otherwise.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            source_file = os.path.join(tmpdir, "temp.c")
            object_file = os.path.join(tmpdir, "temp.o")
            header = extra_info.get('header', '') if extra_info else ''
            with open(source_file, 'w') as f:
                f.write(f'{header}\n\n{solution_str}')
            # No check=True: a failed compile is reported through the return
            # code rather than through an exception.
            proc = subprocess.run(
                ['gcc', '-c', source_file, '-o', object_file],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                timeout=5,
            )
            return 1.0 if proc.returncode == 0 else 0.0
        except Exception:
            # Timeout, missing gcc, or I/O failure all count as not compilable.
            return 0.0
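
The threshold gating in this variant can be seen without invoking gcc. In the minimal sketch below, the `compiles` flag stands in for the result of `compileable_score`; the names `word_jaccard` and `gated_reward`, the `threshold` parameter, and the toy strings are hypothetical.

```python
def word_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercased, whitespace-separated tokens."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def gated_reward(candidate: str, reference: str, compiles: bool,
                 threshold: float = 0.5) -> float:
    """Similarity plus compile bonus, but only when similarity exceeds the threshold."""
    sim = word_jaccard(candidate, reference)
    if sim <= threshold:
        return 0.0  # too dissimilar: compilability earns nothing
    return sim + (1.0 if compiles else 0.0)


# 3 shared tokens out of 4 distinct ones: similarity 0.75, above the threshold.
close_and_compiles = gated_reward("a b c", "a b c d", compiles=True)
# 1 shared token out of 4 distinct ones: similarity 0.25, gated to zero.
far_but_compiles = gated_reward("a b", "a c d", compiles=True)
```

The gate means a candidate that compiles but shares few tokens with the reference receives no reward, while a close candidate that fails to compile still keeps its similarity score.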


@ -0,0 +1,120 @@
#!/usr/bin/env bash
# =============================================================================
# SK2Decompile - Reference Script: Identifier Naming RL Training
# =============================================================================
# Reference GRPO training script for the Identifier Naming model.
# Based on the VERL framework (v0.4.1) with embedding-based rewards.
#
# This is a reference configuration — please adjust parameters according
# to your hardware setup and dataset. See the paper (arXiv:2509.22114,
# Section 3.5) for the reward formulation details.
#
# Prerequisites:
#   - VERL framework installed (https://github.com/volcengine/verl)
#   - Reward functions integrated into verl/utils/reward_score/ (see README.md)
#   - An OpenAI-compatible embedding server running locally, e.g.:
#       python -m vllm.entrypoints.openai.api_server \
#           --model Qwen3-Embedding-0.6B --port 8000
#   - tree-sitter, tree-sitter-c, openai packages installed
#
# Usage:
# bash run_ident_rl.sh
# =============================================================================
set -x
# ---- User Configuration ----
EMBEDDING_VARIANT="gte" # Options: "gte" or "qwen3"
VERL_DIR="<YOUR_VERL_DIR>"
VENV_PATH="<YOUR_VENV_PATH>"
MODEL_PATH="<YOUR_MODEL_PATH>" # e.g., path to sk2decompile-ident-6.7b
TRAIN_DATA="<YOUR_DATA_PATH>/train.parquet"
VAL_DATA="<YOUR_DATA_PATH>/valid.parquet"
# WandB configuration
WANDB_API_KEY_VAL="<YOUR_WANDB_API_KEY>"
WANDB_ENTITY_VAL="<YOUR_WANDB_ENTITY>"
WANDB_PROJECT_VAL="<YOUR_WANDB_PROJECT>"
# Training parameters
NUM_NODES=1
GPUS_PER_NODE=8
KL_COEF=0.02
TOTAL_EPOCHS=2
SAVE_FREQ=25
TEST_FREQ=25
# ---- Environment Setup ----
source "${VENV_PATH}/bin/activate"
export UCX_IB_PCI_RELAXED_ORDERING=1
export NCCL_IB_PCI_RELAXED_ORDERING=1
export NCCL_IB_TIMEOUT=22
export NCCL_DEBUG=INFO
export TRANSFORMERS_OFFLINE=0
export TORCH_NCCL_AVOID_RECORD_STREAMS=1
export NCCL_NVLS_ENABLE=0
export NCCL_IB_DISABLE=0
export CUDA_DEVICE_MAX_CONNECTIONS=1
# ---- Task & Logging ----
TASK_NAME="sk2decompile_ident-rl-${EMBEDDING_VARIANT}"
LOG_DIR="${VERL_DIR}/logs"
mkdir -p "$LOG_DIR"
LOG_FILE="$LOG_DIR/${TASK_NAME}.log"
ERR_FILE="$LOG_DIR/${TASK_NAME}.err"
# ---- WandB ----
export WANDB_API_KEY=${WANDB_API_KEY_VAL}
export WANDB_ENTITY=${WANDB_ENTITY_VAL}
export WANDB_PROJECT=${WANDB_PROJECT_VAL}
export WANDB_NAME=${TASK_NAME}
export WANDB_MODE='online'
wandb login --relogin "$WANDB_API_KEY"
# ---- Launch GRPO Training ----
python3 -m verl.trainer.main_ppo --config-path=config \
    --config-name='ppo_trainer-lm4dc.yaml' \
    algorithm.adv_estimator=grpo \
    data.train_files=${TRAIN_DATA} \
    data.val_files=${VAL_DATA} \
    data.train_batch_size=128 \
    data.max_prompt_length=1024 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=${MODEL_PATH} \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=${KL_COEF} \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.80 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='sk2decompile_rl' \
    trainer.experiment_name=$TASK_NAME \
    trainer.default_local_dir=${VERL_DIR}/checkpoints/${TASK_NAME} \
    trainer.n_gpus_per_node=${GPUS_PER_NODE} \
    trainer.nnodes=${NUM_NODES} \
    trainer.save_freq=${SAVE_FREQ} \
    trainer.test_freq=${TEST_FREQ} \
    trainer.total_epochs=${TOTAL_EPOCHS} "$@" \
    > >(tee -a "$LOG_FILE") \
    2> >(tee -a "$ERR_FILE" >&2)
echo "STDOUT saved to: $LOG_FILE"
echo "STDERR saved to: $ERR_FILE"
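
When adjusting the batch settings in this script for different hardware, it can help to work out the resulting rollout volume first. The sketch below is a rough estimate only. It assumes that `data.train_batch_size` counts prompts, that each prompt is sampled `rollout.n` times, and that `ppo_mini_batch_size` is also counted in prompts; these readings should be checked against the documentation of the VERL version in use. The function name `rollout_budget` is hypothetical.

```python
def rollout_budget(train_batch_size: int, rollout_n: int, ppo_mini_batch_size: int,
                   gpus_per_node: int, num_nodes: int) -> dict:
    """Estimate per-step rollout volume under the assumptions stated above."""
    responses_per_step = train_batch_size * rollout_n
    total_gpus = gpus_per_node * num_nodes
    return {
        # Sampled responses scored by the reward function in one training step.
        "responses_per_step": responses_per_step,
        # Even split of those responses across all GPUs.
        "responses_per_gpu": responses_per_step // total_gpus,
        # Mini-batch passes over the rollout batch in one training step.
        "updates_per_step": train_batch_size // ppo_mini_batch_size,
    }


# Values taken from this script: 128 prompts, n=16, mini-batch 32, 1 node x 8 GPUs.
budget = rollout_budget(128, 16, 32, gpus_per_node=8, num_nodes=1)
```

Under these assumptions the configuration above yields 2048 responses per step, each of which needs one reward call, so the throughput of the embedding server is worth considering alongside GPU memory.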


@ -0,0 +1,117 @@
#!/usr/bin/env bash
# =============================================================================
# SK2Decompile - Reference Script: Structure Recovery RL Training
# =============================================================================
# Reference GRPO training script for the Structure Recovery model.
# Based on the VERL framework (v0.4.1) with compiler-based rewards.
#
# This is a reference configuration — please adjust parameters according
# to your hardware setup and dataset. See the paper (arXiv:2509.22114,
# Section 3.5) for the reward formulation details.
#
# Prerequisites:
# - VERL framework installed (https://github.com/volcengine/verl)
# - Reward functions integrated into verl/utils/reward_score/ (see README.md)
# - gcc available for compilability checking
#
# Usage:
# bash run_struct_rl.sh
# =============================================================================
set -x
# ---- User Configuration ----
REWARD_VARIANT="exe_type" # Options: "exe_type" or "sim_exe"
VERL_DIR="<YOUR_VERL_DIR>"
VENV_PATH="<YOUR_VENV_PATH>"
MODEL_PATH="<YOUR_MODEL_PATH>" # e.g., path to sk2decompile-struct-6.7b
TRAIN_DATA="<YOUR_DATA_PATH>/train.parquet"
VAL_DATA="<YOUR_DATA_PATH>/valid.parquet"
# WandB configuration
WANDB_API_KEY_VAL="<YOUR_WANDB_API_KEY>"
WANDB_ENTITY_VAL="<YOUR_WANDB_ENTITY>"
WANDB_PROJECT_VAL="<YOUR_WANDB_PROJECT>"
# Training parameters
NUM_NODES=2
GPUS_PER_NODE=8
KL_COEF=0.01
TOTAL_EPOCHS=2
SAVE_FREQ=25
TEST_FREQ=25
# ---- Environment Setup ----
source "${VENV_PATH}/bin/activate"
export UCX_IB_PCI_RELAXED_ORDERING=1
export NCCL_IB_PCI_RELAXED_ORDERING=1
export NCCL_IB_TIMEOUT=22
export NCCL_DEBUG=INFO
export TRANSFORMERS_OFFLINE=0
export TORCH_NCCL_AVOID_RECORD_STREAMS=1
export NCCL_NVLS_ENABLE=0
export NCCL_IB_DISABLE=0
export CUDA_DEVICE_MAX_CONNECTIONS=1
# ---- Task & Logging ----
TASK_NAME="sk2decompile_struct-rl-${REWARD_VARIANT}"
LOG_DIR="${VERL_DIR}/logs"
mkdir -p "$LOG_DIR"
LOG_FILE="$LOG_DIR/${TASK_NAME}.log"
ERR_FILE="$LOG_DIR/${TASK_NAME}.err"
# ---- WandB ----
export WANDB_API_KEY=${WANDB_API_KEY_VAL}
export WANDB_ENTITY=${WANDB_ENTITY_VAL}
export WANDB_PROJECT=${WANDB_PROJECT_VAL}
export WANDB_NAME=${TASK_NAME}
export WANDB_MODE='online'
wandb login --relogin "$WANDB_API_KEY"
# ---- Launch GRPO Training ----
python3 -m verl.trainer.main_ppo --config-path=config \
    --config-name='ppo_trainer-lm4dc.yaml' \
    algorithm.adv_estimator=grpo \
    data.train_files=${TRAIN_DATA} \
    data.val_files=${VAL_DATA} \
    data.train_batch_size=128 \
    data.max_prompt_length=1024 \
    data.max_response_length=2048 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path=${MODEL_PATH} \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=32 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=${KL_COEF} \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=False \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.80 \
    actor_rollout_ref.rollout.n=16 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='sk2decompile_rl' \
    trainer.experiment_name=$TASK_NAME \
    trainer.default_local_dir=${VERL_DIR}/checkpoints/${TASK_NAME} \
    trainer.n_gpus_per_node=${GPUS_PER_NODE} \
    trainer.nnodes=${NUM_NODES} \
    trainer.save_freq=${SAVE_FREQ} \
    trainer.test_freq=${TEST_FREQ} \
    trainer.total_epochs=${TOTAL_EPOCHS} "$@" \
    > >(tee -a "$LOG_FILE") \
    2> >(tee -a "$ERR_FILE" >&2)
echo "STDOUT saved to: $LOG_FILE"
echo "STDERR saved to: $ERR_FILE"