feat(sk2decompile): add BringUpBench evaluation pipeline and results

Integrate BringUpBench evaluation into sk2decompile/evaluation/bringupbench/,
corresponding to Section A.6 of the paper (arXiv:2509.22114).

BringUpBench is a benchmark suite of 90 self-contained C programs (505 functions,
O0-O3). SK2Decompile achieves 42.3% compilation rate and 27.0% re-executability
rate, compared to IDA Pro's 23.6% / 21.7%.

Contents:
- scripts/: 5-step reproduction pipeline (compile, decompile, map, infer, eval)
- data/func_maps/: pre-built function-level mappings (source <-> pseudo <-> asm)
- data/infer_results/: SK2Decompile inference outputs for all opt levels
- reports/: per-opt-level evaluation result summaries (Markdown)
- config.env: template environment configuration
- README.md: comprehensive documentation with reproduction guide

Also updated sk2decompile/README.md to reference BringUpBench evaluation.
BaiRiDreamer 2026-02-12 00:02:25 +08:00
commit 239cba2673
22 changed files with 6313 additions and 0 deletions

@ -41,6 +41,12 @@ SK2Decompile/
│ ├── scripts/ # Training launch scripts
│ └── README.md # Detailed RL documentation
├── evaluation/ # Comprehensive evaluation suite
│ ├── bringupbench/ # BringUpBench evaluation (Section A.6)
│ │ ├── scripts/ # Pipeline scripts (compile, decompile, evaluate)
│ │ ├── data/ # Pre-built function maps and inference results
│ │ ├── reports/ # Evaluation result summaries
│ │ └── README.md # Detailed BringUpBench documentation
│ └── ... # HumanEval, MBPP evaluation scripts
└── README.md # This file
```
@ -198,6 +204,12 @@ python gpt_judge.py --json_file your_json_file_path
--api_key your_openai_api_key
```
**BringUpBench Evaluation** (Section A.6 of the paper)
We also evaluate on [BringUpBench](https://github.com/toddmaustin/bringup-bench): 90 self-contained C programs with 505 functions across O0-O3. SK²Decompile achieves a **42.3% compilation rate** and a **27.0% re-executability rate**, compared to IDA Pro's 23.6% / 21.7%.
See [`evaluation/bringupbench/README.md`](evaluation/bringupbench/README.md) for the full reproduction pipeline, pre-built data, and detailed results.
## 📊 Results
Our approach achieves state-of-the-art performance:

@ -0,0 +1,249 @@
# SK²Decompile — Evaluation on BringUpBench
This directory contains the evaluation pipeline for SK²Decompile on the [BringUpBench](https://github.com/toddmaustin/bringup-bench) benchmark, as described in **Section A.6** of our paper:
> **SK²Decompile: LLM-based Two-Phase Binary Decompilation from Skeleton to Skin**
> [[arXiv:2509.22114]](https://arxiv.org/abs/2509.22114)
## Overview
[BringUpBench](https://github.com/toddmaustin/bringup-bench) (Austin, 2024) is a benchmark suite of **90 self-contained C programs** designed for bringing up newly designed CPUs, accelerators, compilers, and operating systems. It has **zero library dependencies** — all programs rely solely on a built-in `libmin` library and only 4 system calls — making it an ideal, standardized test bed for decompilation evaluation on complex, real-world binaries.
We compiled, decompiled, and executed all projects across optimization levels O0-O3, yielding **505 functions** in total. We compared SK²Decompile against the industry-standard rule-based decompiler, **IDA Pro** (Hex-Rays).
## Results
### SK²Decompile vs IDA Pro
| Opt Level | Functions | SK²Decompile Compilable | SK²Decompile Executable | IDA Compilable | IDA Executable |
|:---------:|:---------:|:-----------------------:|:-----------------------:|:--------------:|:--------------:|
| O0 | 382 | **50.26%** | **49.48%** | — | — |
| O1 | 379 | **40.90%** | **39.05%** | — | — |
| O2 | 368 | **37.77%** | **34.24%** | — | — |
| O3 | 359 | **31.75%** | **29.53%** | — | — |
| **Avg** | **1488** | **42.3%** | **27.0%** | **23.6%** | **21.7%** |
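> The **Avg** row reports the paper's aggregate numbers (Table 8 in Section A.6), so its percentages are not a simple mean of the per-level rows above; the 1488 in the Functions column is the sum of the four per-level case counts. The paper does not report per-opt-level IDA baselines separately. Detailed per-benchmark breakdowns are available in `reports/`.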
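The per-level percentages in the table are passed cases divided by total cases. As a minimal sketch, the snippet below recomputes the O0 and O1 rows from the counts printed in `reports/O0_results.md` and `reports/O1_results.md`; the names `REPORT_COUNTS`, `rate`, and `summarize` are illustrative and are not part of the evaluation scripts.

```python
# Counts copied from the summary lines of reports/O0_results.md and
# reports/O1_results.md (total cases, compilable, executable).
REPORT_COUNTS = {
    "O0": {"total": 382, "compilable": 192, "executable": 189},
    "O1": {"total": 379, "compilable": 155, "executable": 148},
}


def rate(passed: int, total: int) -> float:
    """Return passed/total as a percentage rounded to two decimals."""
    return round(100.0 * passed / total, 2)


def summarize(counts: dict) -> dict:
    """Map each optimization level to its compilable and executable rates."""
    return {
        level: {
            "compilable": rate(c["compilable"], c["total"]),
            "executable": rate(c["executable"], c["total"]),
        }
        for level, c in counts.items()
    }


if __name__ == "__main__":
    for level, rates in summarize(REPORT_COUNTS).items():
        print(level, rates)
```

Both rates use the total number of evaluated cases as the denominator, which is why the executable rate can never exceed the compilable rate.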
## Directory Structure
```
bringupbench/
├── README.md # This file
├── config.env # Environment configuration (paths)
├── scripts/
│ ├── build-host-opt-levels.sh # Step 1: Compile benchmarks at O0-O3
│ ├── decompile-all-pseudo.sh # Step 2: IDA Pro batch decompilation
│ ├── dump_pseudo.py # IDA headless decompilation helper
│ ├── disasm-all-objdump.sh # Step 3: objdump batch disassembly
│ ├── build-func-maps.py # Step 3: Build function-level mappings
│ ├── clean-all-benchmarks.sh # Utility: clean all build artifacts
│ └── eval_infer_out.py # Step 5: Automated evaluation
├── data/
│ ├── func_maps/ # Pre-built function mappings (JSONL)
│ │ ├── merged.O0.func_map.jsonl # O0: 493 functions
│ │ ├── merged.O1.func_map.jsonl # O1: 449 functions
│ │ ├── merged.O2.func_map.jsonl # O2: 441 functions
│ │ └── merged.O3.func_map.jsonl # O3: 439 functions
│ └── infer_results/ # SK²Decompile inference results
│   ├── merged.O0.func_map.infer.jsonl # O0: 382 evaluated functions
│   ├── merged.O1.func_map.infer.jsonl # O1: 379 evaluated functions
│   ├── merged.O2.func_map.infer.jsonl # O2: 368 evaluated functions
│   └── merged.O3.func_map.infer.jsonl # O3: 359 evaluated functions
└── reports/ # Evaluation result summaries
    ├── O0_results.md
    ├── O1_results.md
    ├── O2_results.md
    └── O3_results.md
```
## Reproduction Pipeline
Our evaluation pipeline consists of five steps, as described in the paper:
```
Source (.c)
▼ Step 1: Compilation
Binary (.host.O0 ~ .host.O3)
├──▶ Step 2: Baseline Extraction (IDA Pro) ──▶ Pseudocode (.pseudo)
├──▶ Step 3: Ground Truth Mapping ──▶ Function Maps (.func_map.jsonl)
▼ Step 4: Decompilation (SK²Decompile)
Inferred C code (.func_map.infer.jsonl)
▼ Step 5: Validation
Evaluation Reports (reports/)
```
### Prerequisites
| Dependency | Purpose | Installation |
|------------|---------|-------------|
| [Bringup-Bench](https://github.com/toddmaustin/bringup-bench) | Upstream benchmark suite (90 C programs) | `git clone https://github.com/toddmaustin/bringup-bench.git` |
| GCC | Compile benchmarks | `apt install gcc` |
| IDA Pro + Hex-Rays | Decompile binaries to pseudocode | Commercial software |
| objdump (binutils) | Disassemble binaries | `apt install binutils` |
| clang-format | Pseudocode normalization | `apt install clang-format` |
| Python >= 3.10 | Run evaluation scripts | `apt install python3` |
### Quick Start (Evaluation Only)
If you only want to reproduce the evaluation step (Step 5), the pre-built data is included in `data/`. You only need the Bringup-Bench source repository:
```bash
# 1. Clone Bringup-Bench
git clone https://github.com/toddmaustin/bringup-bench.git
# 2. Configure paths
cd bringupbench
vim config.env # Set BENCH_REPO_ROOT to your bringup-bench path
# 3. Run evaluation (e.g., O0)
python3 scripts/eval_infer_out.py data/infer_results/merged.O0.func_map.infer.jsonl
# 4. Check results
cat reports/O0_results.md
```
### Full Pipeline (From Scratch)
To reproduce the entire pipeline from compilation to evaluation:
```bash
cd bringupbench
vim config.env # Set BENCH_REPO_ROOT and IDA_BIN
```
**Step 1: Compile benchmarks at O0-O3**
Build all 90 Bringup-Bench programs at four optimization levels, producing `<name>.host.O{0,1,2,3}` binaries.
```bash
scripts/build-host-opt-levels.sh
```
**Step 2: Baseline Extraction (IDA Pro)**
Use IDA Pro in headless mode to decompile all binaries, producing `.pseudo` files with Hex-Rays pseudocode.
```bash
scripts/decompile-all-pseudo.sh
```
Each function is delimited by `/* function_name @ 0xADDRESS */` in the output.
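To illustrate how such a delimited `.pseudo` file could be split back into functions, here is a minimal sketch. It assumes each delimiter sits on its own line in exactly the form shown above; `HEADER_RE` and `split_pseudo` are hypothetical names, and the real parsing in `build-func-maps.py` may differ.

```python
import re

# Matches a delimiter line such as "/* ackermann @ 0x11e9 */".
HEADER_RE = re.compile(
    r"^/\*\s*(?P<name>\S+)\s*@\s*(?P<addr>0x[0-9A-Fa-f]+)\s*\*/[ \t]*$",
    re.MULTILINE,
)


def split_pseudo(text: str) -> list:
    """Split the text of a .pseudo file into one dict per function."""
    headers = list(HEADER_RE.finditer(text))
    functions = []
    for i, match in enumerate(headers):
        # A function body runs until the next delimiter or the end of file.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        functions.append(
            {
                "function_name": match.group("name"),
                "address": match.group("addr").lower(),
                "content": text[match.end():end].strip() + "\n",
            }
        )
    return functions
```

The resulting keys mirror the `pseudo` object documented in the Data Format section, so each dict can be matched against source functions by name.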
**Step 3: Ground Truth Mapping**
Parse source code, pseudocode, and assembly; match functions by name across all three representations; and normalize the pseudocode (strip IDA-specific types, convert hex literals to decimal, and run clang-format).
```bash
# Disassemble (optional, for assembly mapping)
scripts/disasm-all-objdump.sh
# Build function-level mappings
python3 scripts/build-func-maps.py
```
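The normalization described in Step 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic of `build-func-maps.py`: the type mapping is supplied by the caller because the exact rules are not documented here, the clang-format pass is omitted, and `normalize_pseudo` is a hypothetical name.

```python
import re
from typing import Dict, Optional

# Calling-convention keywords that Hex-Rays emits in function signatures.
CALLCONV_RE = re.compile(r"\b__(?:fastcall|cdecl|stdcall)\b\s*")
# A hex literal with an optional integer suffix, e.g. 0x1F or 0x10u.
HEX_RE = re.compile(r"\b0[xX]([0-9A-Fa-f]+)([uUlL]*)\b")


def normalize_pseudo(code: str, type_map: Optional[Dict[str, str]] = None) -> str:
    """Strip calling conventions, rename types, and convert hex to decimal.

    type_map is an assumed, caller-provided mapping from IDA type names to
    plain C types. Note that hex literals inside string constants are also
    rewritten by this simple regex.
    """
    code = CALLCONV_RE.sub("", code)
    for ida_type, c_type in (type_map or {}).items():
        code = re.sub(rf"\b{re.escape(ida_type)}\b", c_type, code)
    return HEX_RE.sub(lambda m: str(int(m.group(1), 16)) + m.group(2), code)
```

For example, with the assumed mapping `{"__int64": "long"}`, the signature `__int64 __fastcall f(__int64 a1)` becomes `long f(long a1)` and `0x1F` becomes `31`.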
Output: per-binary `.func_map.jsonl` files. Merge them per optimization level:
```bash
cat $BENCH_REPO_ROOT/*/*.host.O0.func_map.jsonl > data/func_maps/merged.O0.func_map.jsonl
cat $BENCH_REPO_ROOT/*/*.host.O1.func_map.jsonl > data/func_maps/merged.O1.func_map.jsonl
cat $BENCH_REPO_ROOT/*/*.host.O2.func_map.jsonl > data/func_maps/merged.O2.func_map.jsonl
cat $BENCH_REPO_ROOT/*/*.host.O3.func_map.jsonl > data/func_maps/merged.O3.func_map.jsonl
```
**Step 4: Decompilation (SK²Decompile Inference)**
Feed the `pseudo_normalize` field from the function maps to SK²Decompile. The two-phase inference pipeline (see `../sk2decompile_inf.py`) produces C code for each function. Write the results back into the JSONL, storing the final decompiled function body in the `content-fix` field of the `pseudo` object.
```bash
# Example: use the main SK²Decompile inference pipeline
cd ../ # back to sk2decompile/evaluation/
python3 sk2decompile_inf.py \
--dataset_path bringupbench/data/func_maps/merged.O0.func_map.jsonl \
--model_path LLM4Binary/sk2decompile-struct-6.7b \
--recover_model_path LLM4Binary/sk2decompile-ident-6.7b
```
**Step 5: Validation**
For each function, replace its original definition in the source file with the decompiled output, rebuild the program in an isolated workspace, and run the project's test suite.
```bash
python3 scripts/eval_infer_out.py data/infer_results/merged.O0.func_map.infer.jsonl \
--jobs 16 \
--command-timeout 20
```
Common options:
```bash
--jobs N # Parallel workers (default: 96)
--command-timeout S # Timeout per make command in seconds (default: 20)
--limit N # Process only first N cases (for debugging)
--keep-workspaces # Keep temporary build directories
```
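The validation step can be pictured with the sketch below. It assumes the original function text appears exactly once in the source file and that the whole repository is copied per case; `make_workspace`, `substitute_function`, and `run_make` are hypothetical helpers, and `eval_infer_out.py` may locate functions and invoke `make` differently (for instance with extra variables).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def make_workspace(repo_root: Path) -> Path:
    """Copy the benchmark repository into a fresh temporary directory."""
    workspace = Path(tempfile.mkdtemp(prefix="bringup-eval-"))
    target = workspace / repo_root.name
    shutil.copytree(repo_root, target)
    return target


def substitute_function(workspace_root: Path, record: dict) -> bool:
    """Swap the original function for the decompiled one; False if not found."""
    source_file = workspace_root / record["source"]["path"]
    text = source_file.read_text()
    original = record["source"]["content"]
    if text.count(original) != 1:  # missing or ambiguous: replacement fails
        return False
    source_file.write_text(text.replace(original, record["pseudo"]["content-fix"]))
    return True


def run_make(workdir: Path, target: str, timeout_s: float = 20.0) -> bool:
    """Run `make <target>` and report success; a timeout counts as failure."""
    try:
        proc = subprocess.run(
            ["make", target], cwd=workdir, capture_output=True, timeout=timeout_s
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return proc.returncode == 0
```

Copying the repository for every case keeps a broken replacement in one function from affecting the build of another, at the cost of extra disk I/O.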
## Data Format
### func_map.jsonl (Function Mappings)
Each line is a JSON object containing the source, pseudocode, and assembly for one function:
```jsonc
{
  "source": {
    "path": "ackermann/ackermann.c", // Source file (relative to BENCH_REPO_ROOT)
    "function_name": "ackermann", // Function name
    "content": "int ackermann(int m, ...) { ... }\n" // Complete function body
  },
  "pseudo": {
    "path": "ackermann/ackermann.host.O0.pseudo",
    "function_name": "ackermann",
    "address": "0x11e9", // Function address in binary
    "label": "ackermann",
    "content": "__int64 __fastcall ackermann(...) { ... }\n" // Raw IDA pseudocode
  },
  "pseudo_normalize": "int ackermann(...) { ... }", // Normalized pseudocode
  "binary": "ackermann/ackermann.host.O0", // Binary file path
  "assembly": "<ackermann>:\npush %rbp\n..." // Cleaned objdump output
}
```
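A minimal sketch of reading this format in Python follows. It assumes the files on disk are plain JSON Lines (the `//` comments above are documentation only); `iter_func_map` and `model_inputs` are illustrative names rather than functions from the repository.

```python
import json
from typing import Iterator, List, Tuple


def iter_func_map(path: str) -> Iterator[dict]:
    """Yield one record per non-empty line of a func_map.jsonl file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def model_inputs(path: str) -> List[Tuple[str, str]]:
    """Collect (binary, pseudo_normalize) pairs, the input for Step 4."""
    return [(rec["binary"], rec["pseudo_normalize"]) for rec in iter_func_map(path)]
```

Streaming line by line avoids loading a whole merged file into memory, which helps because individual lines can be very long.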
### func_map.infer.jsonl (Inference Results)
Extends `func_map.jsonl` with SK²Decompile inference outputs:
```jsonc
{
  // ... all fields from func_map.jsonl ...
  "pseudo": {
    // ... all fields above, plus:
    "content-fix": "..." // Final decompiled function (used for source replacement)
  },
  "infer-out-model1": "...", // Phase 1 (Structure Recovery) raw output
  "infer-out-model2": "...", // Phase 2 (Identifier Naming) raw output
  "pseudo_normalize-fix": "..." // Corrected normalized pseudocode
}
```
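As a small illustration of how the `content-fix` field might be consumed, the sketch below separates already-parsed records into those with a usable final output and those without. Treating a missing or blank field as "no output" is an assumption, and `final_body` and `split_by_output` are hypothetical helpers.

```python
from typing import Iterable, List, Optional, Tuple


def final_body(record: dict) -> Optional[str]:
    """Return the final decompiled function, or None if absent or blank."""
    body = record.get("pseudo", {}).get("content-fix")
    return body if isinstance(body, str) and body.strip() else None


def split_by_output(records: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Partition records into (ready for evaluation, missing output)."""
    ready, missing = [], []
    for record in records:
        (ready if final_body(record) is not None else missing).append(record)
    return ready, missing
```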
## Evaluation Metrics
| Metric | Definition |
|--------|-----------|
| **Replacement Rate** | Fraction of functions where the decompiled output can be located and substituted into the original source file |
| **Compilable Rate** | Fraction of functions where the modified source compiles successfully (`make build`) |
| **Executable Rate** | Fraction of functions where the compiled program passes its test suite (`make test`, output matches reference) |
The evaluation uses BringUpBench's own build infrastructure (`Makefile`, `libmin`, `libtarg`) to compile and validate. Each function is tested in an isolated workspace to prevent cross-contamination.
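The three metrics can be computed from per-case outcomes as in the sketch below. The `CaseResult` fields are hypothetical stand-ins for whatever `eval_infer_out.py` records internally; the only property taken from the reports is that every rate uses the total case count as its denominator.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CaseResult:
    source_path: str  # e.g. "verlet/verlet.c"
    replaced: bool    # decompiled output was substituted into the source
    compiled: bool    # the build succeeded after replacement
    executed: bool    # the test run matched the reference output


def rates(results: List[CaseResult]) -> Dict[str, float]:
    """Return case count and the three rates as percentages."""
    total = len(results)

    def pct(flag: str) -> float:
        if total == 0:
            return 0.0
        return round(100.0 * sum(getattr(r, flag) for r in results) / total, 2)

    return {
        "cases": total,
        "replacement": pct("replaced"),
        "build": pct("compiled"),
        "exec": pct("executed"),
    }


def by_benchmark(results: List[CaseResult]) -> Dict[str, Dict[str, float]]:
    """Group cases by the first path component, as in the report tables."""
    groups = defaultdict(list)
    for result in results:
        groups[result.source_path.split("/", 1)[0]].append(result)
    return {name: rates(group) for name, group in sorted(groups.items())}
```

For the O0 `verlet` benchmark, for instance, four cases with one successful build and no passing test run give Build% 25.00 and Exec% 0.00, matching the row in `reports/O0_results.md`.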
## Notes
- BringUpBench programs are self-contained with zero external dependencies, making them ideal for evaluating decompilation without the confounding factor of missing headers or libraries.
- The `func_maps/` data contains more functions than `infer_results/` because some functions are filtered out during inference (e.g., those exceeding token limits).
- All scripts load paths from `config.env`. You can also override via environment variables or CLI arguments (priority: CLI > env > config.env).
- For the complete SK²Decompile methodology and other benchmark results (HumanEval, MBPP, ExeBench, GitHub2025), see the [main README](../../README.md).
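The configuration priority noted above (CLI > env > config.env) can be sketched as follows. This assumes `config.env` holds simple `KEY=value` lines with `#` comments, as in the template; `parse_env_file` and `resolve_setting` are illustrative names and not the scripts' actual functions.

```python
from typing import Dict, Mapping, Optional


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and '#' comment lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_setting(
    key: str,
    cli_value: Optional[str],
    env: Mapping[str, str],
    file_values: Mapping[str, str],
) -> Optional[str]:
    """Apply the priority: CLI argument, then environment, then config.env."""
    if cli_value is not None:
        return cli_value
    if key in env:
        return env[key]
    return file_values.get(key)
```

In a script, `env` would typically be `os.environ` and `cli_value` the parsed argument, left as `None` when the flag is not given.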

@ -0,0 +1,14 @@
# BringUpBench Evaluation — Environment Configuration
# All scripts resolve paths from this file.
# Values can be overridden by same-named environment variables or CLI arguments.
# Priority: CLI args > environment variables > config.env
# Absolute path to the Bringup-Bench repository
# Clone from: https://github.com/toddmaustin/bringup-bench.git
BENCH_REPO_ROOT=/path/to/bringup-bench
# IDA Pro command-line executable (required for Step 2: decompilation)
IDA_BIN=/path/to/idat
# Default build target (host = native x86-64 Linux)
DEFAULT_TARGET=host

File diff suppressed because one or more lines are too long


@ -0,0 +1,296 @@
# Infer-Out Model 2 Evaluation (merged.O0.func_map.infer-host)
- Timestamp: 20251119-171008
- Source JSONL: merged.O0.func_map.infer.jsonl
- Target: host
- Total cases: 382
- Replacement success: 382 (100.00%)
- Compilable: 192 (50.26%)
- Executable: 189 (49.48%)
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 9 | 100.00% | 33.33% | 33.33% |
| anagram | 12 | 100.00% | 58.33% | 58.33% |
| audio-codec | 4 | 100.00% | 50.00% | 50.00% |
| avl-tree | 14 | 100.00% | 35.71% | 35.71% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 5 | 100.00% | 100.00% | 100.00% |
| blake2b | 6 | 100.00% | 16.67% | 16.67% |
| bloom-filter | 3 | 100.00% | 33.33% | 33.33% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 2 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 70.00% | 70.00% |
| ccmac | 2 | 100.00% | 50.00% | 50.00% |
| checkers | 15 | 100.00% | 80.00% | 80.00% |
| cipher | 3 | 100.00% | 33.33% | 33.33% |
| congrad | 6 | 100.00% | 66.67% | 66.67% |
| connect4-minimax | 13 | 100.00% | 61.54% | 61.54% |
| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
| dhrystone | 5 | 100.00% | 60.00% | 60.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 50.00% | 50.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 10 | 100.00% | 60.00% | 60.00% |
| fuzzy-match | 4 | 100.00% | 25.00% | 25.00% |
| fy-shuffle | 4 | 100.00% | 50.00% | 50.00% |
| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
| grad-descent | 4 | 100.00% | 75.00% | 75.00% |
| graph-tests | 19 | 100.00% | 21.05% | 21.05% |
| hanoi | 2 | 100.00% | 50.00% | 50.00% |
| heapsort | 2 | 100.00% | 50.00% | 50.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 12 | 100.00% | 91.67% | 91.67% |
| idct-alg | 4 | 100.00% | 50.00% | 50.00% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 6 | 100.00% | 100.00% | 100.00% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 28.57% | 28.57% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 66.67% | 66.67% |
| life | 14 | 100.00% | 78.57% | 71.43% |
| longdiv | 7 | 100.00% | 71.43% | 71.43% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| matmult | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 3 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 62.50% | 62.50% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 3 | 100.00% | 33.33% | 33.33% |
| parrondo | 3 | 100.00% | 33.33% | 33.33% |
| pascal | 3 | 100.00% | 100.00% | 100.00% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 0.00% | 0.00% |
| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
| qsort-demo | 5 | 100.00% | 0.00% | 0.00% |
| qsort-test | 3 | 100.00% | 66.67% | 66.67% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 50.00% | 50.00% |
| rand-test | 2 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 50.00% | 50.00% |
| regex-parser | 11 | 100.00% | 72.73% | 63.64% |
| rho-factor | 4 | 100.00% | 75.00% | 75.00% |
| rle-compress | 2 | 100.00% | 50.00% | 50.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
| sieve | 2 | 100.00% | 50.00% | 50.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 50.00% |
| sudoku-solver | 4 | 100.00% | 75.00% | 75.00% |
| tetris-sim | 12 | 100.00% | 75.00% | 75.00% |
| tiny-NN | 2 | 100.00% | 50.00% | 50.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 4 | 100.00% | 75.00% | 75.00% |
| transcend | 3 | 100.00% | 66.67% | 66.67% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 12.50% |
| verlet | 4 | 100.00% | 25.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |
## Compilation Failures
- ackermann/ackermann.c::main@0x13b9
- aes/aes.c::aes_decrypt@0x1a65
- aes/aes.c::aes_encrypt@0x1943
- aes/aes.c::inv_shift_rows@0x1396
- aes/aes.c::key_expansion@0x179a
- aes/aes.c::main@0x1b87
- aes/aes.c::shift_rows@0x12e5
- anagram/anagram.c::BuildMask@0x13e7
- anagram/anagram.c::BuildWord@0x17e5
- anagram/anagram.c::FindAnagram@0x1ba6
- anagram/anagram.c::ReadDict@0x121f
- anagram/anagram.c::main@0x1f71
- audio-codec/audio-codec.c::decode@0x12f5
- audio-codec/audio-codec.c::main@0x14b3
- avl-tree/avlcore.c::DeleteByElement@0x240f
- avl-tree/avlcore.c::DeleteByElementRecursive@0x21af
- avl-tree/avlcore.c::DeleteLeftMost@0x2086
- avl-tree/avlcore.c::FindByElement@0x1a46
- avl-tree/avlcore.c::Height@0x2475
- avl-tree/avlcore.c::Insert@0x1fc4
- avl-tree/avlcore.c::SingleLeftRotation@0x1b3a
- avl-tree/avl-tree.c::main@0x1399
- avl-tree/avl-tree.c::printTree@0x11e9
- banner/banner.c::main@0x11e9
- blake2b/blake2b.c::BLAKE2B@0x1a9b
- blake2b/blake2b.c::F@0x1502
- blake2b/blake2b.c::G@0x1258
- blake2b/blake2b.c::blake2b@0x1cd3
- blake2b/blake2b.c::test@0x2071
- bloom-filter/bloom-filter.c::bad_search@0x11e9
- bloom-filter/bloom-filter.c::main@0x123d
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x11e9
- boyer-moore-search/boyer-moore-search.c::main@0x146d
- boyer-moore-search/boyer-moore-search.c::search@0x126d
- c-interp/c-interp.c::eval@0x457c
- c-interp/c-interp.c::main@0x4e03
- c-interp/c-interp.c::next@0x11e9
- ccmac/ccmac.c::main@0x127e
- checkers/functions.c::fill_print_initial@0x1793
- checkers/functions.c::generate_node_children@0x29ff
- checkers/checkers.c::main@0x11e9
- cipher/cipher.c::encipher@0x11e9
- cipher/cipher.c::main@0x13cd
- congrad/congrad.c::cg_solve@0x1643
- congrad/congrad.c::main@0x199b
- connect4-minimax/connect4-minimax.c::init_board@0x11e9
- connect4-minimax/connect4-minimax.c::main@0x2299
- connect4-minimax/connect4-minimax.c::minimax@0x1d07
- connect4-minimax/connect4-minimax.c::play_game@0x20d1
- connect4-minimax/connect4-minimax.c::score_position@0x1a02
- convex-hull/convex-hull.c::main@0x13e7
- dhrystone/dhrystone.c::Proc_1@0x199f
- dhrystone/dhrystone.c::main@0x11e9
- distinctness/distinctness.c::isDistinct@0x11e9
- distinctness/distinctness.c::main@0x15d8
- fft-int/fft-int.c::db_from_ampl@0x1807
- fft-int/fft-int.c::fix_fft@0x11e9
- flood-fill/flood-fill.c::main@0x144d
- frac-calc/frac-calc.c::copyr@0x14d4
- frac-calc/frac-calc.c::divtokens@0x15b8
- frac-calc/frac-calc.c::help@0x13d9
- frac-calc/frac-calc.c::main@0x11e9
- fuzzy-match/fuzzy-match.c::compute_score@0x2379
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x2283
- fuzzy-match/fuzzy-match.c::main@0x24b3
- fy-shuffle/fy-shuffle.c::main@0x1378
- fy-shuffle/fy-shuffle.c::rand_int@0x11e9
- gcd-list/gcd-list.c::gcd@0x11e9
- gcd-list/gcd-list.c::main@0x125e
- grad-descent/grad-descent.c::main@0x1413
- graph-tests/graph-tests.c::addEdge@0x12c9
- graph-tests/graph-tests.c::addVertex@0x19f6
- graph-tests/graph-tests.c::bfs@0x15ce
- graph-tests/graph-tests.c::bfs_test@0x16e9
- graph-tests/graph-tests.c::bubbleSort@0x1829
- graph-tests/graph-tests.c::createGraph@0x1221
- graph-tests/graph-tests.c::createNode@0x11e9
- graph-tests/graph-tests.c::createQueue@0x1372
- graph-tests/graph-tests.c::dequeue@0x145d
- graph-tests/graph-tests.c::enqueue@0x13d7
- graph-tests/graph-tests.c::insertAtTheBegin@0x17b1
- graph-tests/graph-tests.c::link_list@0x18b8
- graph-tests/graph-tests.c::main@0x1d6c
- graph-tests/graph-tests.c::printQueue@0x151b
- graph-tests/graph-tests.c::swap@0x17f8
- hanoi/hanoi.c::main@0x12d4
- heapsort/heapsort.c::main@0x155f
- heat-calc/heat-calc.c::main@0x11e9
- huff-encode/huff-encode.c::main@0x192d
- idct-alg/idct-alg.c::C@0x11e9
- idct-alg/idct-alg.c::main@0x1472
- indirect-test/indirect-test.c::main@0x12c9
- kadane/kadane.c::main@0x1276
- kepler/kepler.c::bin_fact@0x1b3e
- kepler/kepler.c::binary@0x12c6
- kepler/kepler.c::e_series@0x1389
- kepler/kepler.c::j_series@0x1501
- kepler/kepler.c::main@0x1608
- knapsack/knapsack.c::main@0x138e
- knapsack/knapsack.c::max@0x11e9
- knights-tour/knights-tour.c::solveKT@0x12d6
- life/life.c::getNumNeigbors@0x156f
- life/life.c::main@0x11e9
- life/life.c::process@0x1426
- longdiv/longdiv.c::main@0x18fd
- longdiv/longdiv.c::sub@0x11e9
- lu-decomp/lu-decomp.c::main@0x1520
- lu-decomp/lu-decomp.c::print_matrix@0x11e9
- mandelbrot/mandelbrot.c::main@0x1220
- matmult/matmult.c::main@0x11e9
- max-subseq/max-subseq.c::lcsAlgo@0x11e9
- max-subseq/max-subseq.c::main@0x171a
- mersenne/mersenne.c::genrand@0x12ee
- mersenne/mersenne.c::main@0x153a
- mersenne/mersenne.c::sgenrand@0x11e9
- minspan/minspan.c::displayPath@0x1af2
- minspan/minspan.c::main@0x1d8f
- minspan/minspan.c::minSpanTree@0x1297
- monte-carlo/monte-carlo.c::main@0x11e9
- murmur-hash/murmur-hash.c::main@0x13a9
- murmur-hash/murmur-hash.c::murmurhash@0x11e9
- n-queens/n-queens.c::main@0x12ec
- natlog/natlog.c::main@0x11e9
- nbody-sim/nbody-sim.c::main@0x11e9
- packet-filter/packet-filter.c::generate_packet@0x11e9
- packet-filter/packet-filter.c::main@0x14c3
- parrondo/parrondo.c::cointoss@0x11e9
- parrondo/parrondo.c::main@0x12cb
- pi-calc/pi-calc.c::main@0x11e9
- primal-test/primal-test.c::main@0x1459
- primal-test/primal-test.c::miller_rabin_int@0x12fd
- primal-test/primal-test.c::powm@0x11e9
- priority-queue/priority-queue.c::main@0x13ee
- qsort-demo/qsort-demo.c::main@0x17bf
- qsort-demo/qsort-demo.c::print_struct_array@0x155e
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x1401
- qsort-demo/qsort-demo.c::sort_integers_example@0x1280
- qsort-demo/qsort-demo.c::sort_structs_example@0x1603
- qsort-test/qsort-test.c::main@0x1415
- quaternions/quaternions.c::euler_from_quat@0x1447
- quaternions/quaternions.c::quat_from_euler@0x11e9
- quaternions/quaternions.c::quaternion_multiply@0x1655
- quaternions/quaternions.c::test@0x18b2
- rabinkarp-search/rabinkarp-search.c::main@0x1341
- rand-test/rand-test.c::main@0x1913
- rand-test/rand-test.c::run_tests@0x1258
- ransac/ransac.c::main@0x1466
- regex-parser/regex-parser.c::main@0x32b9
- regex-parser/regex-parser.c::re_compile@0x22e1
- regex-parser/regex-parser.c::re_print@0x278f
- rho-factor/rho-factor.c::main@0x5c7d
- rle-compress/rle-compress.c::run_length_encode@0x11e9
- rsa-cipher/rsa-cipher.c::main@0x1634
- rsa-cipher/rsa-cipher.c::mod_inverse@0x1363
- rsa-cipher/rsa-cipher.c::mod_pow@0x11e9
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x14ef
- sat-solver/sat-solver.c::main@0x1518
- sat-solver/sat-solver.c::printFormula@0x1391
- shortest-path/shortest-path.c::main@0x1469
- sieve/sieve.c::main@0x1300
- simple-grep/simple-grep.c::main@0x11e9
- spelt2num/spelt2num.c::main@0x11e9
- spirograph/spirograph.c::spirograph@0x11e9
- sudoku-solver/sudoku-solver.c::main@0x1532
- tetris-sim/tetris-sim.c::best_move@0x1810
- tetris-sim/tetris-sim.c::evaluate_board@0x1686
- tetris-sim/tetris-sim.c::main@0x1ba5
- tiny-NN/tiny-NN.c::train@0x1485
- topo-sort/topo-sort.c::addEdge@0x12cf
- topo-sort/topo-sort.c::createGraph@0x1259
- topo-sort/topo-sort.c::createListNode@0x1221
- topo-sort/topo-sort.c::createStackNode@0x11e9
- topo-sort/topo-sort.c::main@0x153d
- topo-sort/topo-sort.c::topologicalSort@0x13fd
- topo-sort/topo-sort.c::topologicalSortUtil@0x1332
- totient/totient.c::my_gcd@0x11e9
- transcend/transcend.c::init_inputs_f64@0x1235
- uniquify/uniquify.c::main@0x1228
- vectors-3d/vectors-3d.c::get_cross_matrix@0x1601
- vectors-3d/vectors-3d.c::print_vector@0x144f
- vectors-3d/vectors-3d.c::test@0x17fb
- vectors-3d/vectors-3d.c::unit_vec@0x1510
- vectors-3d/vectors-3d.c::vector_add@0x126d
- vectors-3d/vectors-3d.c::vector_prod@0x1373
- vectors-3d/vectors-3d.c::vector_sub@0x11e9
- verlet/verlet.c::main@0x170b
- verlet/verlet.c::vb_init@0x1271
- verlet/verlet.c::vb_step_avg@0x13aa
- weekday/weekday.c::dayOfWeek@0x11e9
- weekday/weekday.c::main@0x130d
## Execution Failures
- life/life.c::init@0x1237
- regex-parser/regex-parser.c::matchpattern@0x313f
- verlet/verlet.c::vb_checksum@0x160b

@ -0,0 +1,334 @@
# Infer-Out Model 2 Evaluation (merged.O1.func_map.infer-host)
- Timestamp: 20251119-171212
- Source JSONL: merged.O1.func_map.infer.jsonl
- Target: host
- Total cases: 379
- Replacement success: 379 (100.00%)
- Compilable: 155 (40.90%)
- Executable: 148 (39.05%)
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 9 | 100.00% | 33.33% | 33.33% |
| anagram | 13 | 100.00% | 53.85% | 53.85% |
| audio-codec | 3 | 100.00% | 0.00% | 0.00% |
| avl-tree | 17 | 100.00% | 29.41% | 29.41% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 5 | 100.00% | 80.00% | 80.00% |
| blake2b | 5 | 100.00% | 20.00% | 20.00% |
| bloom-filter | 4 | 100.00% | 50.00% | 50.00% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 60.00% | 60.00% |
| ccmac | 1 | 100.00% | 0.00% | 0.00% |
| checkers | 16 | 100.00% | 81.25% | 81.25% |
| cipher | 3 | 100.00% | 33.33% | 0.00% |
| congrad | 2 | 100.00% | 0.00% | 0.00% |
| connect4-minimax | 13 | 100.00% | 61.54% | 61.54% |
| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
| dhrystone | 5 | 100.00% | 40.00% | 40.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 75.00% | 75.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 10 | 100.00% | 40.00% | 40.00% |
| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
| grad-descent | 4 | 100.00% | 0.00% | 0.00% |
| graph-tests | 19 | 100.00% | 21.05% | 21.05% |
| hanoi | 2 | 100.00% | 50.00% | 50.00% |
| heapsort | 2 | 100.00% | 50.00% | 50.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 13 | 100.00% | 92.31% | 92.31% |
| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 6 | 100.00% | 50.00% | 50.00% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 14.29% | 14.29% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 66.67% | 66.67% |
| life | 14 | 100.00% | 21.43% | 14.29% |
| longdiv | 7 | 100.00% | 71.43% | 71.43% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| matmult | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 3 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 37.50% | 25.00% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 4 | 100.00% | 25.00% | 25.00% |
| parrondo | 2 | 100.00% | 0.00% | 0.00% |
| pascal | 3 | 100.00% | 33.33% | 33.33% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 33.33% | 33.33% |
| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
| rand-test | 3 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 0.00% | 0.00% |
| regex-parser | 8 | 100.00% | 25.00% | 12.50% |
| rho-factor | 4 | 100.00% | 75.00% | 75.00% |
| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
| sieve | 1 | 100.00% | 0.00% | 0.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 50.00% |
| sudoku-solver | 4 | 100.00% | 50.00% | 50.00% |
| tetris-sim | 12 | 100.00% | 75.00% | 66.67% |
| tiny-NN | 5 | 100.00% | 40.00% | 40.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 4 | 100.00% | 50.00% | 50.00% |
| transcend | 1 | 100.00% | 0.00% | 0.00% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
| verlet | 1 | 100.00% | 0.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |
## Compilation Failures
- ackermann/ackermann.c::main@0x131c
- aes/aes.c::aes_decrypt@0x161b
- aes/aes.c::aes_encrypt@0x1560
- aes/aes.c::inv_shift_rows@0x12cd
- aes/aes.c::key_expansion@0x14c3
- aes/aes.c::main@0x16d1
- aes/aes.c::shift_rows@0x1248
- anagram/anagram.c::BuildMask@0x1372
- anagram/anagram.c::BuildWord@0x15cd
- anagram/anagram.c::DumpWords@0x17e8
- anagram/anagram.c::FindAnagram@0x1839
- anagram/anagram.c::ReadDict@0x1233
- anagram/anagram.c::main@0x1a93
- audio-codec/audio-codec.c::decode@0x1271
- audio-codec/audio-codec.c::encode@0x11e9
- audio-codec/audio-codec.c::main@0x12d7
- avl-tree/avlcore.c::CheckTreeNodeRotation@0x186a
- avl-tree/element.c::Compare@0x1764
- avl-tree/avlcore.c::DeleteByElement@0x1d2b
- avl-tree/avlcore.c::DeleteByElementRecursive@0x1b8b
- avl-tree/avlcore.c::DoubleLeftRotation@0x1845
- avl-tree/avlcore.c::DoubleRightRotation@0x1821
- avl-tree/avlcore.c::FindByElement@0x1790
- avl-tree/avlcore.c::Height@0x1d6e
- avl-tree/avlcore.c::Insert@0x1a73
- avl-tree/avlcore.c::InsertNode@0x199b
- avl-tree/avl-tree.c::main@0x1380
- avl-tree/avl-tree.c::printTree@0x11e9
- banner/banner.c::main@0x11e9
- bit-kernels/bit-kernels.c::main@0x12e8
- blake2b/blake2b.c::F@0x1258
- blake2b/blake2b.c::G@0x11e9
- blake2b/blake2b.c::blake2b@0x1616
- blake2b/blake2b.c::test@0x1982
- bloom-filter/bloom-filter.c::bad_search@0x11e9
- bloom-filter/bloom-filter.c::main@0x1217
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x11e9
- boyer-moore-search/boyer-moore-search.c::main@0x1329
- boyer-moore-search/boyer-moore-search.c::search@0x1223
- c-interp/c-interp.c::eval@0x35d3
- c-interp/c-interp.c::function_body@0x310b
- c-interp/c-interp.c::main@0x3c45
- c-interp/c-interp.c::next@0x11e9
- ccmac/ccmac.c::main@0x11e9
- checkers/functions.c::fill_print_initial@0x15dd
- checkers/functions.c::link_new_node@0x204d
- checkers/checkers.c::main@0x11e9
- cipher/cipher.c::encipher@0x11e9
- cipher/cipher.c::main@0x12b3
- congrad/congrad.c::cg_spmv@0x11e9
- congrad/congrad.c::main@0x125a
- connect4-minimax/connect4-minimax.c::init_board@0x11e9
- connect4-minimax/connect4-minimax.c::main@0x1c5d
- connect4-minimax/connect4-minimax.c::minimax@0x17ed
- connect4-minimax/connect4-minimax.c::play_game@0x1b13
- connect4-minimax/connect4-minimax.c::score_position@0x158e
- convex-hull/convex-hull.c::main@0x130d
- dhrystone/dhrystone.c::PFunc_1@0x12ab
- dhrystone/dhrystone.c::PFunc_2@0x12c8
- dhrystone/dhrystone.c::main@0x1311
- distinctness/distinctness.c::isDistinct@0x11e9
- distinctness/distinctness.c::main@0x1342
- fft-int/fft-int.c::db_from_ampl@0x1513
- flood-fill/flood-fill.c::main@0x130f
- frac-calc/frac-calc.c::avaliatokens@0x1421
- frac-calc/frac-calc.c::calcula@0x172a
- frac-calc/frac-calc.c::copyr@0x12b5
- frac-calc/frac-calc.c::divtokens@0x1636
- frac-calc/frac-calc.c::help@0x11e9
- frac-calc/frac-calc.c::main@0x18c1
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x21e9
- fuzzy-match/fuzzy-match.c::main@0x2391
- fy-shuffle/fy-shuffle.c::fy_shuffle@0x11e9
- fy-shuffle/fy-shuffle.c::main@0x12de
- gcd-list/gcd-list.c::gcd@0x11e9
- gcd-list/gcd-list.c::main@0x121c
- grad-descent/grad-descent.c::derivateWRTBias@0x1247
- grad-descent/grad-descent.c::derivateWRTWeight@0x11e9
- grad-descent/grad-descent.c::gradientDescent@0x129d
- grad-descent/grad-descent.c::main@0x1312
- graph-tests/graph-tests.c::addEdge@0x127b
- graph-tests/graph-tests.c::addVertex@0x1743
- graph-tests/graph-tests.c::bfs@0x144f
- graph-tests/graph-tests.c::bfs_test@0x150f
- graph-tests/graph-tests.c::bubbleSort@0x15e7
- graph-tests/graph-tests.c::createGraph@0x1206
- graph-tests/graph-tests.c::createNode@0x11e9
- graph-tests/graph-tests.c::createQueue@0x12cd
- graph-tests/graph-tests.c::dequeue@0x1357
- graph-tests/graph-tests.c::enqueue@0x130a
- graph-tests/graph-tests.c::insertAtTheBegin@0x15ae
- graph-tests/graph-tests.c::link_list@0x163c
- graph-tests/graph-tests.c::main@0x1a0e
- graph-tests/graph-tests.c::printQueue@0x13cc
- graph-tests/graph-tests.c::swap@0x15da
- hanoi/hanoi.c::main@0x1261
- heapsort/heapsort.c::main@0x13d4
- heat-calc/heat-calc.c::main@0x11e9
- huff-encode/huff-encode.c::main@0x15ef
- idct-alg/idct-alg.c::main@0x140e
- indirect-test/indirect-test.c::main@0x1257
- k-means/k-means.c::calculateNearst@0x11e9
- k-means/k-means.c::main@0x1922
- k-means/k-means.c::printEPS@0x1546
- kadane/kadane.c::main@0x123b
- kepler/kepler.c::J@0x18c0
- kepler/kepler.c::bin_fact@0x1718
- kepler/kepler.c::binary@0x121d
- kepler/kepler.c::e_series@0x17a2
- kepler/kepler.c::j_series@0x19bb
- kepler/kepler.c::main@0x131f
- knapsack/knapsack.c::main@0x128b
- knapsack/knapsack.c::max@0x11e9
- knights-tour/knights-tour.c::solveKT@0x1341
- life/life.c::getDown@0x1406
- life/life.c::getDownLeft@0x1487
- life/life.c::getDownRight@0x14b4
- life/life.c::getLeft@0x1390
- life/life.c::getNumNeigbors@0x14e2
- life/life.c::getRight@0x13b7
- life/life.c::getUp@0x13df
- life/life.c::getUpLeft@0x142e
- life/life.c::getUpRight@0x145a
- life/life.c::main@0x1664
- life/life.c::process@0x15a3
- longdiv/longdiv.c::main@0x1691
- longdiv/longdiv.c::sub@0x11e9
- lu-decomp/lu-decomp.c::main@0x13ad
- lu-decomp/lu-decomp.c::print_matrix@0x11e9
- mandelbrot/mandelbrot.c::main@0x120d
- matmult/matmult.c::main@0x11e9
- max-subseq/max-subseq.c::lcsAlgo@0x11e9
- max-subseq/max-subseq.c::main@0x14c4
- mersenne/mersenne.c::genrand@0x125b
- mersenne/mersenne.c::main@0x1398
- mersenne/mersenne.c::sgenrand@0x11e9
- minspan/minspan.c::displayGraph@0x13f5
- minspan/minspan.c::displayGraph1@0x14f3
- minspan/minspan.c::displayPath@0x15fa
- minspan/minspan.c::main@0x175b
- minspan/minspan.c::minSpanTree@0x1231
- monte-carlo/monte-carlo.c::main@0x11e9
- murmur-hash/murmur-hash.c::main@0x12a3
- murmur-hash/murmur-hash.c::murmurhash@0x11e9
- n-queens/n-queens.c::main@0x12b1
- natlog/natlog.c::main@0x11e9
- nbody-sim/nbody-sim.c::main@0x11e9
- packet-filter/packet-filter.c::check_packet_filter@0x133d
- packet-filter/packet-filter.c::generate_packet@0x11e9
- packet-filter/packet-filter.c::main@0x145c
- parrondo/parrondo.c::main@0x127d
- parrondo/parrondo.c::play_c@0x1238
- pascal/pascal.c::main@0x12d1
- pascal/pascal.c::print_centered@0x122b
- pi-calc/pi-calc.c::main@0x11e9
- primal-test/primal-test.c::main@0x13ea
- primal-test/primal-test.c::miller_rabin_int@0x1243
- priority-queue/priority-queue.c::main@0x130a
- qsort-demo/qsort-demo.c::main@0x163f
- qsort-demo/qsort-demo.c::print_struct_array@0x1470
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x13b3
- qsort-demo/qsort-demo.c::sort_integers_example@0x1292
- qsort-demo/qsort-demo.c::sort_structs_example@0x14d2
- qsort-test/qsort-test.c::main@0x133f
- quaternions/quaternions.c::euler_from_quat@0x136c
- quaternions/quaternions.c::main@0x15bf
- quaternions/quaternions.c::quat_from_euler@0x11e9
- quaternions/quaternions.c::quaternion_multiply@0x1487
- rabinkarp-search/rabinkarp-search.c::main@0x1366
- rabinkarp-search/rabinkarp-search.c::search@0x11e9
- rand-test/rand-test.c::bad_rand@0x11e9
- rand-test/rand-test.c::main@0x1514
- rand-test/rand-test.c::run_tests@0x1220
- ransac/ransac.c::main@0x13cf
- ransac/ransac.c::ransac_line_fitting@0x1238
- regex-parser/regex-parser.c::main@0x2b4b
- regex-parser/regex-parser.c::matchalphanum@0x21fc
- regex-parser/regex-parser.c::matchcharclass@0x222a
- regex-parser/regex-parser.c::matchone@0x23e1
- regex-parser/regex-parser.c::re_compile@0x270b
- regex-parser/regex-parser.c::re_print@0x2964
- rho-factor/rho-factor.c::main@0x3ef0
- rle-compress/rle-compress.c::main@0x1318
- rle-compress/rle-compress.c::run_length_encode@0x11e9
- rsa-cipher/rsa-cipher.c::main@0x1527
- rsa-cipher/rsa-cipher.c::mod_inverse@0x12f3
- rsa-cipher/rsa-cipher.c::mod_pow@0x11e9
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x1444
- sat-solver/sat-solver.c::main@0x141e
- sat-solver/sat-solver.c::printFormula@0x12ff
- shortest-path/shortest-path.c::main@0x1333
- sieve/sieve.c::main@0x11e9
- simple-grep/simple-grep.c::main@0x11e9
- spelt2num/spelt2num.c::main@0x11e9
- spirograph/spirograph.c::spirograph@0x11e9
- sudoku-solver/sudoku-solver.c::isSafe@0x11e9
- sudoku-solver/sudoku-solver.c::main@0x13e5
- tetris-sim/tetris-sim.c::best_move@0x157c
- tetris-sim/tetris-sim.c::evaluate_board@0x144b
- tetris-sim/tetris-sim.c::main@0x180d
- tiny-NN/tiny-NN.c::main@0x16a4
- tiny-NN/tiny-NN.c::sampleSine@0x1251
- tiny-NN/tiny-NN.c::train@0x133c
- topo-sort/topo-sort.c::addEdge@0x127d
- topo-sort/topo-sort.c::createGraph@0x1223
- topo-sort/topo-sort.c::createListNode@0x1206
- topo-sort/topo-sort.c::createStackNode@0x11e9
- topo-sort/topo-sort.c::main@0x1424
- topo-sort/topo-sort.c::topologicalSort@0x132c
- topo-sort/topo-sort.c::topologicalSortUtil@0x12b7
- totient/totient.c::main@0x12bf
- totient/totient.c::my_gcd@0x11e9
- transcend/transcend.c::main@0x11e9
- uniquify/uniquify.c::main@0x1201
- vectors-3d/vectors-3d.c::get_cross_matrix@0x13c2
- vectors-3d/vectors-3d.c::main@0x14cb
- vectors-3d/vectors-3d.c::print_vector@0x12dc
- vectors-3d/vectors-3d.c::unit_vec@0x1331
- vectors-3d/vectors-3d.c::vector_add@0x121f
- vectors-3d/vectors-3d.c::vector_prod@0x127e
- vectors-3d/vectors-3d.c::vector_sub@0x11e9
- verlet/verlet.c::main@0x11e9
- weekday/weekday.c::dayOfWeek@0x11e9
- weekday/weekday.c::main@0x12ea
## Execution Failures
- cipher/cipher.c::decipher@0x1251
- idct-alg/idct-alg.c::idct_2d@0x1216
- life/life.c::init@0x11e9
- minspan/minspan.c::displayTree@0x16b7
- regex-parser/regex-parser.c::matchpattern@0x2491
- tetris-sim/tetris-sim.c::clear_lines@0x12b6
- vectors-3d/vectors-3d.c::get_angle@0x1429

@ -0,0 +1,345 @@
# Infer-Out Model 2 Evaluation (merged.O2.func_map.infer-host)
- Timestamp: 20251119-170633
- Source JSONL: merged.O2.func_map.infer.jsonl
- Target: host
- Total cases: 368
- Replacement success: 368 (100.00%)
- Compilable: 139 (37.77%)
- Executable: 126 (34.24%)
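The percentages in the summary above can be reproduced from the raw counts. The following is a minimal sketch, assuming each rate is simply the count divided by the total number of cases and rounded to two decimals; the helper name `rate` is hypothetical and not part of the pipeline scripts.

```python
def rate(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimal places."""
    return f"{100.0 * count / total:.2f}%"


# Counts copied from the O2 summary above.
total_cases = 368
compilable = 139
executable = 126

build_rate = rate(compilable, total_cases)
exec_rate = rate(executable, total_cases)

# Functions that compiled but did not pass execution.
compiled_but_failed = compilable - executable

print(build_rate, exec_rate, compiled_but_failed)
```

Under this assumption, the difference between the compilable and executable counts should equal the number of entries listed under "Execution Failures".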
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 10 | 100.00% | 20.00% | 20.00% |
| anagram | 13 | 100.00% | 46.15% | 46.15% |
| audio-codec | 3 | 100.00% | 33.33% | 33.33% |
| avl-tree | 15 | 100.00% | 20.00% | 20.00% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 3 | 100.00% | 66.67% | 66.67% |
| blake2b | 4 | 100.00% | 0.00% | 0.00% |
| bloom-filter | 4 | 100.00% | 50.00% | 50.00% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 50.00% | 50.00% |
| ccmac | 1 | 100.00% | 0.00% | 0.00% |
| checkers | 16 | 100.00% | 68.75% | 62.50% |
| cipher | 3 | 100.00% | 66.67% | 0.00% |
| congrad | 1 | 100.00% | 0.00% | 0.00% |
| connect4-minimax | 13 | 100.00% | 61.54% | 53.85% |
| convex-hull | 4 | 100.00% | 75.00% | 75.00% |
| dhrystone | 5 | 100.00% | 20.00% | 20.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 50.00% | 50.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 10 | 100.00% | 50.00% | 50.00% |
| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
| gcd-list | 2 | 100.00% | 50.00% | 0.00% |
| grad-descent | 4 | 100.00% | 25.00% | 25.00% |
| graph-tests | 20 | 100.00% | 10.00% | 10.00% |
| hanoi | 1 | 100.00% | 0.00% | 0.00% |
| heapsort | 2 | 100.00% | 50.00% | 50.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 13 | 100.00% | 92.31% | 92.31% |
| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 6 | 100.00% | 33.33% | 33.33% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 14.29% | 14.29% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 33.33% | 33.33% |
| life | 14 | 100.00% | 21.43% | 14.29% |
| longdiv | 6 | 100.00% | 50.00% | 50.00% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| matmult | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 3 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 25.00% | 25.00% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 4 | 100.00% | 0.00% | 0.00% |
| parrondo | 2 | 100.00% | 50.00% | 50.00% |
| pascal | 3 | 100.00% | 66.67% | 66.67% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 33.33% | 33.33% |
| priority-queue | 5 | 100.00% | 80.00% | 80.00% |
| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
| rand-test | 3 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 50.00% | 0.00% |
| regex-parser | 7 | 100.00% | 28.57% | 14.29% |
| rho-factor | 3 | 100.00% | 66.67% | 66.67% |
| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 60.00% |
| shortest-path | 3 | 100.00% | 66.67% | 66.67% |
| sieve | 1 | 100.00% | 0.00% | 0.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 0.00% |
| sudoku-solver | 4 | 100.00% | 50.00% | 50.00% |
| tetris-sim | 12 | 100.00% | 75.00% | 58.33% |
| tiny-NN | 4 | 100.00% | 25.00% | 25.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 2 | 100.00% | 50.00% | 50.00% |
| transcend | 1 | 100.00% | 0.00% | 0.00% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
| verlet | 1 | 100.00% | 0.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |
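Each row of the breakdown can be cross-checked against the failure lists in this report. The sketch below assumes that Build% is the share of cases that compile and that Exec% is the share of cases that both compile and pass execution, both taken over all cases of the benchmark; the function `row_rates` is illustrative only and not part of the evaluation scripts.

```python
def row_rates(cases: int, compile_failures: int, exec_failures: int) -> tuple[str, str]:
    """Return (Build%, Exec%) for one benchmark row, formatted like the table."""
    built = cases - compile_failures
    executed = built - exec_failures
    build_pct = f"{100.0 * built / cases:.2f}%"
    exec_pct = f"{100.0 * executed / cases:.2f}%"
    return build_pct, exec_pct


# tetris-sim at O2: 12 cases, 3 entries under "Compilation Failures"
# (best_move, evaluate_board, main) and 2 under "Execution Failures"
# (clear_lines, simulate_board).
tetris = row_rates(cases=12, compile_failures=3, exec_failures=2)
print(tetris)
```

The result for `tetris-sim` matches the 75.00% / 58.33% shown in the table, which suggests the assumption holds for this report.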
## Compilation Failures
- ackermann/ackermann.c::main@0x1100
- aes/aes.c::aes_decrypt@0x18c0
- aes/aes.c::aes_encrypt@0x1780
- aes/aes.c::inv_mix_columns@0x1640
- aes/aes.c::inv_shift_rows@0x14f0
- aes/aes.c::key_expansion@0x16d0
- aes/aes.c::main@0x1100
- aes/aes.c::mix_columns@0x1580
- aes/aes.c::shift_rows@0x1480
- anagram/anagram.c::BuildMask@0x14c0
- anagram/anagram.c::BuildWord@0x17d0
- anagram/anagram.c::DumpCandidates@0x19a0
- anagram/anagram.c::DumpWords@0x1a30
- anagram/anagram.c::FindAnagram@0x1a90
- anagram/anagram.c::ReadDict@0x1360
- anagram/anagram.c::main@0x1120
- audio-codec/audio-codec.c::decode@0x1440
- audio-codec/audio-codec.c::main@0x1100
- avl-tree/avlcore.c::CheckTreeNodeRotation@0x1c30
- avl-tree/element.c::Compare@0x1ad0
- avl-tree/avlcore.c::DeleteByElement@0x2860
- avl-tree/avlcore.c::DeleteByElementRecursive@0x26d0
- avl-tree/avlcore.c::DeleteLeftMost@0x2610
- avl-tree/avlcore.c::DoubleLeftRotation@0x1c00
- avl-tree/avlcore.c::DoubleRightRotation@0x1bd0
- avl-tree/avlcore.c::FindByElement@0x1b00
- avl-tree/avlcore.c::Insert@0x1f30
- avl-tree/avlcore.c::MakeEmpty@0x1f80
- avl-tree/avl-tree.c::breadth@0x1760
- avl-tree/avl-tree.c::main@0x1120
- banner/banner.c::main@0x1120
- bit-kernels/bit-kernels.c::main@0x1120
- blake2b/blake2b.c::F@0x12a0
- blake2b/blake2b.c::G@0x1230
- blake2b/blake2b.c::blake2b@0x1620
- blake2b/blake2b.c::test@0x19d0
- bloom-filter/bloom-filter.c::bad_search@0x1430
- bloom-filter/bloom-filter.c::main@0x1120
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x15d0
- boyer-moore-search/boyer-moore-search.c::main@0x1140
- boyer-moore-search/boyer-moore-search.c::search@0x1630
- c-interp/c-interp.c::eval@0x3e90
- c-interp/c-interp.c::function_body@0x37f0
- c-interp/c-interp.c::function_declaration@0x3a10
- c-interp/c-interp.c::main@0x1120
- c-interp/c-interp.c::next@0x1580
- ccmac/ccmac.c::main@0x1120
- checkers/functions.c::fill_print_initial@0x1630
- checkers/functions.c::free_tree@0x2460
- checkers/functions.c::generate_node_children@0x21c0
- checkers/functions.c::link_new_node@0x20e0
- checkers/checkers.c::main@0x1150
- cipher/cipher.c::main@0x1100
- congrad/congrad.c::main@0x1100
- connect4-minimax/connect4-minimax.c::init_board@0x1230
- connect4-minimax/connect4-minimax.c::main@0x1100
- connect4-minimax/connect4-minimax.c::minimax@0x1840
- connect4-minimax/connect4-minimax.c::play_game@0x1c90
- connect4-minimax/connect4-minimax.c::score_position@0x1620
- convex-hull/convex-hull.c::main@0x1100
- dhrystone/dhrystone.c::PFunc_1@0x1970
- dhrystone/dhrystone.c::PFunc_2@0x1990
- dhrystone/dhrystone.c::PProc_8@0x1900
- dhrystone/dhrystone.c::main@0x1100
- distinctness/distinctness.c::isDistinct@0x12a0
- distinctness/distinctness.c::main@0x1100
- fft-int/fft-int.c::db_from_ampl@0x1670
- fft-int/fft-int.c::fix_fft@0x1320
- flood-fill/flood-fill.c::main@0x1100
- frac-calc/frac-calc.c::avaliatokens@0x15f0
- frac-calc/frac-calc.c::copyr@0x1460
- frac-calc/frac-calc.c::divtokens@0x1840
- frac-calc/frac-calc.c::help@0x13b0
- frac-calc/frac-calc.c::main@0x1120
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x2360
- fuzzy-match/fuzzy-match.c::main@0x2100
- fy-shuffle/fy-shuffle.c::fy_shuffle@0x1440
- fy-shuffle/fy-shuffle.c::main@0x1100
- gcd-list/gcd-list.c::main@0x1120
- grad-descent/grad-descent.c::derivateWRTBias@0x12d0
- grad-descent/grad-descent.c::derivateWRTWeight@0x1270
- grad-descent/grad-descent.c::main@0x1100
- graph-tests/graph-tests.c::DFS_test@0x1c20
- graph-tests/graph-tests.c::addEdge@0x1320
- graph-tests/graph-tests.c::addVertex@0x1a50
- graph-tests/graph-tests.c::bfs@0x1540
- graph-tests/graph-tests.c::bfs_test@0x1720
- graph-tests/graph-tests.c::bubbleSort@0x1880
- graph-tests/graph-tests.c::createGraph@0x1260
- graph-tests/graph-tests.c::createNode@0x1240
- graph-tests/graph-tests.c::createQueue@0x1390
- graph-tests/graph-tests.c::depthFirstSearch@0x1b20
- graph-tests/graph-tests.c::dequeue@0x1430
- graph-tests/graph-tests.c::enqueue@0x13e0
- graph-tests/graph-tests.c::getAdjUnvisitedVertex@0x1ac0
- graph-tests/graph-tests.c::insertAtTheBegin@0x1840
- graph-tests/graph-tests.c::link_list@0x18e0
- graph-tests/graph-tests.c::main@0x1120
- graph-tests/graph-tests.c::printQueue@0x14c0
- graph-tests/graph-tests.c::swap@0x1870
- hanoi/hanoi.c::main@0x1100
- heapsort/heapsort.c::main@0x1100
- heat-calc/heat-calc.c::main@0x1100
- huff-encode/huff-encode.c::main@0x1120
- idct-alg/idct-alg.c::main@0x1100
- indirect-test/indirect-test.c::main@0x1100
- k-means/k-means.c::calculateNearst@0x1310
- k-means/k-means.c::kMeans@0x1420
- k-means/k-means.c::main@0x1120
- k-means/k-means.c::printEPS@0x16b0
- kadane/kadane.c::main@0x1100
- kepler/kepler.c::J@0x1920
- kepler/kepler.c::bin_fact@0x1740
- kepler/kepler.c::binary@0x16a0
- kepler/kepler.c::e_series@0x17e0
- kepler/kepler.c::j_series@0x1a20
- kepler/kepler.c::main@0x1100
- knapsack/knapsack.c::main@0x1100
- knapsack/knapsack.c::max@0x1310
- knights-tour/knights-tour.c::solveKT@0x1390
- knights-tour/knights-tour.c::solveKTUtil@0x14f0
- life/life.c::getDown@0x16e0
- life/life.c::getDownLeft@0x1770
- life/life.c::getDownRight@0x17a0
- life/life.c::getLeft@0x1650
- life/life.c::getNumNeigbors@0x1390
- life/life.c::getRight@0x1680
- life/life.c::getUp@0x16b0
- life/life.c::getUpLeft@0x1710
- life/life.c::getUpRight@0x1740
- life/life.c::main@0x1100
- life/life.c::process@0x1550
- longdiv/longdiv.c::main@0x1120
- longdiv/longdiv.c::sbc@0x1a20
- longdiv/longdiv.c::sub@0x19c0
- lu-decomp/lu-decomp.c::main@0x1100
- lu-decomp/lu-decomp.c::print_matrix@0x13a0
- mandelbrot/mandelbrot.c::main@0x1100
- matmult/matmult.c::main@0x1100
- max-subseq/max-subseq.c::lcsAlgo@0x1290
- max-subseq/max-subseq.c::main@0x1120
- mersenne/mersenne.c::genrand@0x1310
- mersenne/mersenne.c::main@0x1100
- mersenne/mersenne.c::sgenrand@0x1290
- minspan/minspan.c::displayGraph@0x14f0
- minspan/minspan.c::displayGraph1@0x15f0
- minspan/minspan.c::displayPath@0x1700
- minspan/minspan.c::displayTree@0x17a0
- minspan/minspan.c::main@0x1100
- minspan/minspan.c::minSpanTree@0x12f0
- monte-carlo/monte-carlo.c::main@0x1100
- murmur-hash/murmur-hash.c::main@0x1100
- murmur-hash/murmur-hash.c::murmurhash@0x1290
- n-queens/n-queens.c::main@0x1120
- natlog/natlog.c::main@0x1100
- nbody-sim/nbody-sim.c::main@0x1100
- packet-filter/packet-filter.c::check_packet_filter@0x1430
- packet-filter/packet-filter.c::generate_packet@0x12d0
- packet-filter/packet-filter.c::main@0x1100
- packet-filter/packet-filter.c::print_packet@0x1490
- parrondo/parrondo.c::main@0x1100
- pascal/pascal.c::main@0x1100
- pi-calc/pi-calc.c::main@0x1100
- primal-test/primal-test.c::main@0x1100
- primal-test/primal-test.c::miller_rabin_int@0x1510
- priority-queue/priority-queue.c::main@0x1120
- qsort-demo/qsort-demo.c::main@0x1120
- qsort-demo/qsort-demo.c::print_struct_array@0x15c0
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x14a0
- qsort-demo/qsort-demo.c::sort_integers_example@0x1310
- qsort-demo/qsort-demo.c::sort_structs_example@0x1640
- qsort-test/qsort-test.c::main@0x1120
- quaternions/quaternions.c::euler_from_quat@0x1580
- quaternions/quaternions.c::main@0x1100
- quaternions/quaternions.c::quat_from_euler@0x13f0
- quaternions/quaternions.c::quaternion_multiply@0x16b0
- rabinkarp-search/rabinkarp-search.c::main@0x1120
- rabinkarp-search/rabinkarp-search.c::search@0x13a0
- rand-test/rand-test.c::bad_rand@0x1240
- rand-test/rand-test.c::main@0x1100
- rand-test/rand-test.c::run_tests@0x1280
- ransac/ransac.c::main@0x1100
- regex-parser/regex-parser.c::main@0x2100
- regex-parser/regex-parser.c::matchcharclass@0x23b0
- regex-parser/regex-parser.c::matchone@0x2560
- regex-parser/regex-parser.c::re_compile@0x2930
- regex-parser/regex-parser.c::re_print@0x2bf0
- rho-factor/rho-factor.c::main@0x1120
- rle-compress/rle-compress.c::main@0x1120
- rle-compress/rle-compress.c::run_length_encode@0x1330
- rsa-cipher/rsa-cipher.c::main@0x1100
- rsa-cipher/rsa-cipher.c::mod_inverse@0x1670
- rsa-cipher/rsa-cipher.c::mod_pow@0x1580
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x1790
- sat-solver/sat-solver.c::main@0x1100
- sat-solver/sat-solver.c::printFormula@0x1390
- shortest-path/shortest-path.c::main@0x1100
- sieve/sieve.c::main@0x1100
- simple-grep/simple-grep.c::main@0x1120
- spelt2num/spelt2num.c::main@0x1100
- spirograph/spirograph.c::spirograph@0x1230
- sudoku-solver/sudoku-solver.c::isSafe@0x1250
- sudoku-solver/sudoku-solver.c::main@0x1100
- tetris-sim/tetris-sim.c::best_move@0x1860
- tetris-sim/tetris-sim.c::evaluate_board@0x1640
- tetris-sim/tetris-sim.c::main@0x1120
- tiny-NN/tiny-NN.c::main@0x1120
- tiny-NN/tiny-NN.c::sampleSine@0x12d0
- tiny-NN/tiny-NN.c::train@0x13e0
- topo-sort/topo-sort.c::addEdge@0x1370
- topo-sort/topo-sort.c::createGraph@0x1300
- topo-sort/topo-sort.c::createListNode@0x12e0
- topo-sort/topo-sort.c::createStackNode@0x12c0
- topo-sort/topo-sort.c::main@0x1120
- topo-sort/topo-sort.c::topologicalSort@0x1450
- topo-sort/topo-sort.c::topologicalSortUtil@0x13c0
- totient/totient.c::main@0x1100
- transcend/transcend.c::main@0x1120
- uniquify/uniquify.c::main@0x1120
- vectors-3d/vectors-3d.c::get_cross_matrix@0x1760
- vectors-3d/vectors-3d.c::main@0x1100
- vectors-3d/vectors-3d.c::print_vector@0x1620
- vectors-3d/vectors-3d.c::unit_vec@0x1690
- vectors-3d/vectors-3d.c::vector_add@0x1550
- vectors-3d/vectors-3d.c::vector_prod@0x15c0
- vectors-3d/vectors-3d.c::vector_sub@0x1510
- verlet/verlet.c::main@0x1100
- weekday/weekday.c::dayOfWeek@0x1350
- weekday/weekday.c::main@0x1100
## Execution Failures
- checkers/functions.c::all_possible_moves@0x1a60
- cipher/cipher.c::decipher@0x1360
- cipher/cipher.c::encipher@0x12f0
- connect4-minimax/connect4-minimax.c::terminal_score@0x1800
- gcd-list/gcd-list.c::gcd@0x1310
- idct-alg/idct-alg.c::idct_2d@0x12f0
- life/life.c::init@0x1220
- ransac/ransac.c::ransac_line_fitting@0x1410
- regex-parser/regex-parser.c::matchpattern@0x2670
- spirograph/spirograph.c::test@0x1390
- tetris-sim/tetris-sim.c::clear_lines@0x1480
- tetris-sim/tetris-sim.c::simulate_board@0x17c0
- vectors-3d/vectors-3d.c::get_angle@0x17d0
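The failure entries above follow the pattern `<benchmark>/<source file>::<function>@<hex address>`. Here is a minimal sketch for parsing such entries and counting failures per benchmark, assuming every line keeps this shape; the regular expression and the name `parse_entry` are illustrative, not the pipeline's own code.

```python
import re
from collections import Counter

# Assumed entry shape: "- bench/file.c::function@0xADDR"
ENTRY_RE = re.compile(
    r"^- (?P<bench>[^/]+)/(?P<file>[^:]+)::(?P<func>\w+)@(?P<addr>0x[0-9a-fA-F]+)$"
)


def parse_entry(line: str) -> tuple[str, str, str, int]:
    """Split one failure entry into (benchmark, file, function, address)."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized failure entry: {line!r}")
    return (
        match["bench"],
        match["file"],
        match["func"],
        int(match["addr"], 16),
    )


# A few entries copied from the O2 "Execution Failures" list.
sample = [
    "- cipher/cipher.c::decipher@0x1360",
    "- cipher/cipher.c::encipher@0x12f0",
    "- gcd-list/gcd-list.c::gcd@0x1310",
]

per_benchmark = Counter(parse_entry(line)[0] for line in sample)
print(per_benchmark)
```

Grouping by benchmark this way makes it straightforward to compare a failure list with the corresponding row of the breakdown table.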

@ -0,0 +1,355 @@
# Infer-Out Model 2 Evaluation (merged.O3.func_map.infer-host)
- Timestamp: 20251119-171533
- Source JSONL: merged.O3.func_map.infer.jsonl
- Target: host
- Total cases: 359
- Replacement success: 359 (100.00%)
- Compilable: 114 (31.75%)
- Executable: 106 (29.53%)
## Benchmark Breakdown
| Benchmark | Cases | Replacement% | Build% | Exec% |
| --- | --- | --- | --- | --- |
| ackermann | 2 | 100.00% | 50.00% | 50.00% |
| aes | 11 | 100.00% | 27.27% | 27.27% |
| anagram | 13 | 100.00% | 38.46% | 38.46% |
| audio-codec | 3 | 100.00% | 33.33% | 33.33% |
| avl-tree | 15 | 100.00% | 13.33% | 13.33% |
| banner | 1 | 100.00% | 0.00% | 0.00% |
| bit-kernels | 3 | 100.00% | 66.67% | 66.67% |
| blake2b | 3 | 100.00% | 0.00% | 0.00% |
| bloom-filter | 4 | 100.00% | 25.00% | 25.00% |
| boyer-moore-search | 3 | 100.00% | 0.00% | 0.00% |
| bubble-sort | 3 | 100.00% | 100.00% | 100.00% |
| c-interp | 10 | 100.00% | 40.00% | 40.00% |
| ccmac | 1 | 100.00% | 0.00% | 0.00% |
| checkers | 13 | 100.00% | 61.54% | 61.54% |
| cipher | 3 | 100.00% | 33.33% | 0.00% |
| congrad | 1 | 100.00% | 0.00% | 0.00% |
| connect4-minimax | 11 | 100.00% | 45.45% | 45.45% |
| convex-hull | 4 | 100.00% | 50.00% | 50.00% |
| dhrystone | 5 | 100.00% | 40.00% | 40.00% |
| distinctness | 2 | 100.00% | 0.00% | 0.00% |
| fft-int | 4 | 100.00% | 0.00% | 0.00% |
| flood-fill | 2 | 100.00% | 50.00% | 50.00% |
| frac-calc | 9 | 100.00% | 22.22% | 22.22% |
| fuzzy-match | 3 | 100.00% | 33.33% | 33.33% |
| fy-shuffle | 3 | 100.00% | 33.33% | 33.33% |
| gcd-list | 2 | 100.00% | 0.00% | 0.00% |
| grad-descent | 4 | 100.00% | 0.00% | 0.00% |
| graph-tests | 19 | 100.00% | 5.26% | 5.26% |
| hanoi | 1 | 100.00% | 0.00% | 0.00% |
| heapsort | 2 | 100.00% | 0.00% | 0.00% |
| heat-calc | 1 | 100.00% | 0.00% | 0.00% |
| huff-encode | 12 | 100.00% | 83.33% | 83.33% |
| idct-alg | 3 | 100.00% | 66.67% | 33.33% |
| indirect-test | 2 | 100.00% | 50.00% | 50.00% |
| k-means | 5 | 100.00% | 0.00% | 0.00% |
| kadane | 2 | 100.00% | 50.00% | 50.00% |
| kepler | 7 | 100.00% | 14.29% | 14.29% |
| knapsack | 3 | 100.00% | 33.33% | 33.33% |
| knights-tour | 3 | 100.00% | 33.33% | 33.33% |
| life | 14 | 100.00% | 21.43% | 14.29% |
| longdiv | 7 | 100.00% | 71.43% | 71.43% |
| lu-decomp | 3 | 100.00% | 33.33% | 33.33% |
| lz-compress | 2 | 100.00% | 100.00% | 100.00% |
| mandelbrot | 1 | 100.00% | 0.00% | 0.00% |
| max-subseq | 2 | 100.00% | 0.00% | 0.00% |
| mersenne | 4 | 100.00% | 0.00% | 0.00% |
| minspan | 8 | 100.00% | 25.00% | 25.00% |
| monte-carlo | 1 | 100.00% | 0.00% | 0.00% |
| murmur-hash | 2 | 100.00% | 0.00% | 0.00% |
| n-queens | 3 | 100.00% | 66.67% | 66.67% |
| natlog | 1 | 100.00% | 0.00% | 0.00% |
| nbody-sim | 1 | 100.00% | 0.00% | 0.00% |
| nr-solver | 1 | 100.00% | 100.00% | 100.00% |
| packet-filter | 4 | 100.00% | 0.00% | 0.00% |
| parrondo | 2 | 100.00% | 50.00% | 50.00% |
| pascal | 3 | 100.00% | 66.67% | 66.67% |
| pi-calc | 1 | 100.00% | 0.00% | 0.00% |
| primal-test | 3 | 100.00% | 66.67% | 66.67% |
| priority-queue | 5 | 100.00% | 40.00% | 40.00% |
| qsort-demo | 7 | 100.00% | 28.57% | 28.57% |
| qsort-test | 5 | 100.00% | 80.00% | 80.00% |
| quaternions | 4 | 100.00% | 0.00% | 0.00% |
| rabinkarp-search | 2 | 100.00% | 0.00% | 0.00% |
| rand-test | 3 | 100.00% | 0.00% | 0.00% |
| ransac | 2 | 100.00% | 50.00% | 0.00% |
| regex-parser | 8 | 100.00% | 25.00% | 25.00% |
| rho-factor | 1 | 100.00% | 100.00% | 100.00% |
| rle-compress | 2 | 100.00% | 0.00% | 0.00% |
| rsa-cipher | 4 | 100.00% | 0.00% | 0.00% |
| sat-solver | 5 | 100.00% | 60.00% | 40.00% |
| shortest-path | 3 | 100.00% | 33.33% | 33.33% |
| sieve | 1 | 100.00% | 0.00% | 0.00% |
| simple-grep | 1 | 100.00% | 0.00% | 0.00% |
| spelt2num | 1 | 100.00% | 0.00% | 0.00% |
| spirograph | 2 | 100.00% | 50.00% | 0.00% |
| sudoku-solver | 4 | 100.00% | 75.00% | 75.00% |
| tetris-sim | 12 | 100.00% | 58.33% | 50.00% |
| tiny-NN | 4 | 100.00% | 25.00% | 25.00% |
| topo-sort | 7 | 100.00% | 0.00% | 0.00% |
| totient | 2 | 100.00% | 50.00% | 50.00% |
| transcend | 1 | 100.00% | 0.00% | 0.00% |
| uniquify | 1 | 100.00% | 0.00% | 0.00% |
| vectors-3d | 8 | 100.00% | 12.50% | 0.00% |
| verlet | 1 | 100.00% | 0.00% | 0.00% |
| weekday | 2 | 100.00% | 0.00% | 0.00% |
## Compilation Failures
- ackermann/ackermann.c::main@0x1100
- aes/aes.c::add_round_key@0x1810
- aes/aes.c::aes_decrypt@0x2760
- aes/aes.c::aes_encrypt@0x2200
- aes/aes.c::inv_shift_rows@0x1af0
- aes/aes.c::key_expansion@0x1ff0
- aes/aes.c::main@0x1100
- aes/aes.c::mix_columns@0x1bd0
- aes/aes.c::shift_rows@0x1a30
- anagram/anagram.c::BuildMask@0x1620
- anagram/anagram.c::BuildWord@0x1940
- anagram/anagram.c::DumpCandidates@0x1c10
- anagram/anagram.c::DumpWords@0x1ca0
- anagram/anagram.c::FindAnagram@0x1d00
- anagram/anagram.c::ReadDict@0x14c0
- anagram/anagram.c::SortCandidates@0x1f10
- anagram/anagram.c::main@0x1120
- audio-codec/audio-codec.c::decode@0x1590
- audio-codec/audio-codec.c::main@0x1100
- avl-tree/avlcore.c::CheckTreeNodeRotation@0x1c50
- avl-tree/element.c::Compare@0x1af0
- avl-tree/avlcore.c::DeleteByElement@0x2e50
- avl-tree/avlcore.c::DeleteByElementRecursive@0x2bf0
- avl-tree/avlcore.c::DeleteLeftMost@0x2720
- avl-tree/avlcore.c::DoubleLeftRotation@0x1c20
- avl-tree/avlcore.c::DoubleRightRotation@0x1bf0
- avl-tree/avlcore.c::FindByElement@0x1b20
- avl-tree/avlcore.c::Insert@0x1f40
- avl-tree/avlcore.c::InsertNode@0x1e10
- avl-tree/avlcore.c::MakeEmpty@0x2090
- avl-tree/avl-tree.c::breadth@0x1780
- avl-tree/avl-tree.c::main@0x1120
- banner/banner.c::main@0x1120
- bit-kernels/bit-kernels.c::main@0x1120
- blake2b/blake2b.c::F@0x12e0
- blake2b/blake2b.c::blake2b@0x17b0
- blake2b/blake2b.c::test@0x1b50
- bloom-filter/bloom-filter.c::bad_search@0x1450
- bloom-filter/tinybloom.c::bfilter_intersect@0x1570
- bloom-filter/bloom-filter.c::main@0x1120
- boyer-moore-search/boyer-moore-search.c::badCharHeuristic@0x15d0
- boyer-moore-search/boyer-moore-search.c::main@0x1140
- boyer-moore-search/boyer-moore-search.c::search@0x1630
- c-interp/c-interp.c::enum_declaration@0x34f0
- c-interp/c-interp.c::eval@0x3ea0
- c-interp/c-interp.c::function_body@0x37f0
- c-interp/c-interp.c::function_declaration@0x3a10
- c-interp/c-interp.c::main@0x1120
- c-interp/c-interp.c::next@0x15a0
- ccmac/ccmac.c::main@0x1120
- checkers/functions.c::fill_print_initial@0x18e0
- checkers/functions.c::free_tree@0x6210
- checkers/functions.c::generate_node_children@0x35d0
- checkers/functions.c::link_new_node@0x34c0
- checkers/checkers.c::main@0x1130
- cipher/cipher.c::encipher@0x12f0
- cipher/cipher.c::main@0x1100
- congrad/congrad.c::main@0x1100
- connect4-minimax/connect4-minimax.c::board_full@0x1500
- connect4-minimax/connect4-minimax.c::evaluate_window@0x2380
- connect4-minimax/connect4-minimax.c::init_board@0x1230
- connect4-minimax/connect4-minimax.c::main@0x1100
- connect4-minimax/connect4-minimax.c::minimax@0x3c30
- connect4-minimax/connect4-minimax.c::play_game@0x4260
- convex-hull/convex-hull.c::main@0x1100
- convex-hull/convex-hull.c::sortPoints@0x1740
- dhrystone/dhrystone.c::PFunc_1@0x1980
- dhrystone/dhrystone.c::PProc_8@0x1910
- dhrystone/dhrystone.c::main@0x1100
- distinctness/distinctness.c::isDistinct@0x12a0
- distinctness/distinctness.c::main@0x1100
- fft-int/fft-int.c::db_from_ampl@0x1c50
- fft-int/fft-int.c::fix_fft@0x1370
- fft-int/fft-int.c::fix_loud@0x1a90
- fft-int/fft-int.c::window@0x1650
- flood-fill/flood-fill.c::main@0x1100
- frac-calc/frac-calc.c::avaliatokens@0x1730
- frac-calc/frac-calc.c::copyr@0x1550
- frac-calc/frac-calc.c::divtokens@0x1980
- frac-calc/frac-calc.c::help@0x14a0
- frac-calc/frac-calc.c::main@0x1120
- frac-calc/frac-calc.c::misto@0x1610
- frac-calc/frac-calc.c::simplifica@0x28f0
- fuzzy-match/fuzzy-match.c::fuzzy_match_recurse@0x23e0
- fuzzy-match/fuzzy-match.c::main@0x2100
- fy-shuffle/fy-shuffle.c::fy_shuffle@0x1440
- fy-shuffle/fy-shuffle.c::main@0x1100
- gcd-list/gcd-list.c::gcd@0x1310
- gcd-list/gcd-list.c::main@0x1120
- grad-descent/grad-descent.c::derivateWRTBias@0x12e0
- grad-descent/grad-descent.c::derivateWRTWeight@0x1270
- grad-descent/grad-descent.c::gradientDescent@0x1350
- grad-descent/grad-descent.c::main@0x1100
- graph-tests/graph-tests.c::DFS_test@0x2340
- graph-tests/graph-tests.c::addEdge@0x1610
- graph-tests/graph-tests.c::addVertex@0x1f80
- graph-tests/graph-tests.c::bfs@0x1830
- graph-tests/graph-tests.c::bfs_test@0x1a70
- graph-tests/graph-tests.c::bubbleSort@0x1db0
- graph-tests/graph-tests.c::createGraph@0x1550
- graph-tests/graph-tests.c::createNode@0x1530
- graph-tests/graph-tests.c::createQueue@0x1680
- graph-tests/graph-tests.c::depthFirstSearch@0x2110
- graph-tests/graph-tests.c::dequeue@0x1720
- graph-tests/graph-tests.c::enqueue@0x16d0
- graph-tests/graph-tests.c::insertAtTheBegin@0x1d70
- graph-tests/graph-tests.c::link_list@0x1e20
- graph-tests/graph-tests.c::main@0x1180
- graph-tests/graph-tests.c::printQueue@0x17b0
- graph-tests/graph-tests.c::swap@0x1da0
- graph-tests/graph-tests.c::towers@0x2490
- hanoi/hanoi.c::main@0x1100
- heapsort/heapsort.c::HSORT@0x12f0
- heapsort/heapsort.c::main@0x11a0
- heat-calc/heat-calc.c::main@0x1100
- huff-encode/huff-encode.c::buildHuffmanTree@0x18b0
- huff-encode/huff-encode.c::main@0x1120
- idct-alg/idct-alg.c::main@0x1100
- indirect-test/indirect-test.c::main@0x1100
- k-means/k-means.c::calculateCentroid@0x1390
- k-means/k-means.c::calculateNearst@0x1310
- k-means/k-means.c::kMeans@0x1400
- k-means/k-means.c::main@0x1120
- k-means/k-means.c::printEPS@0x16c0
- kadane/kadane.c::main@0x1100
- kepler/kepler.c::J@0x1b80
- kepler/kepler.c::bin_fact@0x1ad0
- kepler/kepler.c::binary@0x16a0
- kepler/kepler.c::e_series@0x1740
- kepler/kepler.c::j_series@0x1920
- kepler/kepler.c::main@0x1100
- knapsack/knapsack.c::main@0x1100
- knapsack/knapsack.c::max@0x1310
- knights-tour/knights-tour.c::solveKT@0x1830
- knights-tour/knights-tour.c::solveKTUtil@0x1980
- life/life.c::getDown@0x1960
- life/life.c::getDownLeft@0x19f0
- life/life.c::getDownRight@0x1a20
- life/life.c::getLeft@0x18d0
- life/life.c::getNumNeigbors@0x16d0
- life/life.c::getRight@0x1900
- life/life.c::getUp@0x1930
- life/life.c::getUpLeft@0x1990
- life/life.c::getUpRight@0x19c0
- life/life.c::main@0x1100
- life/life.c::process@0x1430
- longdiv/longdiv.c::main@0x1120
- longdiv/longdiv.c::sub@0x1a80
- lu-decomp/lu-decomp.c::main@0x1100
- lu-decomp/lu-decomp.c::print_matrix@0x1320
- mandelbrot/mandelbrot.c::main@0x1100
- max-subseq/max-subseq.c::lcsAlgo@0x1290
- max-subseq/max-subseq.c::main@0x1120
- mersenne/mersenne.c::genrand@0x1380
- mersenne/mersenne.c::lsgenrand@0x1320
- mersenne/mersenne.c::main@0x1100
- mersenne/mersenne.c::sgenrand@0x12d0
- minspan/minspan.c::displayGraph@0x1db0
- minspan/minspan.c::displayGraph1@0x1ee0
- minspan/minspan.c::displayPath@0x2020
- minspan/minspan.c::displayTree@0x20c0
- minspan/minspan.c::main@0x1100
- minspan/minspan.c::minSpanTree@0x1400
- monte-carlo/monte-carlo.c::main@0x1100
- murmur-hash/murmur-hash.c::main@0x1100
- murmur-hash/murmur-hash.c::murmurhash@0x1290
- n-queens/n-queens.c::main@0x1120
- natlog/natlog.c::main@0x1100
- nbody-sim/nbody-sim.c::main@0x1100
- packet-filter/packet-filter.c::check_packet_filter@0x1520
- packet-filter/packet-filter.c::generate_packet@0x13d0
- packet-filter/packet-filter.c::main@0x1100
- packet-filter/packet-filter.c::print_packet@0x1580
- parrondo/parrondo.c::main@0x1100
- pascal/pascal.c::main@0x1100
- pi-calc/pi-calc.c::main@0x1100
- primal-test/primal-test.c::main@0x1100
- priority-queue/priority-queue.c::main@0x1120
- priority-queue/priority-queue.c::newNode@0x13a0
- priority-queue/priority-queue.c::push@0x1420
- qsort-demo/qsort-demo.c::main@0x1120
- qsort-demo/qsort-demo.c::print_struct_array@0x15b0
- qsort-demo/qsort-demo.c::sort_cstrings_example@0x1480
- qsort-demo/qsort-demo.c::sort_integers_example@0x1310
- qsort-demo/qsort-demo.c::sort_structs_example@0x1630
- qsort-test/qsort-test.c::main@0x1120
- quaternions/quaternions.c::euler_from_quat@0x1550
- quaternions/quaternions.c::main@0x1100
- quaternions/quaternions.c::quat_from_euler@0x13e0
- quaternions/quaternions.c::quaternion_multiply@0x1670
- rabinkarp-search/rabinkarp-search.c::main@0x1120
- rabinkarp-search/rabinkarp-search.c::search@0x15a0
- rand-test/rand-test.c::bad_rand@0x1240
- rand-test/rand-test.c::main@0x1100
- rand-test/rand-test.c::run_tests@0x1280
- ransac/ransac.c::main@0x1100
- regex-parser/regex-parser.c::main@0x2100
- regex-parser/regex-parser.c::matchcharclass@0x2420
- regex-parser/regex-parser.c::matchone@0x25c0
- regex-parser/regex-parser.c::matchpattern@0x26d0
- regex-parser/regex-parser.c::re_compile@0x2ac0
- regex-parser/regex-parser.c::re_print@0x2e30
- rle-compress/rle-compress.c::main@0x1120
- rle-compress/rle-compress.c::run_length_encode@0x1330
- rsa-cipher/rsa-cipher.c::main@0x1100
- rsa-cipher/rsa-cipher.c::mod_inverse@0x15a0
- rsa-cipher/rsa-cipher.c::mod_pow@0x14b0
- rsa-cipher/rsa-cipher.c::print_hex_int128@0x16c0
- sat-solver/sat-solver.c::main@0x1100
- sat-solver/sat-solver.c::printFormula@0x1680
- shortest-path/shortest-path.c::floydWarshall@0x1330
- shortest-path/shortest-path.c::main@0x1100
- sieve/sieve.c::main@0x1100
- simple-grep/simple-grep.c::main@0x1120
- spelt2num/spelt2num.c::main@0x1100
- spirograph/spirograph.c::spirograph@0x1230
- sudoku-solver/sudoku-solver.c::main@0x1100
- tetris-sim/tetris-sim.c::aggregate_height@0x1b20
- tetris-sim/tetris-sim.c::best_move@0x21d0
- tetris-sim/tetris-sim.c::count_holes@0x1b70
- tetris-sim/tetris-sim.c::evaluate_board@0x1ca0
- tetris-sim/tetris-sim.c::main@0x1100
- tiny-NN/tiny-NN.c::main@0x1120
- tiny-NN/tiny-NN.c::sampleSine@0x12d0
- tiny-NN/tiny-NN.c::train@0x13e0
- topo-sort/topo-sort.c::addEdge@0x13f0
- topo-sort/topo-sort.c::createGraph@0x1380
- topo-sort/topo-sort.c::createListNode@0x1360
- topo-sort/topo-sort.c::createStackNode@0x1340
- topo-sort/topo-sort.c::main@0x1120
- topo-sort/topo-sort.c::topologicalSort@0x18b0
- topo-sort/topo-sort.c::topologicalSortUtil@0x1440
- totient/totient.c::main@0x1100
- transcend/transcend.c::main@0x1120
- uniquify/uniquify.c::main@0x1120
- vectors-3d/vectors-3d.c::get_cross_matrix@0x1850
- vectors-3d/vectors-3d.c::main@0x1100
- vectors-3d/vectors-3d.c::print_vector@0x1730
- vectors-3d/vectors-3d.c::unit_vec@0x17a0
- vectors-3d/vectors-3d.c::vector_add@0x1650
- vectors-3d/vectors-3d.c::vector_prod@0x16b0
- vectors-3d/vectors-3d.c::vector_sub@0x1620
- verlet/verlet.c::main@0x1100
- weekday/weekday.c::dayOfWeek@0x1290
- weekday/weekday.c::main@0x1100
## Execution Failures
- cipher/cipher.c::decipher@0x1360
- idct-alg/idct-alg.c::idct_2d@0x12f0
- life/life.c::init@0x12c0
- ransac/ransac.c::ransac_line_fitting@0x1410
- sat-solver/sat-solver.c::solveSAT@0x13a0
- spirograph/spirograph.c::test@0x1390
- tetris-sim/tetris-sim.c::clear_lines@0x19a0
- vectors-3d/vectors-3d.c::get_angle@0x18c0
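
Each entry in the lists above appears to follow the pattern `<source path>::<function>@<address>`, the same shape that `compose_case_id` builds in the evaluation script. A minimal sketch for splitting an entry back into its parts, assuming that format holds for every bullet (the names `ENTRY_RE`, `CaseId`, and `parse_case_id` are hypothetical and not part of the pipeline):

```python
import re
from typing import NamedTuple, Optional

# Assumed entry format: "- <path>::<function>@0x<hex address>"
ENTRY_RE = re.compile(
    r"^-\s*(?P<path>.+?)::(?P<func>\w+)@(?P<addr>0x[0-9a-fA-F]+)\s*$"
)


class CaseId(NamedTuple):
    path: str
    function: str
    address: int


def parse_case_id(line: str) -> Optional[CaseId]:
    """Parse one report bullet; return None for headings or other lines."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return CaseId(match["path"], match["func"], int(match["addr"], 16))


print(parse_case_id("- cipher/cipher.c::decipher@0x1360"))
```

Grouping the parsed entries by `path` is one way to see which benchmarks contribute most of the failures at a given optimization level.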


@ -0,0 +1,493 @@
#!/usr/bin/env python3
"""Generate function-level mappings across source, pseudo, and assembly outputs."""
from __future__ import annotations
import argparse
import json
import os
import re
import sys
from pathlib import Path
from typing import Dict, List, Optional
import subprocess
FUNC_KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof", "do", "case", "else"}
TYPEDEF_MAP = {
"cpu_set_t": "int",
"nl_item": "int",
"__time_t": "int",
"__mode_t": "unsigned short",
"__off64_t": "long long",
"__blksize_t": "long",
"__ino_t": "unsigned long",
"__blkcnt_t": "unsigned long long",
"__syscall_slong_t": "long",
"__ssize_t": "long int",
"wchar_t": "unsigned short int",
"wctype_t": "unsigned short int",
"__int64": "long long",
"__int32": "int",
"__int16": "short",
"__int8": "char",
"_QWORD": "uint64_t",
"_OWORD": "long double",
"_DWORD": "uint32_t",
"size_t": "unsigned int",
"_BYTE": "uint8_t",
"_TBYTE": "uint16_t",
"_BOOL8": "uint8_t",
"gcc_va_list": "va_list",
"_WORD": "unsigned short",
"_BOOL4": "int",
"__va_list_tag": "va_list",
"_IO_FILE": "FILE",
"DIR": "int",
"__fsword_t": "long",
"__kernel_ulong_t": "int",
"cc_t": "int",
"speed_t": "int",
"fd_set": "int",
"__suseconds_t": "int",
"_UNKNOWN": "void",
"__sighandler_t": "void (*)(int)",
"__compar_fn_t": "int (*)(const void *, const void *)",
}
def _load_config_env() -> dict:
    """Load config.env from the eval project root."""
    eval_root = Path(__file__).resolve().parents[1]
    config_path = eval_root / "config.env"
    config = {}
    if config_path.exists():
        for line in config_path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if "=" in line:
                key, _, value = line.partition("=")
                config[key.strip()] = value.strip()
    return config


def _get_bench_root(cli_value: str | None = None) -> Path:
    """Resolve the benchmark repo root from CLI arg, env var, or config.env."""
    if cli_value:
        return Path(cli_value).resolve()
    env_val = os.environ.get("BENCH_REPO_ROOT")
    if env_val:
        return Path(env_val).resolve()
    config = _load_config_env()
    if "BENCH_REPO_ROOT" in config:
        return Path(config["BENCH_REPO_ROOT"]).resolve()
    sys.exit("error: BENCH_REPO_ROOT not set. Use --bench-root, set the env var, or configure config.env")


def _read_text(path: Path) -> str:
    return path.read_text(encoding="utf-8")


def _strip_empty(code: str) -> str:
    return "\n".join(line for line in code.splitlines() if line.strip())


def _good_func(func: str) -> bool:
    body = "{".join(func.split("{", 1)[1:]) if "{" in func else func
    total = 0
    for line in body.splitlines():
        if len(line.strip()) >= 3:
            total += 1
    return 3 < total < 300


def _format_with_clang(func: str, style: str = "Google") -> Optional[str]:
    if not func:
        return None
    cmd = ["clang-format", f"--style={style}"]
    try:
        proc = subprocess.run(
            cmd,
            input=func,
            text=True,
            capture_output=True,
            check=True,
            timeout=15,
        )
        return proc.stdout
    except Exception as e:
        print(e)
        return None


def _hex_to_dec(text: str) -> str:
    pattern = re.compile(r"\b(0x[0-9a-fA-F]+)([uUlL]{1,3})?\b")

    def convert(match: re.Match[str]) -> str:
        hex_part = match.group(1)
        suffix = match.group(2) or ""
        return str(int(hex_part, 16)) + suffix

    return pattern.sub(convert, text)


def _remove_keywords(text: str) -> str:
    patterns = [
        r"\b__fastcall\b",
        r"\b__cdecl\b",
        r"\b__ptr32\b",
        r"\b__noreturn\s+noreturn\b",
    ]
    combined = re.compile("|".join(patterns))
    return combined.sub("", text)


def _replace_typedefs(text: str) -> str:
    for alias, original in TYPEDEF_MAP.items():
        pattern = re.compile(rf"\b{re.escape(alias)}\b")
        text = pattern.sub(original, text)
    return text


def _remove_comments(text: str) -> str:
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)
    text = re.sub(r"//.*?$", "", text, flags=re.MULTILINE)
    return text


def _process_code(code_str: str) -> str:
    code_str = _remove_comments(code_str)
    code_str = _hex_to_dec(code_str)
    code_str = _remove_keywords(code_str)
    code_str = _replace_typedefs(code_str)
    return code_str


def _normalize_pseudo(text: str) -> str:
    processed = _process_code(text)
    if not processed.strip():
        return ""
    formatted = _format_with_clang(processed)
    if formatted is None:
        return ""
    cleaned = _strip_empty(formatted)
    if not cleaned or not _good_func(cleaned):
        return ""
    return cleaned


def _strip_comments_and_strings(text: str) -> str:
    result = list(text)
    i = 0
    length = len(text)
    while i < length:
        nxt = text[i : i + 2]
        ch = text[i]
        if nxt == "//":
            end = text.find("\n", i)
            if end == -1:
                end = length
            for j in range(i, end):
                result[j] = " "
            i = end
            continue
        if nxt == "/*":
            end = text.find("*/", i + 2)
            if end == -1:
                end = length - 2
            for j in range(i, end + 2):
                result[j] = " "
            i = end + 2
            continue
        if ch in {'"', "'"}:
            quote = ch
            result[i] = " "
            i += 1
            while i < length:
                c = text[i]
                result[i] = " "
                if c == "\\":
                    i += 2
                    continue
                if c == quote:
                    i += 1
                    break
                i += 1
            continue
        i += 1
    return "".join(result)


def _find_matching_brace(text: str, start_idx: int) -> int:
    depth = 0
    i = start_idx
    length = len(text)
    while i < length:
        nxt = text[i : i + 2]
        ch = text[i]
        if nxt == "//":
            i = text.find("\n", i)
            if i == -1:
                return length - 1
            continue
        if nxt == "/*":
            i = text.find("*/", i + 2)
            if i == -1:
                return length - 1
            i += 2
            continue
        if ch in {'"', "'"}:
            quote = ch
            i += 1
            while i < length:
                c = text[i]
                if c == "\\":
                    i += 2
                    continue
                if c == quote:
                    i += 1
                    break
                i += 1
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return length - 1


def _extract_source_functions(path: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
    text = _read_text(path)
    sanitized = _strip_comments_and_strings(text)
    pattern = re.compile(
        r"(?P<prefix>^|[;\n}])(?P<signature>[^{;}]*?)\b(?P<name>[A-Za-z_][\w]*)\s*\([^;{}]*\)\s*\{",
        re.MULTILINE,
    )
    funcs: Dict[str, Dict[str, str]] = {}
    for match in pattern.finditer(sanitized):
        name = match.group("name")
        if name in FUNC_KEYWORDS:
            continue
        brace_idx = sanitized.find("{", match.start("signature"))
        if brace_idx == -1:
            continue
        end_idx = _find_matching_brace(text, brace_idx)
        if end_idx <= brace_idx:
            continue
        start_idx = match.start("signature")
        content = text[start_idx : end_idx + 1].strip("\n") + "\n"
        funcs.setdefault(
            name,
            {
                "path": str(path.relative_to(repo_root)),
                "function_name": name,
                "content": content,
            },
        )
    return funcs


def _parse_makefile(makefile: Path) -> List[Path]:
    text = _read_text(makefile)
    prog_match = re.search(r"^PROG\s*=\s*(\S+)", text, flags=re.MULTILINE)
    if not prog_match:
        raise RuntimeError(f"PROG not found in {makefile}")
    prog = prog_match.group(1).strip()
    objs_match = re.search(r"^LOCAL_OBJS\s*=\s*(.*)$", text, flags=re.MULTILINE)
    obj_tokens: List[str] = []
    if objs_match:
        obj_tokens = [token for token in objs_match.group(1).split() if token]
    if not obj_tokens:
        obj_tokens = [f"{prog}.o"]
    src_paths: List[Path] = []
    for token in obj_tokens:
        if not token.endswith(".o"):
            continue
        # Swap only the trailing ".o"; str.replace would also rewrite any
        # earlier ".o" inside the file name.
        candidate = makefile.parent / Path(token).with_suffix(".c")
        if candidate.exists():
            src_paths.append(candidate)
    if not src_paths:
        fallback = makefile.parent / f"{prog}.c"
        if fallback.exists():
            src_paths.append(fallback)
    return src_paths


def _collect_source_functions(bench_dir: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
    makefile = bench_dir / "Makefile"
    srcs = _parse_makefile(makefile)
    func_map: Dict[str, Dict[str, str]] = {}
    for src in srcs:
        func_map.update(_extract_source_functions(src, repo_root))
    return func_map


def _parse_pseudo(pseudo_path: Path, repo_root: Path) -> Dict[str, Dict[str, str]]:
    text = _read_text(pseudo_path)
    lines = text.splitlines()
    pattern = re.compile(r"^/\*\s*(?P<name>[^@]+?)\s*@\s*(?P<addr>0x[0-9a-fA-F]+)\s*\*/$")
    current: Optional[str] = None
    current_addr: Optional[str] = None
    buffer: List[str] = []
    out: Dict[str, Dict[str, str]] = {}
    for raw_line in lines:
        line = raw_line.strip()
        match = pattern.match(line)
        if match:
            if current and buffer:
                content = "\n".join(buffer).strip("\n") + "\n"
                out.setdefault(
                    current,
                    {
                        "path": str(pseudo_path.relative_to(repo_root)),
                        "function_name": current,
                        "address": current_addr,
                        "label": current,
                        "content": content,
                    },
                )
            current = match.group("name").strip()
            current_addr = match.group("addr")
            buffer = []
        else:
            if current is not None:
                buffer.append(raw_line)
    if current and buffer:
        content = "\n".join(buffer).strip("\n") + "\n"
        out.setdefault(
            current,
            {
                "path": str(pseudo_path.relative_to(repo_root)),
                "function_name": current,
                "address": current_addr,
                "label": current,
                "content": content,
            },
        )
    return out


def _clean_instruction(raw: str) -> Optional[str]:
    stripped = raw.strip()
    if not stripped:
        return None
    parts = raw.split("\t")
    if len(parts) >= 3:
        relevant = parts[2:]
    elif len(parts) == 2:
        relevant = parts[1:]
    else:
        relevant = [stripped]
    instr = "\t".join(relevant)
    instr = instr.split("#")[0].strip()
    if not instr:
        return None
    if all(c in "0123456789abcdefABCDEF" for c in instr.replace(" ", "")):
        return None
    return instr


def _clean_asm_block(name: str, lines: List[str]) -> str:
    cleaned = [f"<{name}>:"]
    for raw in lines[1:]:
        instr = _clean_instruction(raw)
        if instr:
            cleaned.append(instr)
    return "\n".join(cleaned) + "\n"


def _parse_assembly(asm_path: Path) -> Dict[str, str]:
    lines = _read_text(asm_path).splitlines()
    header = re.compile(r"^\s*([0-9a-fA-F]+)\s+<([^>]+)>:\s*$")
    current: Optional[str] = None
    buffer: List[str] = []
    result: Dict[str, str] = {}
    for line in lines:
        match = header.match(line)
        if match:
            if current and buffer:
                result.setdefault(current, _clean_asm_block(current, buffer))
            current = match.group(2)
            buffer = [line]
        else:
            if current is not None:
                buffer.append(line)
    if current and buffer:
        result.setdefault(current, _clean_asm_block(current, buffer))
    return result


def _discover_binaries(explicit: Optional[List[str]], repo_root: Path) -> List[Path]:
    if explicit:
        binaries: List[Path] = []
        for entry in explicit:
            candidate = Path(entry)
            if not candidate.is_absolute():
                candidate = repo_root / candidate
            if candidate.exists():
                binaries.append(candidate)
        return binaries
    matches = []
    for path in repo_root.rglob("*.O*"):
        suffix = path.suffix.lower()
        if suffix in {".o0", ".o1", ".o2", ".o3"}:
            matches.append(path)
    return sorted(matches)


def _build_map(binary: Path, repo_root: Path) -> None:
    pseudo_path = Path(str(binary) + ".pseudo")
    asm_path = Path(str(binary) + ".s")
    if not pseudo_path.exists() or not asm_path.exists():
        print(f"[skip] Missing pseudo or assembly for {binary.relative_to(repo_root)}")
        return
    bench_dir = binary.parent
    source_funcs = _collect_source_functions(bench_dir, repo_root)
    pseudo_funcs = _parse_pseudo(pseudo_path, repo_root)
    asm_funcs = _parse_assembly(asm_path)
    common = sorted(set(source_funcs) & set(pseudo_funcs) & set(asm_funcs))
    if not common:
        print(f"[warn] No overlapping functions for {binary.relative_to(repo_root)}")
        return
    output_path = Path(str(binary) + ".func_map.jsonl")
    rel_binary = str(binary.relative_to(repo_root))
    with output_path.open("w", encoding="utf-8") as handle:
        for name in common:
            pseudo_entry = pseudo_funcs[name]
            pseudo_norm = _normalize_pseudo(pseudo_entry.get("content", ""))
            record = {
                "source": source_funcs[name],
                "pseudo": pseudo_entry,
                "pseudo_normalize": pseudo_norm,
                "binary": rel_binary,
                "assembly": asm_funcs[name],
            }
            handle.write(json.dumps(record, ensure_ascii=False))
            handle.write("\n")
    print(f"[ok] {output_path.relative_to(repo_root)} -> {len(common)} functions")


def main(argv: List[str]) -> int:
    parser = argparse.ArgumentParser(description="Map source/pseudo/assembly per function")
    parser.add_argument(
        "--binary",
        action="append",
        help="Specific binary path (relative to repo) to process; can be repeated.",
    )
    parser.add_argument(
        "--bench-root",
        default=None,
        help="Path to the Bringup-Bench repository root (default: from config.env).",
    )
    args = parser.parse_args(argv)
    repo_root = _get_bench_root(args.bench_root)
    binaries = _discover_binaries(args.binary, repo_root)
    if not binaries:
        print("No binaries found", file=sys.stderr)
        return 1
    for binary in binaries:
        _build_map(binary, repo_root)
    return 0


if __name__ == "__main__":
    raise SystemExit(main(sys.argv[1:]))
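
The normalization path in this script (`_process_code`) rewrites hexadecimal literals as decimal and drops calling-convention keywords before the pseudocode is handed to `clang-format`. A minimal standalone sketch of those two steps, reusing the same hex pattern but only two of the keyword patterns; the names `hex_to_dec` and `strip_callconv` are illustrative stand-ins, not the script's own helpers:

```python
import re

# Same hex-literal pattern as the script; the optional group keeps C suffixes
# such as "u", "L" or "LL" attached to the converted number.
HEX_RE = re.compile(r"\b(0x[0-9a-fA-F]+)([uUlL]{1,3})?\b")
# Subset of the keyword list, chosen for illustration only.
CALLCONV_RE = re.compile(r"\b__fastcall\b|\b__cdecl\b")


def hex_to_dec(text: str) -> str:
    """Rewrite every hex literal in decimal, preserving its suffix."""
    return HEX_RE.sub(
        lambda m: str(int(m.group(1), 16)) + (m.group(2) or ""), text
    )


def strip_callconv(text: str) -> str:
    """Delete calling-convention keywords; surrounding spaces are left alone."""
    return CALLCONV_RE.sub("", text)


sample = "__int64 __fastcall f(__int64 a1) { return a1 + 0x10LL; }"
print(strip_callconv(hex_to_dec(sample)))
```

Removing a keyword leaves a doubled space behind, which is harmless here because the real pipeline reformats the result with `clang-format` afterwards.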


@ -0,0 +1,24 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
set -a
source "${EVAL_ROOT}/config.env"
set +a
fi
BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
cd "${BENCH_REPO_ROOT}"
for opt in 0 1 2 3; do
echo "==> Building host binaries with -O${opt}"
make TARGET=host OPT_CFLAGS="-O${opt} -g" run-tests
find . -maxdepth 2 -type f -name '*.host' -execdir mv {} {}.O${opt} \;
done
echo "All host optimization builds complete."


@ -0,0 +1,21 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
set -a
source "${EVAL_ROOT}/config.env"
set +a
fi
BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
cd "${BENCH_REPO_ROOT}"
echo "==> Running make all-clean"
make all-clean
echo "All benchmarks cleaned."


@ -0,0 +1,50 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
set -a
source "${EVAL_ROOT}/config.env"
set +a
fi
BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
IDA_BIN="${IDA_BIN:-/home/bairidreamer/software/IDA-Pro/idat}"
DUMP_SCRIPT="${EVAL_ROOT}/scripts/dump_pseudo.py"
if [[ ! -x "${IDA_BIN}" ]]; then
echo "error: IDA binary not found or not executable at ${IDA_BIN}" >&2
exit 1
fi
if [[ ! -f "${DUMP_SCRIPT}" ]]; then
echo "error: dump script not found at ${DUMP_SCRIPT}" >&2
exit 1
fi
readarray -t BINARIES < <(
find "${BENCH_REPO_ROOT}" -mindepth 2 -maxdepth 2 -type f \
\( -iname '*.o0' -o -iname '*.o1' -o -iname '*.o2' -o -iname '*.o3' \) \
! -path "${BENCH_REPO_ROOT}/scripts/*" \
! -path "${BENCH_REPO_ROOT}/target/*" \
! -path "${BENCH_REPO_ROOT}/common/*" \
! -path "${BENCH_REPO_ROOT}/.git/*" \
| sort
)
if [[ ${#BINARIES[@]} -eq 0 ]]; then
echo "error: no O0/O1/O2/O3 binaries found under ${BENCH_REPO_ROOT}" >&2
exit 1
fi
for binary_path in "${BINARIES[@]}"; do
output_path="${binary_path}.pseudo"
echo "==> Decompiling ${binary_path#${BENCH_REPO_ROOT}/} -> ${output_path#${BENCH_REPO_ROOT}/}"
"${IDA_BIN}" -A "-S${DUMP_SCRIPT} ${output_path}" "${binary_path}"
done
echo "All pseudocode dumps are located alongside their binaries."


@ -0,0 +1,66 @@
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
EVAL_ROOT="$(cd "${SCRIPT_DIR}/.." && pwd)"
# Load config; allow environment overrides
if [[ -f "${EVAL_ROOT}/config.env" ]]; then
set -a
source "${EVAL_ROOT}/config.env"
set +a
fi
BENCH_REPO_ROOT="${BENCH_REPO_ROOT:?Set BENCH_REPO_ROOT in config.env or environment}"
OBJDUMP_BIN="${OBJDUMP:-objdump}"
NUM_JOBS="${JOBS:-}"
if ! command -v "${OBJDUMP_BIN}" >/dev/null 2>&1; then
echo "error: objdump binary '${OBJDUMP_BIN}' not found" >&2
exit 1
fi
if [[ -z "${NUM_JOBS}" ]]; then
if command -v nproc >/dev/null 2>&1; then
NUM_JOBS="$(nproc)"
elif [[ "$(uname)" == "Darwin" ]]; then
NUM_JOBS="$(sysctl -n hw.ncpu)"
else
NUM_JOBS=4
fi
fi
if ! [[ "${NUM_JOBS}" =~ ^[0-9]+$ ]] || (( NUM_JOBS <= 0 )); then
echo "error: invalid JOBS value '${NUM_JOBS}'" >&2
exit 1
fi
readarray -t BINARIES < <(
find "${BENCH_REPO_ROOT}" -mindepth 2 -maxdepth 2 -type f \
\( -iname '*.o0' -o -iname '*.o1' -o -iname '*.o2' -o -iname '*.o3' \) \
! -path "${BENCH_REPO_ROOT}/scripts/*" \
! -path "${BENCH_REPO_ROOT}/target/*" \
! -path "${BENCH_REPO_ROOT}/common/*" \
! -path "${BENCH_REPO_ROOT}/.git/*" \
| sort
)
if [[ ${#BINARIES[@]} -eq 0 ]]; then
echo "error: no O0/O1/O2/O3 binaries found under ${BENCH_REPO_ROOT}" >&2
exit 1
fi
export OBJDUMP_BIN BENCH_REPO_ROOT
printf '%s\0' "${BINARIES[@]}" | xargs -0 -n1 -P "${NUM_JOBS}" bash -c '
binary_path="$1"
bench_repo_root="${BENCH_REPO_ROOT}"
output_path="${binary_path}.s"
rel_in="${binary_path#"${bench_repo_root}/"}"
rel_out="${output_path#"${bench_repo_root}/"}"
echo "==> Disassembling ${rel_in} -> ${rel_out}"
"${OBJDUMP_BIN}" -d "${binary_path}" > "${output_path}"
' _
echo "Assembly listings written alongside each binary (extension .s)."
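
The `.s` listings written by this script are later split per function by matching objdump's `<address> <symbol>:` header lines. A minimal sketch of that split on a hand-written sample, assuming the usual tab-separated `address / raw bytes / instruction` layout of `objdump -d`; `split_functions` and the sample listing are hypothetical:

```python
import re

# Header lines look like "0000000000001100 <main>:".
HEADER_RE = re.compile(r"^\s*([0-9a-fA-F]+)\s+<([^>]+)>:\s*$")

# Hand-written sample in objdump's tab-separated layout (not real output).
SAMPLE = """\
0000000000001100 <main>:
    1100:\tf3 0f 1e fa          \tendbr64
    1104:\t31 c0                \txor    %eax,%eax
    1106:\tc3                   \tret

0000000000001110 <helper>:
    1110:\tc3                   \tret
"""


def split_functions(listing: str) -> dict:
    """Map each symbol name to the list of its instruction strings."""
    blocks = {}
    current = None
    for line in listing.splitlines():
        match = HEADER_RE.match(line)
        if match:
            current = match.group(2)
            blocks[current] = []
        elif current is not None and line.strip():
            parts = line.split("\t")
            if len(parts) >= 3:  # address, raw bytes, instruction
                blocks[current].append(parts[2].strip())
    return blocks


print(split_functions(SAMPLE))
```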


@ -0,0 +1,62 @@
"""
Headless IDA/Hex-Rays helper to dump pseudocode for every discovered function.
Usage (from shell):
idat -A -S"scripts/dump_pseudo.py /path/to/output" /path/to/binary
"""
from __future__ import annotations
import os
import sys
import ida_auto
import ida_funcs
import ida_hexrays
import ida_pro
import idautils
import idc


def _get_output_path() -> str:
    # IDA populates idc.ARGV with the script path at index 0 and the
    # user-provided arguments afterwards.
    if len(idc.ARGV) < 2:
        raise RuntimeError("output path argument missing")
    return os.path.abspath(idc.ARGV[1])


def main() -> None:
    try:
        output_path = _get_output_path()
    except Exception as exc:  # pragma: no cover - defensive
        print(f"[dump_pseudo] {exc}", file=sys.stderr)
        ida_pro.qexit(1)
        return
    ida_auto.auto_wait()
    if not ida_hexrays.init_hexrays_plugin():
        print("[dump_pseudo] Hex-Rays decompiler is unavailable", file=sys.stderr)
        ida_pro.qexit(1)
        return
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    with open(output_path, "w", encoding="utf-8") as handle:
        for ea in idautils.Functions():
            name = ida_funcs.get_func_name(ea)
            handle.write(f"/* {name} @ 0x{ea:x} */\n")
            try:
                cfunc = ida_hexrays.decompile(ea)
            except ida_hexrays.DecompilationFailure as exc:
                handle.write(f"// decompilation failed: {exc}\n\n")
                continue
            handle.write(str(cfunc))
            handle.write("\n\n")
    ida_pro.qexit(0)


if __name__ == "__main__":
    main()
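
The dump written by this helper separates functions with `/* name @ 0xaddr */` marker comments, which is what the mapping script keys on when it reads the `.pseudo` files. A minimal sketch of reading such a dump back outside IDA, assuming that marker format; `index_dump` and the sample text are hypothetical:

```python
import re

# Marker written before every function: "/* <name> @ 0x<address> */".
MARKER_RE = re.compile(
    r"^/\*\s*(?P<name>[^@]+?)\s*@\s*(?P<addr>0x[0-9a-fA-F]+)\s*\*/$"
)

# Hand-written sample dump (not real Hex-Rays output).
SAMPLE_DUMP = """\
/* main @ 0x1100 */
int main(void)
{
  return 0;
}

/* sub_1200 @ 0x1200 */
// decompilation failed: example
"""


def index_dump(text: str) -> dict:
    """Map function name -> (address, body text) for a marker-delimited dump."""
    out = {}
    name = None
    addr = None
    body = []
    for raw in text.splitlines():
        match = MARKER_RE.match(raw.strip())
        if match:
            if name is not None:
                out[name] = (addr, "\n".join(body).strip("\n"))
            name = match["name"].strip()
            addr = int(match["addr"], 16)
            body = []
        elif name is not None:
            body.append(raw)
    if name is not None:
        out[name] = (addr, "\n".join(body).strip("\n"))
    return out


for func, (address, source) in index_dump(SAMPLE_DUMP).items():
    print(f"{func} @ {address:#x}: {len(source.splitlines())} line(s)")
```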


@ -0,0 +1,682 @@
#!/usr/bin/env python3
"""
Evaluate infer-out-model2 functions by patching benchmark sources inside an
isolated workspace, rebuilding, executing, and collecting structured logs for
every case listed in a JSONL file.
"""
from __future__ import annotations
import argparse
import json
import os
import re
import shutil
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import asdict, dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Tuple


def _load_config_env() -> dict:
    """Load config.env from the eval project root."""
    eval_root = Path(__file__).resolve().parents[1]
    config_path = eval_root / "config.env"
    config = {}
    if config_path.exists():
        for line in config_path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if "=" in line:
                key, _, value = line.partition("=")
                config[key.strip()] = value.strip()
    return config


def _get_bench_root(cli_value: str | None = None) -> Path:
    """Resolve the benchmark repo root from CLI arg, env var, or config.env."""
    if cli_value:
        return Path(cli_value).resolve()
    env_val = os.environ.get("BENCH_REPO_ROOT")
    if env_val:
        return Path(env_val).resolve()
    config = _load_config_env()
    if "BENCH_REPO_ROOT" in config:
        return Path(config["BENCH_REPO_ROOT"]).resolve()
    sys.exit("error: BENCH_REPO_ROOT not set. Use --bench-root, set the env var, or configure config.env")


@dataclass
class CaseResult:
    """Container for the outcome of processing a single case."""

    case_id: str
    source_path: str
    benchmark_dir: str
    output_dir: str
    workspace_dir: str = ""
    artifact_dir: str = ""
    replacement_applied: bool = False
    build_status: str = "skipped"  # succeeded | failed | skipped
    test_status: str = "skipped"
    notes: List[str] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    log_files: Dict[str, str] = field(default_factory=dict)


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Replace functions with infer-out-model2 bodies, build, "
        "execute, and record results without modifying the original benchmarks."
    )
    parser.add_argument(
        "jsonl",
        help="Path to the merged.*.jsonl file containing cases to evaluate.",
    )
    parser.add_argument(
        "--bench-root",
        default=None,
        help="Path to the Bringup-Bench repository root (default: from config.env).",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        help="Optional limit on the number of cases to process.",
    )
    parser.add_argument(
        "--target",
        default="host",
        help="Benchmark build target passed as TARGET=<target> (default: host).",
    )
    parser.add_argument(
        "--report-dir",
        default="reports/infer_out_eval",
        help="Directory (relative to eval root) where aggregated reports are written.",
    )
    parser.add_argument(
        "--workspace-root",
        default="reports/infer_out_eval/workspaces",
        help="Directory (relative to eval root) to host temporary build workspaces.",
    )
    parser.add_argument(
        "--skip-clean",
        action="store_true",
        help="Skip running 'make clean' inside the workspace (useful when iterating).",
    )
    parser.add_argument(
        "--keep-workspaces",
        action="store_true",
        help="Keep temporary workspaces after each case finishes (default removes them).",
    )
    parser.add_argument(
        "--command-timeout",
        type=int,
        default=20,
        help="Timeout (in seconds) for each make invocation; 0 disables the timeout.",
    )
    parser.add_argument(
        "--jobs",
        type=int,
        default=96,
        help="Number of cases to process in parallel (default: 96).",
    )
    return parser.parse_args()


def canonicalize(text: str) -> str:
    """Normalize newlines for reliable substring matching."""
    return text.replace("\r\n", "\n")


def replace_function_body(
    full_source: str, reference_function: str, inferred_function: str
) -> Tuple[str, bool]:
    """
    Replace the exact reference_function text with inferred_function.
    Returns the updated source and a boolean indicating if replacement happened.
    """
    source_norm = canonicalize(full_source)
    reference_norm = canonicalize(reference_function)
    inferred_norm = canonicalize(inferred_function).rstrip() + "\n"
    candidates = (
        reference_norm,
        reference_norm.rstrip() + "\n",
        reference_norm.strip(),
    )
    for snippet in candidates:
        start_idx = source_norm.find(snippet)
        if start_idx == -1:
            continue
        end_idx = start_idx + len(snippet)
        updated = source_norm[:start_idx] + inferred_norm + source_norm[end_idx:]
        return updated, True
    return full_source, False


def compose_case_id(case: Dict) -> str:
    """Build a stable identifier for a case."""
    return (
        f"{case['source']['path']}::{case['source']['function_name']}"
        f"@{case['pseudo']['address']}"
    )


def ensure_case_output_dir(
    output_root: Path, pseudo_path_str: str, pseudo_address: str, result: CaseResult
) -> Path:
    """Create the per-case output directory, handling file path collisions."""
    pseudo_rel = Path(pseudo_path_str)
    base_dir = output_root / pseudo_rel
    if base_dir.exists() and base_dir.is_file():
        fallback = base_dir.parent / f"{base_dir.name}.infer_eval"
        fallback.mkdir(parents=True, exist_ok=True)
        result.notes.append(
            f"pseudo.path '{pseudo_path_str}' is a file; using '{fallback.relative_to(output_root)}' for logs."
        )
        base_dir = fallback
    else:
        base_dir.mkdir(parents=True, exist_ok=True)
    case_dir = base_dir / pseudo_address
    if case_dir.exists():
        shutil.rmtree(case_dir)
    case_dir.mkdir(parents=True, exist_ok=True)
    return case_dir


def run_command(
    command: List[str],
    cwd: Path,
    log_handle,
    step_name: str,
    timeout: Optional[int],
) -> Optional[int]:
    """Run a command, capture stdout/stderr, and write everything to log_handle."""
    log_handle.write(f"\n[{step_name}] $ {' '.join(command)}\n")
    log_handle.flush()
    try:
        completed = subprocess.run(
            command,
            cwd=str(cwd),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            encoding="utf-8",
            errors="replace",
            timeout=timeout if timeout and timeout > 0 else None,
        )
        log_handle.write(completed.stdout)
        log_handle.write(f"[{step_name}] exit code: {completed.returncode}\n")
        log_handle.flush()
        return completed.returncode
    except subprocess.TimeoutExpired as exc:
        output = exc.output or exc.stdout
        if output:
            if isinstance(output, bytes):
                log_handle.write(output.decode("utf-8", "replace"))
            else:
                log_handle.write(output)
        log_handle.write(
            f"[{step_name}] timed out after {timeout} seconds; terminating process.\n"
        )
        log_handle.flush()
        return None


def write_case_artifacts(
    case_dir: Path,
    case: Dict,
    modified_source: str,
    original_source: str,
) -> None:
    """Persist reusable artifacts for a case."""
    (case_dir / "case.json").write_text(json.dumps(case, indent=2), encoding="utf-8")
    (case_dir / "modified_source.c").write_text(modified_source, encoding="utf-8")
    (case_dir / "original_source.c").write_text(original_source, encoding="utf-8")
    (case_dir / "original_function.c").write_text(
        canonicalize(case["source"]["content"]), encoding="utf-8"
    )
    (case_dir / "infer_function.c").write_text(
        canonicalize(case["pseudo"]["content-fix"]), encoding="utf-8"
    )


def sanitize_case_id(case_id: str) -> str:
    """Generate filesystem-safe case identifier."""
    sanitized = re.sub(r"[^A-Za-z0-9._-]+", "_", case_id)
    return sanitized.strip("_") or "case"


def copy_ignore_eval_dirs(_src: str, names: List[str]) -> List[str]:
    """Ignore helper to skip evaluation artifacts when copying benchmark dirs."""
    ignored: List[str] = []
    for name in names:
        if name.endswith(".infer_eval"):
            ignored.append(name)
    return ignored


def prepare_workspace(
    repo_root: Path,
    benchmark_dir: Path,
    workspace_root: Path,
    case_id: str,
) -> Tuple[Path, Path]:
    """Clone the necessary subset of the repo into a temporary workspace."""
    workspace_case_root = workspace_root / sanitize_case_id(case_id)
    if workspace_case_root.exists():
        shutil.rmtree(workspace_case_root)
    workspace_repo_root = workspace_case_root / "repo"
    workspace_repo_root.mkdir(parents=True, exist_ok=True)
    shutil.copy2(repo_root / "Makefile", workspace_repo_root / "Makefile")
    shutil.copytree(repo_root / "common", workspace_repo_root / "common", dirs_exist_ok=True)
    shutil.copytree(repo_root / "target", workspace_repo_root / "target", dirs_exist_ok=True)
    shutil.copytree(
        benchmark_dir,
        workspace_repo_root / benchmark_dir.name,
        dirs_exist_ok=True,
        ignore=copy_ignore_eval_dirs,
    )
    return workspace_case_root, workspace_repo_root


def relative_to_repo(path: Path, repo_root: Path) -> str:
    """Return a path relative to repo_root when possible."""
    try:
        return str(path.relative_to(repo_root))
    except ValueError:
        return str(path)


def init_case_result(case: Dict, repo_root: Path) -> CaseResult:
    """Create a CaseResult with basic metadata for the given case."""
    source_rel = Path(case["source"]["path"])
    benchmark_dir_path = (repo_root / source_rel).parent
    try:
        benchmark_rel = str(benchmark_dir_path.relative_to(repo_root))
    except ValueError:
        benchmark_rel = str(benchmark_dir_path)
    return CaseResult(
        case_id=compose_case_id(case),
        source_path=str(source_rel),
        benchmark_dir=benchmark_rel,
        output_dir="",
    )


def snapshot_artifacts(
    case_dir: Path,
    workspace_benchmark_dir: Path,
    eval_root: Path,
    result: CaseResult,
) -> None:
    """Copy the workspace benchmark directory into the case directory."""
    artifacts_dir = case_dir / "artifacts"
    if artifacts_dir.exists():
        shutil.rmtree(artifacts_dir)
    try:
        shutil.copytree(workspace_benchmark_dir, artifacts_dir)
        result.artifact_dir = relative_to_repo(artifacts_dir, eval_root)
    except Exception as exc:  # pragma: no cover - defensive
        result.notes.append(f"Failed to copy artifacts: {exc}")


def process_case(
    case: Dict,
    args: argparse.Namespace,
    repo_root: Path,
    eval_root: Path,
) -> CaseResult:
    """Process a single JSONL entry."""
    case_id = compose_case_id(case)
    source_rel = Path(case["source"]["path"])
    source_path = repo_root / source_rel
    benchmark_dir = source_path.parent
    result = init_case_result(case, repo_root)

    if not source_path.exists():
        result.errors.append(f"Source file '{source_rel}' does not exist.")
        return result

    try:
        case_dir = ensure_case_output_dir(
            eval_root, case["pseudo"]["path"], case["pseudo"]["address"], result
        )
    except Exception as exc:  # pragma: no cover - defensive
        result.errors.append(f"Failed to prepare case directory: {exc}")
        return result
    result.output_dir = str(case_dir.relative_to(eval_root))

    # Swap the original function for the decompiled (fixed-up) version.
    full_source_text = source_path.read_text(encoding="utf-8")
    updated_source, replaced = replace_function_body(
        full_source_text,
        case["source"]["content"],
        case["pseudo"]["content-fix"],
    )
    if not replaced:
        result.errors.append(
            "Could not locate the original function snippet in source file."
        )
        return result
    result.replacement_applied = True
    write_case_artifacts(case_dir, case, updated_source, full_source_text)

    workspace_root = Path(args.workspace_root)
    if not workspace_root.is_absolute():
        workspace_root = eval_root / workspace_root
    workspace_root.mkdir(parents=True, exist_ok=True)

    workspace_case_root: Optional[Path] = None
    try:
        workspace_case_root, workspace_repo_root = prepare_workspace(
            repo_root, benchmark_dir, workspace_root, case_id
        )
        workspace_benchmark_dir = workspace_repo_root / benchmark_dir.name
        artifacts_captured = False

        def capture_artifacts() -> None:
            # Snapshot at most once per case, whichever exit path is taken.
            nonlocal artifacts_captured
            if artifacts_captured:
                return
            snapshot_artifacts(case_dir, workspace_benchmark_dir, eval_root, result)
            artifacts_captured = True

        # prepare_workspace copies the benchmark directory to the top level of
        # the workspace repo, so address the source file through that copy
        # instead of through source_rel, which may contain extra parent dirs.
        workspace_source_path = workspace_benchmark_dir / source_path.name
        workspace_source_path.write_text(updated_source, encoding="utf-8")
        result.workspace_dir = relative_to_repo(workspace_case_root, eval_root)

        log_path = case_dir / "case.log"
        with log_path.open("w", encoding="utf-8") as log_handle:
            log_handle.write(f"Case: {case_id}\n")
            log_handle.write(f"Workspace: {workspace_case_root}\n")
            log_handle.write(f"Benchmark copy: {workspace_benchmark_dir}\n")
            log_handle.write(f"Target: {args.target}\n")
            log_handle.flush()

            # Step 1: clean (optional).
            if not args.skip_clean:
                clean_rc = run_command(
                    ["make", f"TARGET={args.target}", "clean"],
                    workspace_benchmark_dir,
                    log_handle,
                    "clean",
                    args.command_timeout,
                )
                if clean_rc is None:
                    result.errors.append(
                        f"'make clean' timed out after {args.command_timeout} seconds."
                    )
                    capture_artifacts()
                    result.log_files["case"] = relative_to_repo(log_path, eval_root)
                    return result
                if clean_rc != 0:
                    result.build_status = "failed"
                    result.errors.append("make clean failed.")
                    capture_artifacts()
                    result.log_files["case"] = relative_to_repo(log_path, eval_root)
                    return result
            else:
                log_handle.write("Skipping 'make clean' per --skip-clean flag.\n")

            # Step 2: build. A None return code means the command timed out.
            build_rc = run_command(
                ["make", f"TARGET={args.target}", "build"],
                workspace_benchmark_dir,
                log_handle,
                "build",
                args.command_timeout,
            )
            result.log_files["case"] = relative_to_repo(log_path, eval_root)
            if build_rc is None:
                result.build_status = "failed"
                result.errors.append(
                    f"'make build' timed out after {args.command_timeout} seconds."
                )
                capture_artifacts()
                log_handle.write("Skipping test because build timed out.\n")
                return result
            if build_rc == 0:
                result.build_status = "succeeded"
            else:
                result.build_status = "failed"
                result.errors.append("make build failed.")
                log_handle.write("Skipping test because build failed.\n")
                capture_artifacts()
                return result

            # Step 3: test, only reached when the build succeeded.
            test_rc = run_command(
                ["make", f"TARGET={args.target}", "test"],
                workspace_benchmark_dir,
                log_handle,
                "test",
                args.command_timeout,
            )
            if test_rc is None:
                result.test_status = "failed"
                result.errors.append(
                    f"'make test' timed out after {args.command_timeout} seconds."
                )
            elif test_rc == 0:
                result.test_status = "succeeded"
            else:
                result.test_status = "failed"
                result.errors.append("make test failed.")
            capture_artifacts()
    finally:
        # Remove the per-case workspace unless the user asked to keep it.
        if (
            workspace_case_root
            and workspace_case_root.exists()
            and not args.keep_workspaces
        ):
            shutil.rmtree(workspace_case_root, ignore_errors=True)
    return result


def collect_cases(jsonl_path: Path, limit: Optional[int]) -> Iterable[Dict]:
    """Yield cases from a JSONL file, respecting the optional limit."""
    processed = 0
    with jsonl_path.open("r", encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if not stripped:
                continue
            yield json.loads(stripped)
            processed += 1
            if limit is not None and processed >= limit:
                break


def compute_summary(results: List[CaseResult]) -> Dict:
    """Aggregate statistics over all case results."""
    total = len(results)
    replacements = sum(1 for r in results if r.replacement_applied)
    build_success = sum(1 for r in results if r.build_status == "succeeded")
    test_success = sum(1 for r in results if r.test_status == "succeeded")

    def frac(passed: int, denom: int) -> float:
        return round(passed / denom, 4) if denom else 0.0

    per_benchmark: Dict[str, Dict[str, float]] = {}
    for r in results:
        stats = per_benchmark.setdefault(
            r.benchmark_dir,
            {
                "cases": 0,
                "replacements": 0,
                "build_success": 0,
                "test_success": 0,
            },
        )
        stats["cases"] += 1
        if r.replacement_applied:
            stats["replacements"] += 1
        if r.build_status == "succeeded":
            stats["build_success"] += 1
        if r.test_status == "succeeded":
            stats["test_success"] += 1

    for stats in per_benchmark.values():
        stats["replacement_rate"] = frac(stats["replacements"], stats["cases"])
        stats["build_rate"] = frac(stats["build_success"], stats["cases"])
        stats["test_rate"] = frac(stats["test_success"], stats["cases"])

    summary = {
        "total_cases": total,
        "replacement_success_count": replacements,
        "replacement_success_rate": frac(replacements, total),
        "compilable_count": build_success,
        "compilable_rate": frac(build_success, total),
        "executable_count": test_success,
        "executable_rate": frac(test_success, total),
        "compilation_failures": [
            r.case_id for r in results if r.build_status == "failed"
        ],
        "execution_failures": [
            r.case_id
            for r in results
            if r.build_status == "succeeded" and r.test_status == "failed"
        ],
        "cases": [asdict(r) for r in results],
        "by_benchmark": per_benchmark,
    }
    return summary


def write_summary(
    eval_root: Path,
    args: argparse.Namespace,
    jsonl_path: Path,
    summary: Dict,
) -> Tuple[Path, Path]:
    """Write JSON and Markdown summary reports."""
    report_root = eval_root / args.report_dir
    report_root.mkdir(parents=True, exist_ok=True)

    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    base_name = f"{jsonl_path.stem}-{args.target}"
    json_report = report_root / f"{base_name}-{timestamp}.json"
    markdown_report = report_root / f"{base_name}-{timestamp}.md"

    json_report.write_text(json.dumps(summary, indent=2), encoding="utf-8")

    benchmark_lines = [
        "| Benchmark | Cases | Replacement% | Build% | Exec% |",
        "| --- | --- | --- | --- | --- |",
    ]
    for bench, stats in sorted(summary["by_benchmark"].items()):
        benchmark_lines.append(
            f"| {bench} | {stats['cases']} | "
            f"{stats['replacement_rate']*100:.2f}% | "
            f"{stats['build_rate']*100:.2f}% | "
            f"{stats['test_rate']*100:.2f}% |"
        )
    if len(benchmark_lines) == 2:
        # Only the header rows are present: emit a placeholder row.
        benchmark_lines.append("| (none) | 0 | 0.00% | 0.00% | 0.00% |")

    compilation_items = summary["compilation_failures"] or ["None"]
    execution_items = summary["execution_failures"] or ["None"]
    relative_jsonl = relative_to_repo(jsonl_path, eval_root)

    lines = [
        f"# Infer-Out Model 2 Evaluation ({base_name})",
        "",
        f"- Timestamp: {timestamp}",
        f"- Source JSONL: {relative_jsonl}",
        f"- Target: {args.target}",
        f"- Total cases: {summary['total_cases']}",
        f"- Replacement success: {summary['replacement_success_count']} "
        f"({summary['replacement_success_rate']*100:.2f}%)",
        f"- Compilable: {summary['compilable_count']} "
        f"({summary['compilable_rate']*100:.2f}%)",
        f"- Executable: {summary['executable_count']} "
        f"({summary['executable_rate']*100:.2f}%)",
        "",
        "## Benchmark Breakdown",
        *benchmark_lines,
        "",
        "## Compilation Failures",
    ]
    lines.extend(f"- {cid}" for cid in compilation_items)
    lines.append("")
    lines.append("## Execution Failures")
    lines.extend(f"- {cid}" for cid in execution_items)

    markdown_report.write_text("\n".join(lines), encoding="utf-8")
    return json_report, markdown_report


def main() -> int:
    args = parse_args()
    eval_root = Path(__file__).resolve().parents[1]
    repo_root = _get_bench_root(args.bench_root)

    jsonl_path = Path(args.jsonl)
    if not jsonl_path.is_absolute():
        jsonl_path = eval_root / jsonl_path
    if not jsonl_path.exists():
        print(f"JSONL file '{jsonl_path}' not found.", file=sys.stderr)
        return 1

    cases = list(collect_cases(jsonl_path, args.limit))
    if not cases:
        print("No cases to process.")
        return 0

    # Pre-sized so that results keep the input order even when cases finish
    # out of order in the thread pool.
    results: List[Optional[CaseResult]] = [None] * len(cases)

    def record_result(idx: int, case_result: CaseResult) -> None:
        results[idx] = case_result
        status = (
            f"build={case_result.build_status}, test={case_result.test_status}"
            if case_result.replacement_applied
            else "replacement_failed"
        )
        print(f"[{idx + 1}] {case_result.case_id}: {status}")

    def run_case(case: Dict) -> CaseResult:
        # Turn an unexpected exception into a failed result so that one bad
        # case cannot abort the run, in sequential and parallel mode alike.
        try:
            return process_case(case, args, repo_root, eval_root)
        except Exception as exc:  # pragma: no cover - defensive
            case_result = init_case_result(case, repo_root)
            case_result.errors.append(f"Unhandled exception: {exc}")
            return case_result

    if args.jobs <= 1:
        for idx, case in enumerate(cases):
            record_result(idx, run_case(case))
    else:
        with ThreadPoolExecutor(max_workers=args.jobs) as executor:
            future_to_idx = {
                executor.submit(run_case, case): idx
                for idx, case in enumerate(cases)
            }
            for future in as_completed(future_to_idx):
                record_result(future_to_idx[future], future.result())

    final_results = [res for res in results if res is not None]
    summary = compute_summary(final_results)
    json_report, markdown_report = write_summary(eval_root, args, jsonl_path, summary)
    print(f"Wrote summary reports:\n - {json_report}\n - {markdown_report}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())