Commit graph

255 commits

Author SHA1 Message Date
polfg
9c5ea2f2bc Reconstruct 3.11 while-loops (bottom-test optimization)
Python 3.11 compiles 'while cond:' with a bottom test: an entry guard
(eval cond; POP_JUMP_FORWARD_IF_FALSE end) before the body, and a
back-edge (eval cond; POP_JUMP_BACKWARD_IF_TRUE loop_start) at the end.
pycdc rendered both halves as separate 'if' blocks, so EVERY while loop
came out as 'if cond: ... if not cond: pass' and never looped.

- New ScanWhileLoops pre-pass pairs each conditional backward jump with
  the forward guard immediately preceding its target (guard skipping to
  the instruction after the back-edge) to identify genuine while loops.
- At the guard, open a BLK_WHILE with the condition instead of an if.
- At the back-edge, discard the duplicated condition and close the loop,
  but only when a BLK_WHILE is actually open (guard against a
  misidentified back-edge collapsing the block stack and crashing).

Fixes while loops across all files (including nested loops and
continue). Decompilation target: 237/239 files, corpus 41/95 (+1: signature; 0 regressions).
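
The pairing rule in the first bullet can be sketched on a toy instruction list. This is a minimal illustration and not pycdc's ScanWhileLoops: the `Instr` tuple, the `find_while_loops` helper, and the sample offsets are hypothetical, offsets are simplified to one slot per instruction, and jump targets are assumed to be already resolved to absolute offsets.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    offset: int                    # simplified: one slot per instruction
    op: str
    target: Optional[int] = None   # absolute jump target (assumed resolved)


def find_while_loops(code):
    """Return (guard_offset, back_edge_offset) pairs for bottom-tested loops."""
    index_of = {ins.offset: i for i, ins in enumerate(code)}
    loops = []
    for idx, ins in enumerate(code):
        if ins.op != "POP_JUMP_BACKWARD_IF_TRUE":
            continue
        start_idx = index_of.get(ins.target)
        if start_idx is None or start_idx == 0 or idx + 1 >= len(code):
            continue
        guard = code[start_idx - 1]          # instruction just before loop start
        after_back_edge = code[idx + 1].offset
        # A genuine while loop: the guard skips to just past the back-edge.
        if guard.op == "POP_JUMP_FORWARD_IF_FALSE" and guard.target == after_back_edge:
            loops.append((guard.offset, ins.offset))
    return loops


# Toy shape of "while n: n = step(n)"
toy_code = [
    Instr(0, "LOAD_FAST"),
    Instr(1, "POP_JUMP_FORWARD_IF_FALSE", 6),   # entry guard
    Instr(2, "LOAD_FAST"),                      # loop start
    Instr(3, "STORE_FAST"),
    Instr(4, "LOAD_FAST"),
    Instr(5, "POP_JUMP_BACKWARD_IF_TRUE", 2),   # back-edge
    Instr(6, "RETURN_VALUE"),
]
pairs = find_while_loops(toy_code)
```

A conditional backward jump whose target is not preceded by a matching guard is left unpaired, which is the case the last bullet guards against.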

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:27:24 +02:00
polfg
7a0109a6f6 Fix except-as-e handler after return-in-if inside try
Two combined fixes for the 'return inside if inside try / except as e'
pattern:

1. The return-in-if skip logic could consume the PUSH_EXC_INFO that
   immediately follows a return inside an if within a try. Dropping it
   left the handler without its exception sentinel, so the 'as e'
   binding captured a garbage stack value and the handler was mis-nested
   as a statement in the try body. Never skip PUSH_EXC_INFO.

2. Suppress the compiler cleanup 'e = None' when the store value is an
   explicit None constant (LOAD_CONST None; STORE), not only the NULL
   placeholder form. With the binding now correct, this removes the
   spurious 'e = None; del e' tail. Decompilation target: 236/239 files, corpus 41/95 (+1: utilities; 0 regressions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:15:57 +02:00
polfg
96a672e587 Fix for-loop not closing in 3.11 (return/code after loop absorbed into body)
In 3.11 there is no POP_BLOCK, so a for-loop only closed via the
JUMP_BACKWARD that ends its body. The is_jump_to_start test compared
the RELATIVE jump operand against the loop's ABSOLUTE start position,
so it almost never matched and the loop never closed: any statement
after the loop (and even except handlers / the function's return) was
absorbed into the loop body, producing wrong (often still-compiling)
output and breaking nested try/except indentation.

- Compute the real jump target (pos - offs) in 3.10+ and compare to
  the loop start.
- Distinguish the implicit loop-iteration back-jump (pos == block end,
  closes the loop) from an explicit back-jump (earlier, emits continue).
- Guard the BLK_ELSE branch's stack_hist.top() against an empty stack
  (a for nested in a while/else could otherwise crash, e.g. csv).

Fixes the core 'code after for loop' defect across all files. Decompilation target: 235/239 files, corpus 41/95 (+2: realization, gettext; 0 regressions).
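
The relative-versus-absolute mix-up is easier to see with numbers. The sketch below assumes the 3.11 convention that a backward jump's operand counts 2-byte code units back from the instruction following the jump, and it ignores inline cache entries. `backward_jump_target`, `classify_back_jump`, and the offsets are illustrative names and values, not pycdc code.

```python
CODE_UNIT = 2  # bytes per instruction word


def backward_jump_target(jump_offset, oparg):
    """Absolute target of a backward jump located at jump_offset."""
    next_instr = jump_offset + CODE_UNIT
    return next_instr - oparg * CODE_UNIT


def classify_back_jump(jump_offset, oparg, loop_start, loop_end):
    """loop_end is assumed to be the offset just past the loop body."""
    if backward_jump_target(jump_offset, oparg) != loop_start:
        return "unrelated"
    if jump_offset + CODE_UNIT == loop_end:
        return "close_loop"   # implicit back-jump that ends the body
    return "continue"         # explicit jump from inside the body


# Hypothetical loop occupying offsets 10..32, with its final jump at offset 30.
LOOP_START, LOOP_END = 10, 32
final_oparg = (30 + CODE_UNIT - LOOP_START) // CODE_UNIT

# The old comparison put the relative operand next to the absolute start.
buggy_match = final_oparg == LOOP_START
fixed_match = backward_jump_target(30, final_oparg) == LOOP_START
```

With these numbers the operand is 11 code units while the loop starts at offset 10, so the old test misses the loop end even though the resolved target matches.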

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 12:06:54 +02:00
polfg
ba35489dd8 Fix std::bad_cast crash on return-in-if followed by comprehension
In 3.11 a return inside an if/else may fall straight into a sibling
branch. The old code unconditionally consumed the next instruction to
skip a redundant jump; when that instruction was the LOAD_CONST of a
code object feeding a MAKE_FUNCTION (e.g. a list comprehension after
'if not x: return []'), dropping it left MAKE_FUNCTION without its
operand and crashed with std::bad_cast.

Now peek the next instruction and keep it only when it is a LOAD_CONST
of a code object; otherwise preserve the original skip behavior.
Added PycBuffer::pos()/setPos() for safe peeking. Decompilation target: 234/239 files, corpus 40/95 (+1, 0 regressions).
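
The peek-then-decide step can be sketched with a small cursor class. This is a Python stand-in written for illustration: `Buffer`, `set_pos`, `skip_redundant_jump`, and the tuple encoding of instructions are hypothetical and only mirror the idea behind PycBuffer::pos()/setPos().

```python
class Buffer:
    """Minimal cursor over a list of (opname, operand) tuples."""

    def __init__(self, instructions):
        self._items = list(instructions)
        self._pos = 0

    def pos(self):
        return self._pos

    def set_pos(self, position):
        self._pos = position

    def at_end(self):
        return self._pos >= len(self._items)

    def next(self):
        item = self._items[self._pos]
        self._pos += 1
        return item


def skip_redundant_jump(buf, is_code_object):
    """Consume the next instruction unless it loads a code object."""
    if buf.at_end():
        return
    saved = buf.pos()
    opname, operand = buf.next()
    if opname == "LOAD_CONST" and is_code_object(operand):
        buf.set_pos(saved)   # rewind: MAKE_FUNCTION still needs this operand
    # otherwise the instruction stays consumed (the original skip behavior)


def looks_like_code(operand):
    # Toy predicate: code objects are marked with a "<code " prefix here.
    return isinstance(operand, str) and operand.startswith("<code ")


comprehension = Buffer([("LOAD_CONST", "<code listcomp>"), ("MAKE_FUNCTION", 0)])
skip_redundant_jump(comprehension, looks_like_code)

plain_jump = Buffer([("JUMP_FORWARD", 4), ("LOAD_FAST", "x")])
skip_redundant_jump(plain_jump, looks_like_code)
```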

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:49:55 +02:00
polfg
761ba7c44e Strip any spurious module-level return (not just 'return None')
Reconstruction artifacts at module scope can be 'return <expr>' too (invalid
Python). Drop any trailing plain return (rettype RETURN) from module blocks,
not only None returns, since none are legitimate at module scope.

Harness: +1, 0 regressions (decompilation target: 232→233).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:20:16 +02:00
polfg
9a83ef3fbf Strip spurious module-level returns from all nested blocks
A 'return' is invalid at module scope, but the implicit 'return None' (and,
with nested ifs, copies of it) can land inside module-level blocks. Recurse
into every nested block (not only the last) and strip each block's trailing
bare return. Only applied to the <module> code object, so it never removes a
real return from a function/class body.

Harness: +4, 0 regressions (corpus 37->40, decompilation target: 231→232).
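
The recursion described above can be sketched on a toy statement tree. The representation (strings for statements, dicts with `header` and `body` for blocks) and the `strip_trailing_return` helper are assumptions made for this example, not pycdc's AST.

```python
def strip_trailing_return(body):
    """Return a copy of body with each block's trailing bare return removed."""
    cleaned = []
    for stmt in body:
        if isinstance(stmt, dict):
            # Recurse into every nested block, not only the last one.
            stmt = {"header": stmt["header"],
                    "body": strip_trailing_return(stmt["body"])}
        cleaned.append(stmt)
    if cleaned and cleaned[-1] in ("return", "return None"):
        cleaned.pop()
    return cleaned


module_body = [
    "x = 1",
    {"header": "if x:", "body": [
        "y = 2",
        {"header": "if y:", "body": ["z = 3", "return None"]},
    ]},
    {"header": "else:", "body": ["y = 0", "return None"]},
    "return None",
]
cleaned_module = strip_trailing_return(module_body)
```

A caller would apply this only to the `<module>` code object, so returns in function or class bodies are never touched.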

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:17:25 +02:00
polfg
63eee37215 Fix multiple except-clause boundary (clause that raises / at function end)
When refining the initial type-less except handler, the clause end was taken
from the whole handler region (curblock->end()) instead of the dispatch jump
target. A clause that does not fall through (e.g. ends in 'raise') then never
closed, nesting the following 'except' inside it. Use the dispatch jump target
(offs) as the clause end whenever it is a valid forward offset.

Harness: +2, 0 regressions (decompilation target: 229→231).
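
As a rough sketch of the boundary choice, the helper below picks the clause end from two candidates. It reads "valid forward offset" as "strictly greater than the current position", which is an assumption. `clause_end` and its parameters are hypothetical names.

```python
def clause_end(current_pos, dispatch_target, handler_end):
    """Prefer the dispatch jump target when it points forward."""
    if dispatch_target > current_pos:
        return dispatch_target
    return handler_end   # fall back to the end of the whole handler region


# First clause ends in 'raise': it closes at the next clause (offset 60),
# not at the end of the handler region (offset 100).
first_clause_end = clause_end(current_pos=40, dispatch_target=60, handler_end=100)
fallback_end = clause_end(current_pos=40, dispatch_target=0, handler_end=100)
```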

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 08:47:11 +02:00
polfg
aabdba96d4 Close with/finally blocks before processing exception-table entries
When a with-statement (or try/finally) is nested inside an enclosing try, the
implicit __exit__/finally cleanup region is re-protected by the outer try's
exception-table entry. Processing that entry reopened a spurious try over the
cleanup, leaking 'None(None, None)' / 'if not None:' into the body. Move the
with/finally block close and the cleanup-skip to the top of the loop, before
exception-entry processing, so the block closes and the region is skipped first.

Harness: +3, 0 regressions (corpus 36->37, decompilation target: 227→229).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 08:40:25 +02:00
polfg
efb56a045c Inline comprehensions/genexprs with multiple (nested) filters
A comprehension with several 'if' clauses nests the filters as IF blocks inside
the for-loop. findCompYield now descends through nested IF blocks, combining
their conditions with 'and' and honoring negated filters (POP_JUMP_*_IF_TRUE),
so '(x for x in y if a if not b)' reconstructs as 'x for x in y if a and not b'.

Harness: 0 gate change (affected stdlib files have other errors) but fixes
multi-filter comprehensions generally.
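
The rewrite relies on chained 'if' clauses being equivalent to a single condition joined with 'and'. A quick check in plain Python, with `is_even` and `is_multiple_of_three` as made-up predicates:

```python
def is_even(x):
    return x % 2 == 0


def is_multiple_of_three(x):
    return x % 3 == 0


values = range(20)

# Source form with two separate filters.
nested = list(x for x in values if is_even(x) if not is_multiple_of_three(x))

# Reconstructed form with the filters combined.
combined = list(x for x in values if is_even(x) and not is_multiple_of_three(x))
```

Both forms short-circuit the same way: the second predicate only runs when the first one passes.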

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 08:15:52 +02:00
polfg
2222d2be3e Inline filtered list comprehensions
A list comprehension with an 'if' filter performs LIST_APPEND inside the filter
block, so the normal comprehension build path (which expects the FOR block) is
missed and produced a '[][x]' hack. In a <listcomp> code object, emit the
appended value as a yield-style marker instead, so SynthGenexpr reconstructs the
comprehension together with its filter condition.

Harness: +3, 0 regressions (corpus 33->36).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:05:37 +02:00
polfg
4260d9078f Reconstruct class keyword arguments (metaclass=) for Python 3.11
__build_class__ keyword arguments (e.g. metaclass=) arrive via a 3.11 KW_NAMES
map at TOS, which broke the build-class detection (consumed as a base / caused
fall-through to a bare __build_class__ call printed as <NODE:27>). Capture the
KW_NAMES map before scanning bases, store it as the class call's kwparams, and
emit 'class X(bases, kw=val):'.

Harness: +3, 0 regressions (corpus 31->33, decompilation target: 226→227).
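
The emitted header shape can be sketched as a small renderer. `render_class_header` and its argument layout (a list of base expressions plus a list of keyword/value pairs) are assumptions for illustration and do not reflect pycdc's node types.

```python
def render_class_header(name, bases, kwparams):
    """Join bases and keyword arguments into a class statement header."""
    args = list(bases) + [f"{key}={value}" for key, value in kwparams]
    if not args:
        return f"class {name}:"
    return f"class {name}({', '.join(args)}):"


with_metaclass = render_class_header("X", ["Base"], [("metaclass", "Meta")])
keyword_only = render_class_header("Y", [], [("metaclass", "Meta")])
plain = render_class_header("Z", [], [])
```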

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:26:42 +02:00
polfg
0c79dbf063 Inline Python 3.11 generator expressions
A generator expression compiles to a <genexpr> code object that yields instead
of building a comprehension node. SynthGenexpr reconstructs it from the
decompiled for-loop: the FOR block becomes the generator, the yielded value the
result, and a wrapping 'if' the filter. Rendered as an equivalent comprehension
with the real iterable substituted for the implicit '.0'.

Harness: +9, 0 regressions (corpus 22->31).
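
The implicit '.0' argument is visible from Python itself. The snippet below inspects a generator expression's code object in CPython and then performs a toy text substitution. `inline_genexpr` is a hypothetical helper, and the naive string replace is only safe for this example.

```python
import types


def doubled_total(values):
    return sum(v * 2 for v in values)


# The generator expression lives in its own <genexpr> code object.
genexpr_code = next(
    const for const in doubled_total.__code__.co_consts
    if isinstance(const, types.CodeType) and const.co_name == "<genexpr>"
)
implicit_arg = genexpr_code.co_varnames[0]   # the iterator passed by the caller


def inline_genexpr(body_text, iterable_text):
    """Toy rendering: swap the implicit argument for the real iterable."""
    return "(" + body_text.replace(".0", iterable_text) + ")"


rendered = inline_genexpr("v * 2 for v in .0", "values")
```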

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 17:31:29 +02:00
polfg
3cea1b75c2 Fix function signature parameter order (*args before keyword-only)
The def/lambda signature printed positional, keyword-only(*), *args, **kwargs,
producing invalid 'def f(*, kw, *args)'. Python order is positional, *args,
keyword-only, **kwargs. Index locals explicitly (they are stored positional,
keyword-only, *args, **kwargs) and emit in source order.

Harness: 0 gate change (affected files have other errors) but fixes incorrect
signatures across many files.
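
The storage order versus source order can be demonstrated with CPython code objects, whose `co_varnames` lists positional, keyword-only, *args, then **kwargs. `signature_from_code` is a sketch written for this note: it omits defaults, annotations, and the positional-only marker.

```python
import inspect


def signature_from_code(code):
    """Rebuild a parameter list in source order from a code object."""
    names = code.co_varnames
    npos = code.co_argcount
    nkwonly = code.co_kwonlyargcount

    positional = list(names[:npos])
    kwonly = list(names[npos:npos + nkwonly])

    index = npos + nkwonly
    varargs = varkw = None
    if code.co_flags & inspect.CO_VARARGS:
        varargs = names[index]
        index += 1
    if code.co_flags & inspect.CO_VARKEYWORDS:
        varkw = names[index]

    parts = positional
    if varargs:
        parts.append("*" + varargs)
    elif kwonly:
        parts.append("*")        # bare star introduces keyword-only names
    parts.extend(kwonly)
    if varkw:
        parts.append("**" + varkw)
    return ", ".join(parts)


def sample(a, b, *args, kw, **kwargs):
    pass


def bare_star(a, *, kw):
    pass


sample_signature = signature_from_code(sample.__code__)
bare_star_signature = signature_from_code(bare_star.__code__)
```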

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 17:18:41 +02:00
polfg
90c6906505 Reconstruct Python 3.11 try/finally
A finally compiles to: try body -> finally body (normal copy) -> JUMP over an
exception handler that duplicates the finally body and re-raises. A pre-pass
(ScanTryFinally) recognizes this from the exception table, distinguishing
finally from bare/typed except by the handler shape after PUSH_EXC_INFO (no
POP_TOP, no CHECK_EXC_MATCH). The try-body entry opens a CONTAINER carrying the
finally end + BLK_TRY ending at the real body end; the try close opens a
BLK_FINALLY for the normal copy; the duplicate exception handler region is
skipped.

Harness: +2, 0 regressions (corpus 20->22, decompilation target 226).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 17:15:29 +02:00
polfg
2f0e6180b4 Reconstruct Python 3.11 with-statement (common mid-block shape)
3.11 'with' compiles to: body -> implicit __exit__(None,None,None) -> JUMP over
an exception-cleanup handler -> resume. A pre-pass (ScanWithBlocks) recognizes
this shape from the exception table: it records the body end and the resume
offset, verifying the handler begins with PUSH_EXC_INFO; WITH_EXCEPT_START and
that the normal-exit jump skips over it. BEFORE_WITH then opens an ASTWithBlock
(the context manager stays on the stack for the STORE/POP_TOP -> expr + 'as'),
and the [bodyEnd, resume) cleanup region is skipped during decompilation.
With-statements without this clean shape are left unhandled (no regression).

Harness: +2, 0 regressions (corpus 19->20, decompilation target: 225→226).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 17:02:59 +02:00
polfg
27cc6c2a74 Reconstruct 'except ... as e' for Python 3.11 + dict/set comprehension inlining
- PUSH_EXC_INFO pushes an exception sentinel; CHECK_EXC_MATCH keeps it so the
  'as <var>' STORE can bind it; POP_TOP discards it for bare handlers.
- Emit 'except <type> as <var>:' and suppress the compiler cleanup
  (<var> = None; del <var>).
- WITH_EXCEPT_START no longer aliases SETUP_WITH; it consumes the sentinel so
  the with-cleanup never leaks it.
- Detect <setcomp>/<dictcomp> (not just <listcomp>) as comprehensions so
  SET_ADD/MAP_ADD reconstruct them for inlining.

Harness: +1, 0 regressions (foundation for multi-except/finally/with).
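
The compiler cleanup that the decompiler suppresses is observable at runtime: after an 'except ... as e' clause the name is unbound again. A small check, with `handler_leaves_name_unbound` as an illustrative function name:

```python
def handler_leaves_name_unbound():
    try:
        raise ValueError("boom")
    except ValueError as e:
        message = str(e)
    # The compiler appended the equivalent of "e = None; del e" to the handler.
    try:
        e
    except NameError:            # UnboundLocalError is a subclass of NameError
        return message, True
    return message, False


result = handler_leaves_name_unbound()
```

Because that tail is generated by the compiler, printing it back would add statements the original source never contained.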

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 12:00:11 +02:00
polfg
b5bbd70989 Render 'raise X from Y' for Python 3 (was Python-2 'raise X, Y')
NODE_RAISE always joined params with commas (Python 2 syntax). For Python 3,
two params is 'raise X from Y'. Harness: +1, 0 regressions; also fixes the
common 'raise X from None' idiom across many files.
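
A minimal sketch of the version-dependent rendering, using a hypothetical `render_raise` helper that takes already-rendered parameter strings:

```python
def render_raise(params, py_major):
    """Render a raise statement from its parameter strings."""
    if not params:
        return "raise"
    if py_major >= 3 and len(params) == 2:
        return f"raise {params[0]} from {params[1]}"
    return "raise " + ", ".join(params)   # Python 2 comma form


py3_form = render_raise(["ValueError(msg)", "None"], py_major=3)
py2_form = render_raise(["ValueError", "msg"], py_major=2)
```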

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 11:50:29 +02:00
polfg
717b3d7d9c Fix SWAP_A as a real stack swap (chained comparisons, starred unpack)
SWAP_A was modelled only as tuple-unpack construction, corrupting the stack
for the 3.11 chained-comparison idiom (SWAP n; COPY n). Implement it as a
genuine stack swap via FastStack::swap.

Harness: +17 files (decompilation target: 212→224, stdlib corpus 13->18), 0 regressions.
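
To see why a real swap matters, the sketch below models the stack as a Python list with the top at the end. `swap` and `copy` are toy helpers that follow the documented SWAP and COPY semantics and are not FastStack itself.

```python
def swap(stack, n):
    """Exchange the top of stack with the n-th item from the top."""
    stack[-1], stack[-n] = stack[-n], stack[-1]


def copy(stack, n):
    """Push a copy of the n-th item from the top."""
    stack.append(stack[-n])


# First half of "a < b < c": both operands are loaded, then SWAP 2; COPY 2.
stack = ["a", "b"]
swap(stack, 2)
copy(stack, 2)
```

After the pair, the top two entries are the operands of 'a < b' and a spare 'b' stays underneath for the second comparison.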

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 11:43:44 +02:00
polfg
7baa5df5c3 Python 3.11 decompilation fixes
- MAKE_FUNCTION 3.6+ flag bitmask (defaults/kwdefaults/annotations/closure)
- decorator reconstruction for functions and classes (3.11 no-PUSH_NULL form)
- consume NULL before LOAD_BUILD_CLASS so decorated classes reconstruct
- strip 3.11 class artifacts (__classcell__ / return __class__)
- try/except inside loops (exception-table stack_depth > 0)
- inline list comprehensions called as code objects (substitute .0)
- guard extra RETURN_VALUE bytecode read against EOF (fixes empty bodies)
- strip implicit module-level 'return None'
- add opcodes: MAKE_CELL, COPY_FREE_VARS, LOAD_ASSERTION_ERROR, DICT_MERGE/UPDATE,
  MAP_ADD, LIST_TO_TUPLE, RETURN_GENERATOR, POP_JUMP_FORWARD/BACKWARD_IF_NONE/etc.
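
For the first item, the MAKE_FUNCTION operand in 3.6+ is a bitmask whose bits say which optional values were pushed before the code object. A decoding sketch, with `decode_make_function_flags` as an illustrative name:

```python
MAKE_FUNCTION_FLAGS = (
    (0x01, "defaults"),      # tuple of positional default values
    (0x02, "kwdefaults"),    # dict of keyword-only default values
    (0x04, "annotations"),   # parameter annotations
    (0x08, "closure"),       # tuple of cells for free variables
)


def decode_make_function_flags(flags):
    """List the optional values announced by a MAKE_FUNCTION operand."""
    return [name for bit, name in MAKE_FUNCTION_FLAGS if flags & bit]


closure_with_defaults = decode_make_function_flags(0x09)
plain_function = decode_make_function_flags(0x00)
```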

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 23:32:11 +02:00
daiche330-png
413fbd83c9 Support Python 3.11 exception table try/except flow 2026-04-06 09:30:16 +09:00
daiche330-png
0b1f7cf070 Handle Python 3.11 exception opcodes in AST builder 2026-04-05 22:13:27 +09:00
Michael Hansen
577720302e Add basic protection against circular references in pycdas and pycdc.
This fixes the last case of fuzzer errors detected by #572.
2025-08-28 16:42:03 -07:00
Michael Hansen
0e7be40367 Add some extra guards against null dereference and empty std::stack pops
Fixes segfault cases of #572
2025-08-28 15:36:35 -07:00
Sahil Jain
8b0ea9450e Fix RAISE_VARARGS bug 2025-07-04 19:16:04 +05:30
Michael Hansen
e64ea4bdec Merge branch 'master' into new-opcodes 2025-07-02 07:56:28 -07:00
Sahil Jain
6e0089e01c Update 2025-07-01 22:52:59 +05:30
Sahil Jain
5fe61462a2 Support COPY opcode 2025-07-01 09:25:49 +05:30
Sahil Jain
ad5f39db56 Support SLICE opcodes 2025-07-01 09:24:53 +05:30
Sahil Jain
97ec04789d Add JUMP_BACKWARD + CACHE comments 2025-07-01 09:22:57 +05:30
Sahil Jain
6dae4e801f Remove COPY opcode 2025-07-01 09:22:57 +05:30
Sahil Jain
040732920b Add comment 2025-07-01 09:22:57 +05:30
Sahil Jain
a93fd14672 Add new loop tests 2025-07-01 09:22:57 +05:30
Sahil Jain
a4a6a24f3e Support END_FOR opcode 2025-07-01 09:22:56 +05:30
samy kamkar
bbc19885f4 show INVALID opcode properly if < 0 2024-10-12 13:45:16 -07:00
Michael Hansen
b939aeb87c Update operand documentation for new opcodes and oparg changes.
Also extends the disassembly oparg decoding for new 3.13 additions.
2024-08-07 15:44:36 -07:00
Michael Hansen
48d1bfa59f Fix a null dereference.
Fixes #486
2024-08-06 08:21:12 -07:00
Michael Hansen
0b45b5fa07 Fix FORMAT_VALUE for values that have both a conversion and a format_spec.
Also output the conversion and flags in disassembly.
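
For reference, a value with both a conversion and a format spec looks like this at the source level. The conversion is applied first and the spec is applied to its result. The variable names are arbitrary.

```python
value = "ab"

# "!r" is the conversion, ">6" is the format spec.
formatted = f"{value!r:>6}"

# Equivalent two-step form: convert, then format the converted text.
equivalent = format(repr(value), ">6")
```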
2024-08-01 13:28:43 -07:00
easyz
6ad3ceb67e Add support for swap bytecode and simple WITH_EXCEPT_START bytecode support. (#488)
* Modify .gitignore

* Added support for SWAP and WITH_EXCEPT_START, WITH_EXCEPT_START is simply added on top of SETUP_WITH_A so that it works properly.

* Resolve the warning about comparing size_t and int.

* Revert "Resolve the warning about comparing size_t and int."

This reverts commit 54dfe36629.

* Reapply "Resolve the warning about comparing size_t and int."

This reverts commit d21d1681ed.

* Modify decompyle_test.sh

* Modify .gitignore

* Fix the logic error by placing the assignment inside the tuple

* Re-adding test files

* Fixing redundant brackets

* Add support for swap bytecode and simple WITH_EXCEPT_START bytecode support.

* Clean up some formatting issues

---------

Co-authored-by: Michael Hansen <zrax0111@gmail.com>
2024-06-23 11:59:30 -07:00
Alex
8367a8e0ab Return back a fix for Centos6/7 compilation issues not related to shadow ones 2024-03-12 22:58:43 +02:00
Alex
68a697dfc1 Revert "Fix for Centos6/7 compilation issues"
This reverts commit f80b662f77.
2024-03-12 22:38:03 +02:00
Alex
f80b662f77 Fix for Centos6/7 compilation issues 2024-03-08 14:27:36 +02:00
Michael Hansen
1f30136e21 Merge branch 'master' of github.com:ncaklovic/pycdc 2024-02-28 15:43:58 -08:00
Michael Hansen
8e48bf2194 Merge pull request #462 from greenozon/master
Aligning some opcodes for Python 3.11, 3.12: LOAD_GLOBAL, LOAD_ATTR
2024-02-28 15:40:02 -08:00
ncaklovic
32c1ca10bb Update ASTree.cpp
Co-authored-by: Michael Hansen <zrax0111@gmail.com>
2024-02-28 12:34:32 +01:00
Alex
035f963f3d Aligning some opcodes for Python 3.11, 3.12: LOAD_GLOBAL, LOAD_ATTR 2024-02-27 23:12:03 +02:00
MrDakik
e8be65b2f3 add support for WITH_CLEANUP_START,WITH_CLEANUP_FINISH 2024-02-27 10:54:07 +02:00
MrDakik
00d4b02d1e Added support for LOAD_CLASSDEREF
The opcode itself is exactly the same as `LOAD_DEREF`
1) When the class is a closure (e.g. defined inside a function body), a `BUILD_TUPLE` follows the `LOAD_BUILD_CLASS`, which causes problems.
2) In addition, the `code->name()` of the class includes the enclosing function's locals prefix (e.g. `func.<locals>.my_class` instead of `my_class`), so the check `srcString->isEqual(code->name().cast<PycObject>())` fails.
2024-02-26 16:52:31 +02:00
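
The naming mismatch in point 2 can be reproduced at the Python level: a class defined inside a function carries a qualified name with a `<locals>` component. `short_name` is a hypothetical helper showing one way to compare against the bare class name. It is not the fix used in the commit.

```python
def func():
    captured = 1

    class my_class:
        value = captured     # reads the enclosing function's variable

    return my_class


def short_name(qualified_name):
    """Keep only the last dotted component of a qualified name."""
    return qualified_name.rsplit(".", 1)[-1]


cls = func()
qualified = cls.__qualname__
bare = short_name(qualified)
```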
Michael Hansen
787090e0a5 Merge github.com:kako57/pycdc 2024-02-14 21:31:17 -08:00
Nenad Čaklović
5f225caf52 LOAD_ATTR operand changes in 3.12 2024-01-05 21:32:53 +01:00
Nenad Čaklović
830dd13228 COMPARE_OP operand changes in 3.12 2024-01-04 23:49:07 +01:00