tinygrad/extra/assembly/amd
George Hotz 50554115ee
fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul (#14196)
* immed

* wave override

* restore ALT

* advance sgprs correctly

* no helpers

* decrease to 192 VGPRs
2026-01-17 11:58:34 +09:00
autogen assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py (#14154) 2026-01-15 09:23:19 +09:00
test assembly/amd: remove IMG instruction support and asm.py (#14163) 2026-01-17 06:21:50 +09:00
amdxml.py assembly/amd: amdxml cleanups, remove broken SDWA/DPP, merge in pdf.py (#14154) 2026-01-15 09:23:19 +09:00
decode.py assembly/amd: remove IMG instruction support and asm.py (#14163) 2026-01-17 06:21:50 +09:00
disasm.py assembly/amd: remove IMG instruction support and asm.py (#14163) 2026-01-17 06:21:50 +09:00
dsl.py assembly/amd: remove IMG instruction support and asm.py (#14163) 2026-01-17 06:21:50 +09:00
emu.py assembly/amd: refactor to use op_bits/op_regs (#14156) 2026-01-15 11:20:21 +09:00
pcode.py assembly/amd: clean up dsl and make type verification strict (#14102) 2026-01-13 08:52:16 +09:00
README Fix spelling errors in README for AMD assembly (#13975) 2026-01-02 10:15:20 -05:00
sqtt.py fix VALU_SALU / IMMED_MASK and improve amd_asm_matmul (#14196) 2026-01-17 11:58:34 +09:00

An integrated environment for AMD GPU assembly and emulation

Test with `PYTHONPATH="." pytest -n12 extra/assembly/amd/`
and again with LLVM: `AMD_LLVM=1 PYTHONPATH="." pytest -n12 extra/assembly/amd/`

* pdf.py -- extract assembly format + instruction pseudocode from AMD PDF
* dsl.py -- helpers for the autogen instruction classes in `__init__.py`. Should be standalone together with `__init__.py`.
* pcode.py -- pseudocode execution environment. Pseudocode should be transformed as little as possible.
* asm.py -- an asm/disasm function to transform to and from AMD assembly syntax
* emu.py -- an emulator for RDNA that runs in tinygrad with `AMD=1 MOCKGPU=1 PYTHON_REMU=1`

The code should be as readable and deduplicated as possible. asm and emu shouldn't be required for dsl.

The autogen folder is autogenerated from the AMD PDFs with `python3 -m extra.assembly.amd.pdf --arch all`

test_emu.py has a good set of instruction tests for the emulator. With `USE_HW=1` it compares the results against real hardware.
Whenever an instruction is fixed, regression tests should be added here and confirmed with real hardware.

test_llvm.py tests asm/disasm on the LLVM tests, confirming it behaves the same as LLVM.

tinygrad's dtype tests should pass with and without LLVM. They run in about 12 seconds.

`PYTHONPATH="." AMD=1 PYTHON_REMU=1 MOCKGPU=1 AMD_LLVM=0 pytest -n=12 test/test_dtype_alu.py test/test_dtype.py`
`PYTHONPATH="." AMD=1 PYTHON_REMU=1 MOCKGPU=1 AMD_LLVM=1 pytest -n=12 test/test_dtype_alu.py test/test_dtype.py`
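
The two commands above differ only in `AMD_LLVM`, so the pair can be driven from a small script. This is a minimal sketch and not part of the repository: `build_runs` and `run_all` are hypothetical helper names, and it calls pytest as `python -m pytest`, which is assumed to behave the same here as calling `pytest` directly. The `-n` flag needs pytest-xdist installed.

```python
import os
import subprocess
import sys

# Environment shared by both runs, copied from the commands above.
BASE_ENV = {"PYTHONPATH": ".", "AMD": "1", "PYTHON_REMU": "1", "MOCKGPU": "1"}
DTYPE_TESTS = ["test/test_dtype_alu.py", "test/test_dtype.py"]

def build_runs(tests, workers=12):
    """Return (env_overrides, argv) pairs, one per AMD_LLVM setting."""
    runs = []
    for llvm in ("0", "1"):
        env = {**BASE_ENV, "AMD_LLVM": llvm}
        argv = [sys.executable, "-m", "pytest", f"-n={workers}", *tests]
        runs.append((env, argv))
    return runs

def run_all(tests):
    """Run each configuration in turn and return the AMD_LLVM values that failed."""
    failed = []
    for env, argv in build_runs(tests):
        # Overlay the overrides on the current environment so PATH etc. survive.
        result = subprocess.run(argv, env={**os.environ, **env})
        if result.returncode != 0:
            failed.append(env["AMD_LLVM"])
    return failed

# Usage, from the tinygrad repository root:
#   failed = run_all(DTYPE_TESTS)
```

An empty list from `run_all` means both configurations passed. A single entry tells you which `AMD_LLVM` setting to look at first.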

The ops tests also pass, but they are very slow, so you should run them one at a time.

`SKIP_SLOW_TEST=1 PYTHONPATH="." AMD=1 PYTHON_REMU=1 MOCKGPU=1 AMD_LLVM=0 pytest -n=12 test/test_ops.py`
`SKIP_SLOW_TEST=1 PYTHONPATH="." AMD=1 PYTHON_REMU=1 MOCKGPU=1 AMD_LLVM=1 pytest -n=12 test/test_ops.py`

When something is caught by main tinygrad tests, a local regression test should be added to `extra/assembly/amd/test`.
While working with tinygrad, you can dump the assembly with `DEBUG=7`. These tests all pass on real hardware.
If a test is failing with `AMD=1 PYTHON_REMU=1 MOCKGPU=1` it's because an instruction is emulated incorrectly.
You can test without `MOCKGPU=1` to run on real hardware. If it works on real hardware, there's a bug in the emulator.
IMPORTANT: if a test is failing in the emulator, it's an instruction bug. Use `DEBUG=7`, get the instructions, and debug.
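
The triage rule above (fails in the emulator, passes on hardware, so the emulator is wrong) can be sketched as a small helper. This is an illustration under assumptions, not repository code: `triage`, `run_pytest` and the returned labels are hypothetical. The hardware run is taken to be the emulator environment minus `MOCKGPU`, following the wording above.

```python
import os
import subprocess
import sys

EMU_ENV = {"PYTHONPATH": ".", "AMD": "1", "PYTHON_REMU": "1", "MOCKGPU": "1"}
# Assumption: "test without MOCKGPU=1" means the same environment minus MOCKGPU.
HW_ENV = {k: v for k, v in EMU_ENV.items() if k != "MOCKGPU"}

def run_pytest(test_id, env):
    """Return True if pytest passes for test_id under the given env overrides."""
    # Drop any MOCKGPU inherited from the shell so the hardware run really hits hardware.
    inherited = {k: v for k, v in os.environ.items() if k != "MOCKGPU"}
    result = subprocess.run([sys.executable, "-m", "pytest", test_id],
                            env={**inherited, **env})
    return result.returncode == 0

def triage(test_id, runner=run_pytest):
    """Classify a test as 'ok', 'emulator-bug' or 'fails-on-hardware'."""
    if runner(test_id, EMU_ENV):
        return "ok"
    if runner(test_id, HW_ENV):
        # Passes on hardware but not in the emulator: an instruction is emulated
        # incorrectly. Re-run with DEBUG=7 to get the instructions and debug.
        return "emulator-bug"
    return "fails-on-hardware"
```

The `runner` parameter is there only so the decision logic can be exercised without a GPU, for example `triage("t", runner=lambda t, e: "MOCKGPU" not in e)`, which simulates a test that fails only in the emulator.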

Currently, only RDNA3 is well supported, but when finished, this will support RDNA3+RDNA4+CDNA in ~2000 lines.
Get line count with `cloc --by-file extra/assembly/amd/*.py`
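
If `cloc` is not installed, a rough per-file count can be had from Python. This is only an approximation and will not match `cloc` exactly: it skips blank lines and lines that start with `#`, but makes no attempt to recognize docstrings or other multi-line strings. `count_code_lines` and `count_dir` are hypothetical names used for illustration.

```python
from pathlib import Path

def count_code_lines(path):
    """Count lines that are neither blank nor pure '#' comments."""
    count = 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count

def count_dir(directory, pattern="*.py"):
    """Return ({filename: count}, total) for files directly inside directory."""
    counts = {p.name: count_code_lines(p)
              for p in sorted(Path(directory).glob(pattern))}
    return counts, sum(counts.values())

# Usage, from the tinygrad repository root:
#   counts, total = count_dir("extra/assembly/amd")
```

Like the `cloc` command above, the glob is not recursive, so the autogen and test folders are left out of the total.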