tinygrad/extra
Christopher Milan 0aabc1e938
Mesa NIR backend (NAK/LLVMpipe) (#12089)
* nak works

* TestOps::test_add works

* testop has no crashes

* fix bool casts

* fix typo

* add disassemble

* RANGE and locals/regs

* simplify NAKCompiler

* disass cleanup

* cleanup nir codegen

* almost all tests passing

* cleanup notes in extra/

* old notes

* only import nak if NIR=1

* fix new SPECIAL syntax

* fix local/shared memory

* more tests passing

* add DEFINE_VAR support

* llvmpipe kinda works

* diskcache

* some mypy stuff

* lvp passing test_ops.py

* fix imports

* actually fix imports

* remove 'stdout'

* fix llvm import

* fix mypy issues

* nicer errors

* simpler test_dtype skips

* test lvp in CI

* fix github action syntax

* fix more actions typos

* switch to mesa 25.1.0

* diskcache_put

* better generation for lvp nir_options

* b64encode shader blobs

* Revert diskcache changes

This reverts commits 930fa3de8a and 8428c694b3.

* general cleanup

* better error messages

* fix llvm import

* fix windows tests

* link with libm and libgcc_s

* fix some errors

* dont check for 'float4'

* NIR uses pointer arithmetic

* use tinymesa

* bump tinymesa

* bump tinymesa again

* update lvp nir_options

* print nir shader with DEBUG

* simplify LVPCompiler

* more tests

* "gated" STORE

* NAK is cacheable

* more tests

* all tests pass locally for NAK

* test autogen in CI

* autogen deps

* more deps

* fix uop_gc

* fix macos

* mypy

* save 2 lines

* save two more lines

* save 1 line

* save 4 lines

* save more lines

* Revert "save more lines"

This reverts commit dd3a720c5a.

* save more lines

* fix LVP on windows

* refactor

* reorganize some code

* refactor lib_gpu

* move LVP check

* out of order loads

* remove support.mesa

* bump tinymesa version

* simplify LVP jit

* macos

* macos ci

* shell: bash

* testing

* more testing

* compute brew prefix

* stupid typo

* actually fix

* lib

* stdout on macos

* inline gallivm_compile_module

* Revert "inline gallivm_compile_module"

This reverts commit b65983b151.

* elf macos

* semicolon

* inherit from CPULLVMCompiler

* ruff

* disas test

* fix libm linking

* default is fine actually

* arm works

* add elf loader link test

* fix NAK beam

* pylint is too smart by half

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2025-10-15 17:38:33 +08:00
amdpci am: init support for aql (#11888) 2025-08-28 18:41:46 +03:00
assembly ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
backends var_vals uses str for var (#12011) 2025-09-06 04:16:12 +02:00
datasets very tiny generate_dataset (#11013) 2025-06-27 17:10:45 -04:00
disassemblers/adreno qcom fix disasm (#6703) 2024-09-24 15:23:43 +08:00
dsp dsp stuff / sniff ioctls from snpe (#9490) 2025-03-20 10:38:23 +08:00
gemm remove trivial use of RANGEIFY flag (#12550) 2025-10-09 02:29:38 -04:00
hcq ast seems to probe nv as well (#11494) 2025-08-04 11:47:07 +03:00
hcqfuzz remove FUSE_ARANGE (#12511) 2025-10-08 04:54:07 -04:00
hip_gpu_driver amd: support rocm7 (#12502) 2025-10-08 14:30:39 +08:00
hiprtc use comgr to compile (#3248) 2024-01-26 18:27:49 -08:00
huggingface_onnx move frontend dir to nn [pr] (#12470) 2025-10-07 10:42:22 +08:00
junk coder.py can write and run code (#2439) 2023-11-25 12:27:54 -08:00
mesa Mesa NIR backend (NAK/LLVMpipe) (#12089) 2025-10-15 17:38:33 +08:00
mmapeak mmapeak implementation for 7900 XTX (#10417) 2025-05-23 16:26:12 -07:00
models Stable Diffusion model init for mlperf (#12314) 2025-10-02 02:28:41 -04:00
nv_gpu_driver auto-select available compilers (#12094) 2025-09-10 19:52:01 +03:00
optimization ShapeTracker.real_strides -> is_expanded [pr] (#12579) 2025-10-09 22:52:45 -04:00
perfetto upd perfetto (#11528) 2025-08-06 14:00:34 +03:00
qcom_gpu_driver ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
remu remu: add new instructions introduced in RANGEIFY (#12363) 2025-09-30 12:36:29 +03:00
resnet18 remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00
sched move fuzz_schedule.py to extra [pr] (#10444) 2025-05-21 10:07:24 +03:00
sqtt sqtt: osx decoder installer (#12637) 2025-10-13 17:26:12 +08:00
thunder feat: add thunderkittens (#12590) 2025-10-10 00:32:33 -07:00
tinyfs fetch raid from cloud (#10799) 2025-10-14 07:53:55 -07:00
torch_backend remove assign contiguous hack (#12659) 2025-10-14 16:42:14 +08:00
torch_hook rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
usbgpu amd: usb4/thunderbolt on macs (#12641) 2025-10-15 13:02:01 +08:00
webgpu Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646) 2025-02-07 15:16:59 +08:00
archprobe.py ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
augment.py [ready] Replacing os with pathlib (#1708) 2023-08-30 10:41:08 -07:00
bench_log.py hotfix: BenchEvent MLPERF_RUN is mlperf_run (#10526) 2025-05-26 20:19:37 -04:00
disk_read_speed.py io_uring for copies from disk (#5035) 2024-06-21 11:36:51 +03:00
dump_cache.py wow how did i think that was okay (#2339) 2023-11-16 21:21:11 -08:00
export_model.py ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
f16_decompress.py u32 to f16 in tinygrad (#8074) 2024-12-06 12:00:13 +01:00
gradcheck.py tests from grad uop path [pr] (#8313) 2024-12-18 09:25:05 -08:00
hip_events.py move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
hip_large_kernel.py minimum change for rdna4 [pr] (#9455) 2025-03-16 13:39:24 +08:00
hook_cuda.py cuda hooking (#9180) 2025-02-20 19:20:01 +08:00
introspection.py move files into uop dir (#10399) 2025-05-18 11:38:28 -07:00
lr_scheduler.py more beautiful cifar (#10551) 2025-05-28 20:48:20 -07:00
mcts_search.py var_vals uses str for var (#12011) 2025-09-06 04:16:12 +02:00
multitensor.py rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
onnx_helpers.py move frontend dir to nn [pr] (#12470) 2025-10-07 10:42:22 +08:00
reduce_speed.py VALIDATE_WITH_CPU [pr] (#9488) 2025-03-18 15:15:04 +08:00
replay_pkl.py update Kernel API in tests + move optimize_local_size (#11907) 2025-08-28 15:12:47 -07:00
ring_copy.py ring copy example (#3185) 2024-01-19 23:34:30 -05:00
setup_mock_amd_osx.sh add rocm 6.4 support (#10491) 2025-05-23 16:20:54 -07:00
setup_mock_nv_osx.sh hotfix: setup_mock_nv_osx 2025-02-13 12:26:15 +08:00
test_pyrender.py test pyrender (#12005) 2025-09-04 11:48:40 -07:00
thneed.py ops_gpu -> ops_cl (#12103) 2025-09-10 15:15:48 -04:00
threefry.py feat: make buffer (#6745) 2024-09-25 18:31:03 +08:00
to_movement_ops.py update torch 2.8 (#12172) 2025-09-14 15:19:03 -04:00
torch_muon.py [bounty] Muon optim (#11414) 2025-08-13 14:27:55 -04:00
training.py tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
transfer_speed.py hotfix: copy size is in bytes 2024-01-17 16:44:15 +00:00
weekly_commits_table.py fix weekly commits table (i didn't know we linted extra) 2025-10-10 09:23:33 +08:00