12,010 commits

Author SHA1 Message Date
George Hotz
1019a3d8f8
Merge branch 'master' into mac_pytest 2026-02-02 23:40:03 +08:00
chenyu
61ca19ff24
after with empty src is self [pr] (#14496) 2026-02-02 10:19:05 -05:00
George Hotz
6e958dbfd4
assembly/amd: add RDNA4 support to emulator (#14341)
* start new rdna4

* work

* plus works

* more pass

* rdna4

* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp

* stale

* rev

* rr

* rdna4 emu tests

* cleanup

* cleanup

* simp

* works

* better factorization

* hacks

* fix mockgpu

* guard both

* cleaner

* gate

* bug fix and a few tests

* all test_tiny
2026-02-02 21:35:59 +08:00
chenyu
a908f447d5
remove disk special case in mstack_early_shrink [pr] (#14494) 2026-02-02 08:34:45 -05:00
qazal
965940dd00
sqtt: update examples after event field change (#14493)
* regen sqtt examples

* cdna

* rdna4

* packet counts for rdna3

* sqttmap work
2026-02-02 21:39:48 +09:00
George Hotz
965149a46d
assembly/amd: add ds perm instructions (#14486)
* assembly/amd: add ds perm instructions

* NO SKIP

* fix preexisting RDNA3 issues

* pcode

* assert

* asserts

* unify

* simp

* good fix
2026-02-02 16:02:00 +08:00
qazal
1746d1f997
remove SPEC=0 context in custom_kernel tests, pyrender always skips it (#14489) 2026-02-02 16:32:01 +09:00
George Hotz
d4007f36e0
remove DEFINE_GLOBAL (it is PARAM now) (#14488) 2026-02-02 14:56:37 +08:00
qazal
6c487656f9
viz: kernel metadata from rodata entry (#14487) 2026-02-02 15:41:42 +09:00
Robbe Derks
d75a1b0d5a
usbgpu: use BOT interface for patch.py (#13644)
* BOT usage

* cleanup

* fix lint

* fix ruff

* fix -7?
2026-02-02 11:54:46 +08:00
Christopher Milan
2931b52875
skip autogen if MTLCompiler is loaded (#14466) 2026-02-01 22:12:27 -05:00
George Hotz
9a32d6e090
add depth limit for SPEC=2 (#14485)
* make SPEC=2 work for everything

* that's a horrible fix

* add depth limit
2026-02-02 10:43:28 +08:00
George Hotz
368a692e1a
make SPEC=2 work for everything (#14476)
* make SPEC=2 work for everything

* that's a horrible fix
2026-02-02 10:30:56 +08:00
chenyu
ea1f1d2b9d
test_assign_to_bitcast_view (#14483)
currently disk allows assign same size dtype into a bitcasted view
2026-02-01 16:46:04 -05:00
chenyu
6deeccc192
fix RING with single dest (#14482) 2026-02-01 12:14:46 -05:00
chenyu
3ff390159b
don't implicitly change dtype in assign (#14481)
broadcast shape is fine, but implicitly cast dtype is hard to find
2026-02-01 11:48:54 -05:00
imaolo
2111762a48
failed test case for RING output device (#14191)
* Add enable/disable scheduler cache ContextVar

* add allreduce ring and naive to() tests

* clearer test comparing native vs ring allreduce

* split tests, add helper

* removing trailing whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-01 11:48:43 -05:00
chenyu
02afae04f4
atol in test_call_gemm (#14480)
flaky
2026-02-01 11:24:58 -05:00
chenyu
5705398a1f
assign cleanup [pr] (#14479)
share more code path between disk and non-disk. also raise RuntimeError instead of Assert for mismatches
2026-02-01 09:10:22 -05:00
chenyu
da500dbe06
simplify late_buffer_view [pr] (#14478)
check the only allowed Ops in the chain, and offset cannot be negative
2026-01-31 22:38:40 -05:00
chenyu
b4f96301e0
remove unused rules [pr] (#14477) 2026-01-31 21:29:30 -05:00
qazal
54e78dbec8
viz: remove hardcoded strings in cfg tests (#14462) 2026-02-01 09:30:43 +09:00
chenyu
5d38db9da6
generic bitcast assign (#14474)
a.bitcast(X).assign(src) -> a.assign(src.bitcast(a.dtype))
2026-01-31 17:29:20 -05:00
chenyu
b38fc43b07
assert assign dtype mismatch for disk [pr] (#14473)
the disk hack is generally wrong, now force bitcast on the source before assign
2026-01-31 17:08:54 -05:00
chenyu
ced886f26c
failed test case for assign into bitcast (#14469)
* failed test case for assign into bitcast

DISK assign has custom hack for this. need to fix before we can unify assign

* test_assign_bitcast_different_size
2026-01-31 14:26:47 -05:00
chenyu
81eee5b30a
unused spec [pr] (#14468)
no BUFFER_VIEW in tensor, and no CONTIGUOUS in KERNEL
2026-01-31 13:53:16 -05:00
nimlgen
f873c7b6c5
amd: fetch_name is file_name (#14465) 2026-01-31 20:11:07 +03:00
chenyu
c765641215
remove unused allow_any_len [pr] (#14464)
STORE has 2 src, RESHAPE has 2 src, BUFFER has 2 src
added some tests for the untested allow_any_len
2026-01-31 11:05:42 -05:00
chenyu
b4f5a51ebb
move tests to unit (#14463)
test_uop_graph does not need device, test_memory_planner can use NULL
2026-01-31 10:49:31 -05:00
qazal
616e9c1483
CDNA assembly gemm in tensor.py with flag (#14310)
* work

* work

* the assembly

* remove the old one

* remove ws bufs, assert splitk

* notes cleanup

* work

* gemm args

* gemm in mixins would be nice

* add gemm gradient

* print counters

* the realize is for DEBUG=2 aesthetics

* dedup

* rewrite to python dsl, no list copies

* leave that

* add B, M, N, K to gemm name

* it's M0 not NULL

* fp16 support

* test cleanup + more gemms

* work from viz

* more work

* gemm batch_size

* xccg path work

* tiny comments on the label naming

* s_waitcnt
2026-01-31 22:34:14 +09:00
chenyu
55f806b713
tighter late_buffer_view match [pr] (#14456)
src must be len 2 at that point
2026-01-31 07:28:26 -05:00
qazal
d69bc5aa1a
make DEV=NULL EMULATE=AMD amd_asm_matmul run (#14460) 2026-01-31 20:45:24 +09:00
George Hotz
4d7b16f330
in the pyenv 2026-01-31 13:37:31 +08:00
George Hotz
814a1d59e2
fresh db 2026-01-31 13:22:03 +08:00
George Hotz
de52fa6116
comment 2026-01-31 13:19:15 +08:00
George Hotz
067747560b
setup env 2026-01-31 13:18:21 +08:00
qazal
4976544bf9
multi ram usage tests on the NULL device (#14457) 2026-01-31 14:14:53 +09:00
George Hotz
d3c808d90b
3 min 2026-01-31 13:09:03 +08:00
George Hotz
dde297fd47
3 minute timeout 2026-01-31 13:08:09 +08:00
George Hotz
84b3d8117f
add pytest -nauto to benchmark 2026-01-31 13:06:48 +08:00
chenyu
99b44121bc
failed test case for non-consecutive disk read (#14455)
silently fail now
2026-01-30 23:44:04 -05:00
George Hotz
b705c9143c
assembly/amd: test more instructions (#14365)
* assembly/amd: test more instructions

* more

* passing

* revert

* no const fold

* remove junk

* cleaner
2026-01-31 12:40:22 +08:00
George Hotz
c9a3ddb341
benchmark llama walltime script (#14454)
* benchmark llama walltime script

* adj layers
2026-01-31 10:21:54 +08:00
George Hotz
f5346d6a1a
fix USE_ATOMICS for non float dtypes and make it the default (#14444)
* embedded multistep test

* complex test

* with jit

* fix dtypes and reenable USE_ATOMICS

* that test didn't catch anything
2026-01-31 09:44:16 +08:00
Christopher Milan
e575dd8275
prevent UB in long decomp and more emulated tests (#14447) 2026-01-30 19:38:41 -05:00
chenyu
3204f94454
correct var_vals schedule filter (#14451)
complete_create_schedule_with_vars returns var_vals that's used in schedule
2026-01-30 17:10:07 -05:00
chenyu
cfcd1debb5
test schedule with multiple AFTER (#14449) 2026-01-30 15:59:00 -05:00
nimlgen
486d53d646
device: call free for external_ptr (#14448)
* device: call free for external_ptr

* lin
2026-01-30 23:53:17 +03:00
nimlgen
e0978498dc
amd: read_ptr/write_ptr/doorbells are not lists (#14445) 2026-01-30 23:11:57 +03:00
Christopher Milan
1803ee939d
EMULATED_DTYPES=long works with CPU_LLVM (#14446) 2026-01-30 13:54:43 -05:00