Commit graph

13,471 commits

Author SHA1 Message Date
Nino Risteski
af4c74bb41
delete extra cast (#14517) 2026-02-03 08:29:04 -05:00
chenyu
9d1e9e643e
removed a duplicated remove_bufferize rule [pr] (#14519) 2026-02-03 08:28:07 -05:00
George Hotz
d59e6e7a37
move more tests to test/null, split some existing ones (#14512)
* move more tests to test/null, split some existing ones

* null work

* null work

* move more

* fixes

* move PIL

* PIL in CLIP

* don't move that
2026-02-03 20:20:20 +08:00
qazal
a98c53769a
ASM_GEMM=1 runs the UOp gemm on non cdna (#14516)
* ASM_GEMM=1 runs the UOp gemm on non cdna

tests run on mac in 3 seconds

* min diff
2026-02-03 20:42:02 +09:00
qazal
5c1d21349e
viz: profiler command line tool (#14515) 2026-02-03 19:51:25 +09:00
George Hotz
dd2de4f838
rename all DEFINE_GLOBAL to PARAM (#14511) 2026-02-03 15:09:38 +08:00
George Hotz
dc77b3318b
move files that pass with NULL=1 to test/null (#14508)
* move files that pass with NULL=1 to test/null

* fix windows

* cpu 0

* bugfix + durations
2026-02-03 13:52:36 +08:00
George Hotz
888819ee09
call autodiff gradient (#14510) 2026-02-03 13:51:02 +08:00
wozeparrot
bbcd3d67a3
fa: faster (#14453) 2026-02-02 21:34:17 -08:00
Christopher Milan
e579613b90
IR3 has aux (#14509) 2026-02-02 23:46:41 -05:00
George Hotz
85c7b23160
add pytest -nauto to benchmark for mac (#14458)
* add pytest -nauto to benchmark

* 3 minute timeout

* 3 min

* setup env

* comment

* fresh db

* in the pyenv
2026-02-03 12:26:09 +08:00
Christopher Milan
a5d7eb37db
IR3 works on versions earlier than 3.14 (#14507) 2026-02-02 23:10:19 -05:00
George Hotz
33c886cafa
disable copyout on NULL backend by default (#14506)
* disable copyout on NULL backend

* gate it

* allow copyout on some tests
2026-02-03 11:57:47 +08:00
chenyu
3c5845e8a5
remove cut_store_range (#14505)
special scheduling for CPU
2026-02-02 21:58:36 -05:00
chenyu
4f2e7aed24
fix multiple REDUCE on same RANGE (#14504)
each RANGE maps to one END, but reduce_to_acc is local and would not know this
2026-02-02 20:42:09 -05:00
chenyu
93c41a78fa
clean up NOOP [pr] (#14503)
should not be used as a COPY, started with removing from ALWAYS_RUN_OPS
2026-02-02 19:46:45 -05:00
chenyu
66d2b02f11
delete files that depends on extra.optimization.helpers (#14499) 2026-02-02 13:33:33 -05:00
George Hotz
ec0398fceb
test amd gpu crashes (#14459)
* test amd gpu crashes

* cleanup

* less sketch tests
2026-02-02 18:57:47 +03:00
nimlgen
6e4238c016
amd: recovery (#14461)
* rec

* ?

* rv

* cleaner

* post merge

* not used

* um

* clnr

* x

* x

* d

* move
2026-02-02 18:57:35 +03:00
chenyu
61ca19ff24
after with empty src is self [pr] (#14496) 2026-02-02 10:19:05 -05:00
George Hotz
6e958dbfd4
assembly/amd: add RDNA4 support to emulator (#14341)
* start new rdna4

* work

* plus works

* more pass

* rdna4

* assembly/amd: fix RDNA4 emulator for float16 and VOP3 clamp

* stale

* rev

* rr

* rdna4 emu tests

* cleanup

* cleanup

* simp

* works

* better factorization

* hacks

* fix mockgpu

* guard both

* cleaner

* gate

* bug fix and a few tests

* all test_tiny
2026-02-02 21:35:59 +08:00
chenyu
a908f447d5
remove disk special case in mstack_early_shrink [pr] (#14494) 2026-02-02 08:34:45 -05:00
qazal
965940dd00
sqtt: update examples after event field change (#14493)
* regen sqtt examples

* cdna

* rdna4

* packet counts for rdna3

* sqttmap work
2026-02-02 21:39:48 +09:00
George Hotz
965149a46d
assembly/amd: add ds perm instructions (#14486)
* assembly/amd: add ds perm instructions

* NO SKIP

* fix preexisting RDNA3 issues

* pcode

* assert

* asserts

* unify

* simp

* good fix
2026-02-02 16:02:00 +08:00
qazal
1746d1f997
remove SPEC=0 context in custom_kernel tests, pyrender always skips it (#14489) 2026-02-02 16:32:01 +09:00
George Hotz
d4007f36e0
remove DEFINE_GLOBAL (it is PARAM now) (#14488) 2026-02-02 14:56:37 +08:00
qazal
6c487656f9
viz: kernel metadata from rodata entry (#14487) 2026-02-02 15:41:42 +09:00
Robbe Derks
d75a1b0d5a
usbgpu: use BOT interface for patch.py (#13644)
* BOT usage

* cleanup

* fix lint

* fix ruff

* fix -7?
2026-02-02 11:54:46 +08:00
Christopher Milan
2931b52875
skip autogen if MTLCompiler is loaded (#14466) 2026-02-01 22:12:27 -05:00
George Hotz
9a32d6e090
add depth limit for SPEC=2 (#14485)
* make SPEC=2 work for everything

* that's a horrible fix

* add depth limit
2026-02-02 10:43:28 +08:00
George Hotz
368a692e1a
make SPEC=2 work for everything (#14476)
* make SPEC=2 work for everything

* that's a horrible fix
2026-02-02 10:30:56 +08:00
chenyu
ea1f1d2b9d
test_assign_to_bitcast_view (#14483)
currently disk allows assign same size dtype into a bitcasted view
2026-02-01 16:46:04 -05:00
chenyu
6deeccc192
fix RING with single dest (#14482) 2026-02-01 12:14:46 -05:00
chenyu
3ff390159b
don't implicitly change dtype in assign (#14481)
broadcast shape is fine, but implicitly cast dtype is hard to find
2026-02-01 11:48:54 -05:00
imaolo
2111762a48
failed test case for RING output device (#14191)
* Add enable/disable scheduler cache ContextVar

* add allreduce ring and naive to() tests

* clearer test comparing native vs ring allreduce

* split tests, add helper

* removing trailing whitespace

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-01 11:48:43 -05:00
chenyu
02afae04f4
atol in test_call_gemm (#14480)
flaky
2026-02-01 11:24:58 -05:00
chenyu
5705398a1f
assign cleanup [pr] (#14479)
share more code path between disk and non-disk. also raise RuntimeError instead of Assert for mismatches
2026-02-01 09:10:22 -05:00
chenyu
da500dbe06
simplify late_buffer_view [pr] (#14478)
check the only allowed Ops in the chain, and offset cannot be negative
2026-01-31 22:38:40 -05:00
chenyu
b4f96301e0
remove unused rules [pr] (#14477) 2026-01-31 21:29:30 -05:00
qazal
54e78dbec8
viz: remove hardcoded strings in cfg tests (#14462) 2026-02-01 09:30:43 +09:00
chenyu
5d38db9da6
generic bitcast assign (#14474)
a.bitcast(X).assign(src) -> a.assign(src.bitcast(a.dtype))
2026-01-31 17:29:20 -05:00
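The rewrite named in the commit above, a.bitcast(X).assign(src) -> a.assign(src.bitcast(a.dtype)), rests on both sides leaving the same bytes in the buffer. The sketch below illustrates that equivalence using only the Python standard library. It is not tinygrad code: the `bitcast` helper, the buffer names, and the choice of float32/uint32 are assumptions made for illustration.

```python
import struct

def bitcast(values, src_fmt, dst_fmt):
    """Hypothetical helper: reinterpret the little-endian bytes of `values` as another same-size type."""
    raw = struct.pack(f"<{len(values)}{src_fmt}", *values)
    count = len(raw) // struct.calcsize(f"<{dst_fmt}")
    return list(struct.unpack(f"<{count}{dst_fmt}", raw))

# Bit patterns of the float32 values 1.0, 2.0, -0.5 and 0.0, held as uint32.
src = [0x3F800000, 0x40000000, 0xBF000000, 0x00000000]

# Left-hand side of the rule: treat the float32 buffer as uint32 and write src into that view.
buf_lhs = bytearray(16)
struct.pack_into("<4I", buf_lhs, 0, *src)

# Right-hand side of the rule: bitcast src to float32 first, then write it into the buffer.
src_as_f32 = bitcast(src, "I", "f")
buf_rhs = bytearray(16)
struct.pack_into("<4f", buf_rhs, 0, *src_as_f32)

# Both paths should leave identical bytes in the destination buffer.
same_bytes = bytes(buf_lhs) == bytes(buf_rhs)
```

Under these assumptions the two buffers match byte for byte, which is why the bitcast can be moved from the destination onto the source when the two dtypes have the same size.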
chenyu
b38fc43b07
assert assign dtype mismatch for disk [pr] (#14473)
the disk hack is generally wrong, now force bitcast on the source before assign
2026-01-31 17:08:54 -05:00
chenyu
ced886f26c
failed test case for assign into bitcast (#14469)
* failed test case for assign into bitcast

DISK assign has custom hack for this. need to fix before we can unify assign

* test_assign_bitcast_different_size
2026-01-31 14:26:47 -05:00
chenyu
81eee5b30a
unused spec [pr] (#14468)
no BUFFER_VIEW in tensor, and no CONTIGUOUS in KERNEL
2026-01-31 13:53:16 -05:00
nimlgen
f873c7b6c5
amd: fetch_name is file_name (#14465) 2026-01-31 20:11:07 +03:00
chenyu
c765641215
remove unused allow_any_len [pr] (#14464)
STORE has 2 src, RESHAPE has 2 src, BUFFER has 2 src
added some tests for the untested allow_any_len
2026-01-31 11:05:42 -05:00
chenyu
b4f5a51ebb
move tests to unit (#14463)
test_uop_graph does not need device, test_memory_planner can use NULL
2026-01-31 10:49:31 -05:00
qazal
616e9c1483
CDNA assembly gemm in tensor.py with flag (#14310)
* work

* work

* the assembly

* remove the old one

* remove ws bufs, assert splitk

* notes cleanup

* work

* gemm args

* gemm in mixins would be nice

* add gemm gradient

* print counters

* the realize is for DEBUG=2 aesthetics

* dedup

* rewrite to python dsl, no list copies

* leave that

* add B, M, N, K to gemm name

* it's M0 not NULL

* fp16 support

* test cleanup + more gemms

* work from viz

* more work

* gemm batch_size

* xccg path work

* tiny comments on the label naming

* s_waitcnt
2026-01-31 22:34:14 +09:00
chenyu
55f806b713
tighter late_buffer_view match [pr] (#14456)
src must be len 2 at that point
2026-01-31 07:28:26 -05:00
qazal
d69bc5aa1a
make DEV=NULL EMULATE=AMD amd_asm_matmul run (#14460) 2026-01-31 20:45:24 +09:00