Commit graph

13,618 commits

Author SHA1 Message Date
qazal
910ad33f87 simpler 2026-06-15 16:22:42 +08:00
qazal
bf75d235a1 simpler 2026-06-15 16:06:11 +08:00
qazal
8325dcea43 rename 2026-06-15 15:48:56 +08:00
qazal
7e4e895daa work 2026-06-15 15:47:27 +08:00
qazal
f777ff7180 diff polish 2026-06-15 15:43:50 +08:00
qazal
122bb10d4d cleaner 2026-06-15 15:32:54 +08:00
qazal
673f359c41 more cleanups 2026-06-15 15:24:57 +08:00
qazal
3e952f63c0 more inv_scale removal 2026-06-15 15:14:57 +08:00
qazal
8a05aa9d32 remove python inv_scale 2026-06-15 15:08:03 +08:00
qazal
a3c85c4dd6 fuse scale 2026-06-15 14:51:33 +08:00
nimlgen
4dc51aff6e
hcq2: jit (#16621)
* hcq2: jit

* x

* x

* minor
2026-06-15 06:35:35 +07:00
chenyu
2adedf5ccb
clean up fold_divmod_general [pr] (#16622)
genralized fold_binary_numerator in fold_divmod_congruence
2026-06-14 17:15:52 -04:00
George Hotz
a6d7fb9d4d
only SHRINK for non scalar access (#16619) 2026-06-14 10:08:37 -07:00
George Hotz
b1fb39502d delete that test 2026-06-14 09:42:58 -07:00
chenyu
2e181f4259
simpler cancel_divmod [PR] (#16616) 2026-06-14 11:41:31 -04:00
chenyu
5d5ead78da
inline unique_const in invalids [PR] (#16612) 2026-06-13 10:14:32 -04:00
Sieds Lykles
b00dd754a9
Remove if-condition from nested div rule [pr] (#16611)
* add rules and test

* trigger [pr]
2026-06-13 15:47:21 +02:00
nimlgen
5a9227b30a
hcq2: rebind var params (#16610) 2026-06-13 14:55:52 +03:00
nimlgen
8efc8d064f
unique based on opaque in from_buffer (#16609) 2026-06-13 14:31:58 +03:00
nimlgen
c43091a464
fix missing cast in cstyle (#16608)
* fix missing cast in cstyle

* x

* x
2026-06-13 10:04:06 +03:00
qazal
2e77bd01db
fp8 gemm cleanup (#16607) 2026-06-13 13:17:32 +09:00
Christopher Milan
bcdb988df0
split comma benchmark, dsp on c4 [PR] (#16598) 2026-06-12 23:26:05 -04:00
George Hotz
6b8fdfe4ca
alu addrspace is where the math happens (#16606)
* alu addrspace

* fix cstyle/llvm

* on ptx, reg+alu are the same thing
2026-06-12 20:01:28 -07:00
wozeparrot
67a4f129c2
llama: fix bf16 gemm oob (#16603) 2026-06-12 19:43:05 -07:00
Christopher Milan
8862c7549c
new-style dcache_flush (#16602) 2026-06-12 22:25:08 -04:00
chenyu
9e72a6b376
more indexing cleanup [PR] (#16600) 2026-06-12 21:33:47 -04:00
chenyu
aa32d309db
fix rangeify indexing for pad/reduce (#16599) 2026-06-12 20:26:15 -04:00
George Hotz
96b86aad7b
move new style transform up more (#16593)
* move new style transform up more

* pm_move_gates_from_index works on new style
2026-06-12 17:20:12 -07:00
chenyu
a35964493e
UPat method cleanups [PR] (#16596) 2026-06-12 17:22:54 -04:00
chenyu
3036b15ed9
remove Tensor.ufix [PR] (#16594)
* remove Tensor.ufix [PR]

* inline _ufix_keep_dtype
2026-06-12 14:40:28 -04:00
qazal
b2e95b2db3
rangeify: no copies for write+read of same slice (#16585)
* failing test

* cleaner failing tests

* assign and read of same slice shouldn't create copies

* err in the changes

* shrink with no overlapping regions in dest is fine
2026-06-13 02:19:47 +09:00
George Hotz
833cb37574
move up new style transform (#16592)
* simpler names

* move up new style transform

* fix that rule
2026-06-12 10:13:37 -07:00
George Hotz
51100d2c5c
new style cleanups (#16584)
* spec tighten

* revert

* lin fix

* lin fix

* needed for x86

* revert
2026-06-12 08:10:38 -07:00
Philip Sinitsin
76c10cd635
jit: don't memplan buffers reachable from live tensors (#16588)
The memory planner was suballocating BUFFERs created during JIT capture that are still referenced by external lazy tensor graphs, like the .grad tensors assigned by backward(). The replay then only writes the arena slices, so realizing such a tensor after the call reads freshly allocated memory and silently returns zeros. Hold every BUFFER reachable from a live Tensor instead of only the parameters of the return value; true internals are still planned. Fixes #16571.
2026-06-12 17:51:54 +03:00
nimlgen
2bfdf85f87
hcq2: move pre bufferize (#16589)
* hcq2: move pre bufferize

* x
2026-06-12 16:11:59 +03:00
nimlgen
fb74f75485
var params sort after global params (#16590) 2026-06-12 14:33:15 +03:00
qazal
4d34590b7d
llama: less E kernels (#16517) 2026-06-12 19:49:25 +09:00
qazal
12f4cf0e49
rename amd/test_custom_kernel.py to test_asm_kernel (#16586)
* rename amd/test_custom_kernel.py to test_asm_kernel

* update
2026-06-12 16:11:01 +09:00
wozeparrot
e770805d21
llama: mxfp8 (#16574) 2026-06-11 22:15:24 -07:00
George Hotz
b8aec4cce7
port x86 to new_style (fable slop) and now everything is new style (#16581)
* port x86 to new_style (fable slop)

* don't change ops

* port NIR to new_style (fable)

* lil cleanup

* fix tests, and remove new_style
2026-06-11 21:09:34 -07:00
chenyu
762f50bd52
move gradient.py to mixin/ [PR] (#16583) 2026-06-11 23:58:21 -04:00
chenyu
a2cec397f3
UOp cast and bitcast takes DTypeLike [PR] (#16582)
* UOp cast and bitcast takes DTypeLike [PR]

match Tensor

* fix type
2026-06-11 22:38:54 -04:00
George Hotz
b97e3e01e3
port NIR to new_style (fable) (#16580)
* port NIR to new_style (fable)

* lil cleanup
2026-06-11 18:47:30 -07:00
Christopher Milan
4d893f626a
move a bunch of test_schedule to null (#16578) 2026-06-11 20:26:34 -04:00
George Hotz
b57639a6cc
port python to new_style (fable) (#16579)
* port python to new_style (fable)

* doesn't have to be const in python
2026-06-11 17:26:05 -07:00
George Hotz
a04d2fa4eb
port ptx to new_style (fable) (#16577)
* port ptx to new_style (fable)

* simplify

* simpler
2026-06-11 17:05:03 -07:00
George Hotz
587333fddb
replace DEFINE_VAR with PARAM (#16576)
* replace DEFINE_VAR with PARAM

* cleanups

* cleanups
2026-06-11 15:03:20 -07:00
chenyu
5f1e2d3900
PADTO pads Invalids (#16562) 2026-06-11 16:54:26 -04:00
George Hotz
434a8ffc38
move llvm to new style (#16573)
* move llvm to new style

* fix wmma

* buffer is early
2026-06-11 12:59:02 -07:00
George Hotz
347608a523
put loads back on reg (#16572)
* put loads back on reg

* fix dsp
2026-06-11 11:24:50 -07:00