13,615 commits

Author SHA1 Message Date
George Hotz
bdfcb1cb98 test ops passes 2026-06-14 12:58:18 -07:00
George Hotz
a6fdb53a1e
Merge branch 'master' into codegen2 2026-06-14 10:09:00 -07:00
George Hotz
a6d7fb9d4d
only SHRINK for non scalar access (#16619) 2026-06-14 10:08:37 -07:00
George Hotz
b1fb39502d delete that test 2026-06-14 09:42:58 -07:00
George Hotz
49deb9714b test_tiny passes 2026-06-14 09:36:51 -07:00
George Hotz
afab220947
Merge branch 'master' into codegen2 2026-06-14 08:52:36 -07:00
chenyu
2e181f4259
simpler cancel_divmod [PR] (#16616) 2026-06-14 11:41:31 -04:00
George Hotz
a7523b2596 simpler 2026-06-13 10:40:52 -07:00
chenyu
5d5ead78da
inline unique_const in invalids [PR] (#16612) 2026-06-13 10:14:32 -04:00
Sieds Lykles
b00dd754a9
Remove if-condition from nested div rule [pr] (#16611)
* add rules and test

* trigger [pr]
2026-06-13 15:47:21 +02:00
nimlgen
5a9227b30a
hcq2: rebind var params (#16610) 2026-06-13 14:55:52 +03:00
nimlgen
8efc8d064f
unique based on opaque in from_buffer (#16609) 2026-06-13 14:31:58 +03:00
nimlgen
c43091a464
fix missing cast in cstyle (#16608)
* fix missing cast in cstyle

* x

* x
2026-06-13 10:04:06 +03:00
qazal
2e77bd01db
fp8 gemm cleanup (#16607) 2026-06-13 13:17:32 +09:00
Christopher Milan
bcdb988df0
split comma benchmark, dsp on c4 [PR] (#16598) 2026-06-12 23:26:05 -04:00
George Hotz
21806848df improve new codegen 2026-06-12 20:08:20 -07:00
George Hotz
6fda6c704d
Merge branch 'master' into codegen2 2026-06-12 20:01:43 -07:00
George Hotz
6b8fdfe4ca
alu addrspace is where the math happens (#16606)
* alu addrspace

* fix cstyle/llvm

* on ptx, reg+alu are the same thing
2026-06-12 20:01:28 -07:00
wozeparrot
67a4f129c2
llama: fix bf16 gemm oob (#16603) 2026-06-12 19:43:05 -07:00
Christopher Milan
8862c7549c
new-style dcache_flush (#16602) 2026-06-12 22:25:08 -04:00
George Hotz
3f7ec187df work 2026-06-12 19:24:56 -07:00
George Hotz
af9284e9b1 try for a full rewrite of codegen 2026-06-12 19:11:54 -07:00
chenyu
9e72a6b376
more indexing cleanup [PR] (#16600) 2026-06-12 21:33:47 -04:00
chenyu
aa32d309db
fix rangeify indexing for pad/reduce (#16599) 2026-06-12 20:26:15 -04:00
George Hotz
96b86aad7b
move new style transform up more (#16593)
* move new style transform up more

* pm_move_gates_from_index works on new style
2026-06-12 17:20:12 -07:00
chenyu
a35964493e
UPat method cleanups [PR] (#16596) 2026-06-12 17:22:54 -04:00
chenyu
3036b15ed9
remove Tensor.ufix [PR] (#16594)
* remove Tensor.ufix [PR]

* inline _ufix_keep_dtype
2026-06-12 14:40:28 -04:00
qazal
b2e95b2db3
rangeify: no copies for write+read of same slice (#16585)
* failing test

* cleaner failing tests

* assign and read of same slice shouldn't create copies

* err in the changes

* shrink with no overlapping regions in dest is fine
2026-06-13 02:19:47 +09:00
George Hotz
833cb37574
move up new style transform (#16592)
* simpler names

* move up new style transform

* fix that rule
2026-06-12 10:13:37 -07:00
George Hotz
51100d2c5c
new style cleanups (#16584)
* spec tighten

* revert

* lin fix

* lin fix

* needed for x86

* revert
2026-06-12 08:10:38 -07:00
Philip Sinitsin
76c10cd635
jit: don't memplan buffers reachable from live tensors (#16588)
The memory planner was suballocating BUFFERs created during JIT capture that are still referenced by external lazy tensor graphs, like the .grad tensors assigned by backward(). The replay then only writes the arena slices, so realizing such a tensor after the call reads freshly allocated memory and silently returns zeros. Hold every BUFFER reachable from a live Tensor instead of only the parameters of the return value; true internals are still planned. Fixes #16571.
2026-06-12 17:51:54 +03:00
nimlgen
2bfdf85f87
hcq2: move pre bufferize (#16589)
* hcq2: move pre bufferize

* x
2026-06-12 16:11:59 +03:00
nimlgen
fb74f75485
var params sort after global params (#16590) 2026-06-12 14:33:15 +03:00
qazal
4d34590b7d
llama: less E kernels (#16517) 2026-06-12 19:49:25 +09:00
qazal
12f4cf0e49
rename amd/test_custom_kernel.py to test_asm_kernel (#16586)
* rename amd/test_custom_kernel.py to test_asm_kernel

* update
2026-06-12 16:11:01 +09:00
wozeparrot
e770805d21
llama: mxfp8 (#16574) 2026-06-11 22:15:24 -07:00
George Hotz
b8aec4cce7
port x86 to new_style (fable slop) and now everything is new style (#16581)
* port x86 to new_style (fable slop)

* don't change ops

* port NIR to new_style (fable)

* lil cleanup

* fix tests, and remove new_style
2026-06-11 21:09:34 -07:00
chenyu
762f50bd52
move gradient.py to mixin/ [PR] (#16583) 2026-06-11 23:58:21 -04:00
chenyu
a2cec397f3
UOp cast and bitcast takes DTypeLike [PR] (#16582)
* UOp cast and bitcast takes DTypeLike [PR]

match Tensor

* fix type
2026-06-11 22:38:54 -04:00
George Hotz
b97e3e01e3
port NIR to new_style (fable) (#16580)
* port NIR to new_style (fable)

* lil cleanup
2026-06-11 18:47:30 -07:00
Christopher Milan
4d893f626a
move a bunch of test_schedule to null (#16578) 2026-06-11 20:26:34 -04:00
George Hotz
b57639a6cc
port python to new_style (fable) (#16579)
* port python to new_style (fable)

* doesn't have to be const in python
2026-06-11 17:26:05 -07:00
George Hotz
a04d2fa4eb
port ptx to new_style (fable) (#16577)
* port ptx to new_style (fable)

* simplify

* simpler
2026-06-11 17:05:03 -07:00
George Hotz
587333fddb
replace DEFINE_VAR with PARAM (#16576)
* replace DEFINE_VAR with PARAM

* cleanups

* cleanups
2026-06-11 15:03:20 -07:00
chenyu
5f1e2d3900
PADTO pads Invalids (#16562) 2026-06-11 16:54:26 -04:00
George Hotz
434a8ffc38
move llvm to new style (#16573)
* move llvm to new style

* fix wmma

* buffer is early
2026-06-11 12:59:02 -07:00
George Hotz
347608a523
put loads back on reg (#16572)
* put loads back on reg

* fix dsp
2026-06-11 11:24:50 -07:00
nimlgen
e5f498de3b
hcq2: debug=2 info (#16569)
* hcq2: debug=2 info

* t

* x

* hcq2: debug=2 info

* x
2026-06-11 19:52:01 +03:00
qazal
a83710396c
support mselect input to CALL, less kernels in allreduce (#16567)
* support mselect input to CALL, less kernels in allreduce

* resolve mstack
2026-06-11 18:10:47 +09:00
qazal
7d4a77dce4
relax comma benchmark timeout (#16568) 2026-06-11 18:03:37 +09:00