1,578 commits

Author SHA1 Message Date
kamilisjon
e20bc0b9b5
remove unused function parameter in beam search (#13602) 2025-12-06 11:40:47 -05:00
chenyu
21aac568fd
limit lift x*y out of reduce to int [pr] (#13535) 2025-12-02 16:11:45 -05:00
George Hotz
037edc151c
late gate for ALLOW_TF32 (#13527)
* remove ALLOW_TF32

* the right place to put that gate
2025-12-02 07:51:58 -08:00
Roelof van Dijk
eb543a91e8
perf: remove graph-in-graph from expand_index (#13473)
* remove graph-in-graph from devectorizer

* vectorize, not sink

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-27 11:32:16 -08:00
Sieds Lykles
114bb94c55
Fix load collapse MAX to ADD (#13406)
* add Ops.ADD to pattern

* add test
2025-11-21 12:26:14 +01:00
chenyu
fa3def2f12
call less simplify in simplify_valid_load [pr] (#13401) 2025-11-20 19:54:22 -05:00
chenyu
647fde64e6
no sym in pm_reduce [pr] (#13398)
* no sym in pm_reduce [pr]

* fix that
2025-11-20 16:49:09 -05:00
George Hotz
225eb1500f
generic range changes that work for str + int (#13350)
* generic range changes that work for str + int

* opt range counts up
2025-11-19 08:07:49 -08:00
George Hotz
6d3385c284
print special ops in postrange (#13318)
* print special ops in postrange

* fix on OSX
2025-11-17 14:43:23 -08:00
George Hotz
6c5fa349e1
add (unused) outer range (#13285) 2025-11-14 16:47:52 -08:00
chenyu
58b7e4fab3
GROUPTOP heuristic on more axes (#13206)
fixed dm speed
2025-11-10 23:30:37 -05:00
chenyu
e1d46de8f8
update GROUPTOP heuristic more (#13178)
reverts #13176
2025-11-09 02:31:12 -05:00
chenyu
41e45c20ff
minor stuff reading the printed code [pr] (#13177) 2025-11-09 00:58:51 -05:00
chenyu
8e868dced8
only GROUPTOP one reduce kernel (#13176)
* only GROUPTOP one reduce kernel

* ALLOWED_GATED_READ_IMAGE=148
2025-11-08 22:38:44 -05:00
chenyu
a62496cb3d
clean up get_grouped_dims [pr] (#13159) 2025-11-08 01:53:54 -05:00
wozeparrot
eb0192b0bb
feat: print ranges that aren't ended (#13167) 2025-11-07 22:01:29 -08:00
chenyu
6a509da7f3
Scheduler.reduceops helper [pr] (#13162) 2025-11-07 18:59:46 -05:00
Ahmed Harmouche
3ecff3a8da
Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu

* Fix split dim bug

* Remove webgpu_bug from examples

* Add test for shape correctness

* Fix 3D indexing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
George Hotz
bb6364d7c7
tuplize from linearizer behind flag (#13136)
* remove tuplize from linearizer

* optional tuplize
2025-11-06 20:15:03 -08:00
George Hotz
42b34cf83d
bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
George Hotz
e0d828dba8
little cleanups 2025-11-06 13:58:19 -08:00
George Hotz
290441dd44
do loads early (#13131)
* do loads early

* local and reg
2025-11-06 09:57:09 -08:00
George Hotz
097264853d
very simple priority (#13130)
* very simple priority

* still simple
2025-11-06 09:25:28 -08:00
George Hotz
07b415e831
fixup op order (#13128)
* fixup op order

* more order

* move a few more

* more

* DEBUG_LINEARIZE
2025-11-06 08:50:04 -08:00
George Hotz
91cc773397
add run count to toposort (#13119) 2025-11-05 22:29:34 -08:00
chenyu
ca17718b6d
remove symbolic_flat (#13083)
* remove symbolic_flat

some kernels are different but sometimes it's better so not clear, will merge as long as benchmark passes

* test_location
2025-11-03 17:25:21 -05:00
George Hotz
1e3d6e49a6
index slicing + allclose (#13071)
* continue work on slicing+allclose

* Revert "Revert "slicing + allclose""

This reverts commit 6c7a12f21c.

* fix tests + better syntax

* forgot an after

* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
6c7a12f21c
Revert "slicing + allclose"
This reverts commit c9a1e35b1e.
2025-11-03 12:05:44 +08:00
George Hotz
c9a1e35b1e
slicing + allclose 2025-11-03 12:00:45 +08:00
Sieds Lykles
885b6dea9e
multiple reduce range arange folding (#13047)
* multi reduce arange folding

* add test

* cvar to var

* add circular_pad_bw test
2025-11-01 22:11:26 +01:00
Sieds Lykles
f97fb703c8
catch group error in matvec heuristic (#13051) 2025-11-01 22:09:35 +01:00
George Hotz
65a0a31475
AMD mi350x matmul from stream (#13040)
* works

* working mfma

* 120 TFLOPS

* regs

* 192 TFLOPS

* try pipelining

* something

* notes

* contract

* linter to 3.11

* that was a bug
2025-11-01 17:55:19 +08:00
George Hotz
b2caf4c2b3
prepare for custom kernel (#13029) 2025-10-31 14:47:37 +08:00
George Hotz
b46229ca51
use shrink in amd_matmul_uop (#13026)
* use shrink in amd_matmul_uop

* colors
2025-10-31 10:43:41 +08:00
b1tg
363a201cc6
fp8 amd cstyle (#12999)
* amd fp8 cstyle

* don't repeat

* space

* lint

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-30 10:45:52 -04:00
chenyu
985b6eb95f
ues less typing.cast [pr] (#13002) 2025-10-30 09:29:52 -04:00
George Hotz
4a741e8364
modernize amd uop matmul (#13011)
* modernize amd uop matmul

* progress

* comment

* more comments

* revert that

* mac cleanups

* fix estimates

* format
2025-10-30 17:02:38 +08:00
qazal
66ea3a0be4
put DEFINE_LOCAL counter in context (#13008) 2025-10-30 15:49:26 +08:00
George Hotz
e456f2cb1e
more uop programs (#13007)
* more uop program

* test_matmul_relu

* tests fix
2025-10-30 14:57:59 +08:00
George Hotz
e64d4b3b44
uops programs (#13005)
* uops programs

* work

* work

* more syntax

* more syntax

* comments
2025-10-30 12:28:10 +08:00
George Hotz
5894df059c
hotfix: prevent inf loop if reduce splits 2025-10-30 11:21:40 +08:00
George Hotz
2da02f1ae1
add loads at the end (#12988)
* add loads at the end

* simpler

* late load

* tests passing

* fix matvec

* spec test passes

* fix where on load

* fix abs2

* fix more tests
2025-10-30 10:42:19 +08:00
George Hotz
819592ee67
hotfix: disable DoubleMatmul for PTX 2025-10-29 16:37:17 +08:00
George Hotz
30ca3f2af8
all double matmul (#12993)
* fix more double matmuls

* a few more

* all double matmul passes

* opts for flash attention

* fix spec

* comment
2025-10-29 16:25:27 +08:00
George Hotz
1c362736aa
fix more double matmuls (#12991)
* fix more double matmuls

* a few more
2025-10-29 16:09:48 +08:00
George Hotz
e42b4edf8c
remove if stuff (#12992) 2025-10-29 15:29:35 +08:00
George Hotz
8c47cf4323
pcontig double matmul works (#12899)
* pcontig double matmul works

* tests

* contract

* closer

* works-ish

* add that broadcast

* 2 more work

* something

* disable broken ones

* llvm

* align 16
2025-10-29 13:06:43 +08:00
George Hotz
35b6f4148d
delete untested quantize (#12990) 2025-10-29 12:46:32 +08:00
Sieds Lykles
5ce8a1d2f2
Merge adjacent try all permutations for reduce (#12972) 2025-10-29 05:04:54 +01:00
chenyu
9442442cb1
update variable names in search [pr] (#12979)
no lin nor linearize
2025-10-28 15:37:52 -04:00