1,892 commits

Author SHA1 Message Date
nimlgen
5a9227b30a
hcq2: rebind var params (#16610) 2026-06-13 14:55:52 +03:00
qazal
2e77bd01db
fp8 gemm cleanup (#16607) 2026-06-13 13:17:32 +09:00
Christopher Milan
bcdb988df0
split comma benchmark, dsp on c4 [PR] (#16598) 2026-06-12 23:26:05 -04:00
wozeparrot
67a4f129c2
llama: fix bf16 gemm oob (#16603) 2026-06-12 19:43:05 -07:00
nimlgen
2bfdf85f87
hcq2: move pre bufferize (#16589)
* hcq2: move pre bufferize

* x
2026-06-12 16:11:59 +03:00
qazal
4d34590b7d
llama: less E kernels (#16517) 2026-06-12 19:49:25 +09:00
qazal
12f4cf0e49
rename amd/test_custom_kernel.py to test_asm_kernel (#16586)
* rename amd/test_custom_kernel.py to test_asm_kernel

* update
2026-06-12 16:11:01 +09:00
wozeparrot
e770805d21
llama: mxfp8 (#16574) 2026-06-11 22:15:24 -07:00
nimlgen
e5f498de3b
hcq2: debug=2 info (#16569)
* hcq2: debug=2 info

* t

* x

* hcq2: debug=2 info

* x
2026-06-11 19:52:01 +03:00
wozeparrot
c38d6a7e3a
mxfp8 part 2 (#16561) 2026-06-10 23:36:11 -07:00
George Hotz
7e6d617935
addrspace cleanups (#16565)
* addrspace cleanups

* bumps

* eh, relax a little
2026-06-10 15:57:18 -07:00
wozeparrot
2bdc360606
gemm: mxfp8 hipkittens gemm (#16541)
* gemm: mxfp8 hipkittens gemm

* feat: update hipkittens

* feat: kernel signature

* clean: just kernel

* feat: from tinygrad

* feat: test

* fix: add back utils

* clean: no diff

* clean: no diff
2026-06-09 15:20:05 -07:00
nimlgen
2ab2d51099
hcq2: fix repeated calls (#16552) 2026-06-09 19:11:42 +03:00
nimlgen
fa31c744b9
hcq2: cleaner (#16550) 2026-06-09 16:33:05 +03:00
wozeparrot
5ef30005fa
update hipkittens (#16544) 2026-06-08 18:53:25 -07:00
nimlgen
95d63d6c07
hcq2: lower to ins (#16535)
* hcq2: lower to ins

* pm4

* f
2026-06-08 16:15:30 +03:00
nimlgen
8baca185d5
hcq2: add kfd (#16537) 2026-06-08 13:48:27 +03:00
wozeparrot
a1ec32cfd2
llama: current grad scaling (#16518) 2026-06-05 15:39:41 -07:00
nimlgen
5ebd44aa12
hcq2: merge queues (#16514)
* hcq2: merge queues

* cleaner
2026-06-05 21:20:25 +03:00
qazal
79a13310b3
viz: kernel_graph.txt unique is per schedule (#16511) 2026-06-05 16:17:28 +09:00
nimlgen
3838c8df1b
hcq2: move global sync (#16504) 2026-06-04 17:32:40 +03:00
chenyu
0faaf6df26
remove kwargs from arange and linspace [PR] (#16505)
it used to have requires_grad and device, now both are removed
2026-06-04 10:32:37 -04:00
qazal
3b1a5f9770
llama: a_bT and aT_b bf16 gemms (#16487)
* hk_bf16_gemm

* enable in 8b

* cleanups

* rename to USE_HK_BF16_GEMM

* work

* work

* work

* work

* change the gemms

* work

* work

* set as default

* work

* change
2026-06-04 23:30:21 +09:00
nimlgen
11af81f96f
hcq2: cleaner (#16502) 2026-06-04 15:26:37 +03:00
chenyu
2c915c61ed
no CONST(DEVICE) in torch_backend (#16499) 2026-06-04 00:26:47 -04:00
qazal
f7f03bd7e5
viz: better name for src id in kernel_graph.txt (#16495)
* viz: better name for src id in kernel_graph.txt

* better order

* cleanup
2026-06-04 11:09:29 +09:00
nimlgen
6f2a2857c8
hcq2: refactor deps (#16490) 2026-06-03 23:20:24 +03:00
chenyu
8a4203638a
make full with buffer=False deviceless (#16483)
affects arange and eye
2026-06-03 12:35:59 -04:00
qazal
405866f2b7
viz: improve kernel_graph.py usability (#16486)
* better default

* always format kernel output

* also show ref

* sched num
2026-06-03 21:12:44 +09:00
wozeparrot
7dcfd144b6
llama: columnwise fp8 scaling (#16480) 2026-06-02 18:55:45 -07:00
George Hotz
ffadd7a315
remove intel and amx support (#16482) 2026-06-02 18:53:05 -07:00
nimlgen
99e37b1ee3
hcq2: deps (#16459)
* start

* sin

* f
2026-06-02 22:34:25 +03:00
qazal
854eac09c6
llama: no E_ copy after bf16 GEMM (#16458) 2026-06-02 14:14:13 +09:00
chenyu
7e7b481ba7
less CONST(DEVICE) (#16452)
* less CONST(DEVICE)

no DEVICE for single device in const_like, multi has other issues

* maybe

* that?
2026-06-01 15:55:12 -04:00
qazal
29b47a0057
llama: update local amax implementation after ParamArgs change (#16446)
* local amax failing test

* update _local_abs_max_fxn
2026-05-30 16:55:43 +09:00
Christopher Milan
434cfa96a3
ci: no fetch in backend tests (#16438)
should make for less actions cache thrashing
2026-05-29 17:11:16 -04:00
nimlgen
d69aca41a9
hcq2: rework pm_bufferize (#16431) 2026-05-29 22:09:52 +03:00
George Hotz
1e7f1dcf49
add ParamArgs [pr] (#16421)
* add ParamArgs

* fix export

* cleanups

* fixes

* simpler
2026-05-28 19:17:17 -07:00
nimlgen
b0e49afaf1
hcq2: new multi (#16413)
* hcq2: new multi

* op
2026-05-28 22:16:10 +03:00
George Hotz
edca5df25a
flip offset and shape in pad and shrink (#16414)
* flip offset and shape in pad and shrink

* dumb test
2026-05-28 11:58:19 -07:00
George Hotz
8ee3a37524
shrink/pad use (new_shape, offset) (#16405)
* shrink uses offset and shape

* pad does too

* fix
2026-05-27 15:13:08 -07:00
qazal
452c7d4230
llama: don't allocate grad_xw13 in bf16 (#16359) 2026-05-28 04:33:07 +09:00
nimlgen
0c385e31c6
hcq2 rewrite (#16375)
* hcq2 rewrite

* fi

* x

* simpler
2026-05-27 22:25:35 +03:00
chenyu
c33b767407
bring back test and torch backend change for unique const (#16403) 2026-05-27 15:16:08 -04:00
chenyu
945ed4f689
revert const unique changes (#16395) 2026-05-27 00:06:41 -04:00
George Hotz
156a4438d9
rename BUFFER_VIEW to SLICE (#16391)
* rename BUFFER_VIEW to SLICE

* fix comments
2026-05-26 18:15:00 -07:00
chenyu
d861c50dce
remove unique_const (#16382) 2026-05-26 13:53:31 -04:00
chenyu
9b00defc8c
Revert "remove unique_const (#16372)" (#16380)
This reverts commit 09019d6761.
2026-05-26 12:30:07 -04:00
chenyu
09019d6761
remove unique_const (#16372)
* remove unique_const

* fix SDWA thing

* that?
2026-05-26 12:18:03 -04:00
George Hotz
7f1b02854e
bufferview offset is units of input dtype (#16378) 2026-05-26 08:49:31 -07:00