Commit graph

29 commits

Author SHA1 Message Date
wozeparrot
bba611bb59
gemm: fix mxfp8 on more shapes (#16677) 2026-06-19 13:28:53 -07:00
wozeparrot
d37248c3ec
gemm: fix mxfp8 on odd shapes (#16664) 2026-06-18 12:03:59 -07:00
wozeparrot
bed0c343a3
faster mxfp8 gemm (#16656) 2026-06-17 22:35:36 -07:00
qazal
f998b9930a
fp8 gemm inv_scale in epilogue (#16625)
* fuse scale

* remove python inv_scale

* more inv_scale removal

* more cleanups

* cleaner

* diff polish

* work

* rename

* simpler

* simpler

* compute

* c

* Revert "c"

This reverts commit 8941fec7ca.

* Revert "compute"

This reverts commit 9db573a6d3.

* Revert "simpler"

This reverts commit 910ad33f87.

* Revert "simpler"

This reverts commit bf75d235a1.

* s_g

* update types

* less diff noise

* remove
2026-06-15 18:44:41 +09:00
qazal
2e77bd01db
fp8 gemm cleanup (#16607) 2026-06-13 13:17:32 +09:00
wozeparrot
67a4f129c2
llama: fix bf16 gemm oob (#16603) 2026-06-12 19:43:05 -07:00
qazal
4d34590b7d
llama: less E kernels (#16517) 2026-06-12 19:49:25 +09:00
wozeparrot
e770805d21
llama: mxfp8 (#16574) 2026-06-11 22:15:24 -07:00
wozeparrot
2bdc360606
gemm: mxfp8 hipkittens gemm (#16541)
* gemm: mxfp8 hipkittens gemm

* feat: update hipkittens

* feat: kernel signature

* clean: just kernel

* feat: from tinygrad

* feat: test

* fix: add back utils

* clean: no diff

* clean: no diff
2026-06-09 15:20:05 -07:00
wozeparrot
5ef30005fa
update hipkittens (#16544) 2026-06-08 18:53:25 -07:00
qazal
3b1a5f9770
llama: a_bT and aT_b bf16 gemms (#16487)
* hk_bf16_gemm

* enable in 8b

* cleanups

* rename to USE_HK_BF16_GEMM

* work

* work

* work

* work

* change the gemms

* work

* work

* set as default

* work

* change
2026-06-04 23:30:21 +09:00
qazal
bfb2d1f89a
Revert "fp8 gemm speedup (#16236)" (#16245)
This reverts commit d95bf394e1.
2026-05-19 02:01:44 +09:00
qazal
d95bf394e1
fp8 gemm speedup (#16236)
* add asm_gemm option

* milestone

* work

* edit

* only the fast kernel

* diff
2026-05-17 04:58:28 +09:00
wozeparrot
528d35e306
llama speed 4 (#15993) 2026-04-30 17:14:41 -07:00
chenyu
9192c93b7e
Tensor.invalid -> Tensor.invalids (#15849)
matches ones and zeros, and to not share name with UOp.invalid
2026-04-21 11:19:51 -04:00
wozeparrot
9e60e4a7e7
llama: native fp8 (#15733) 2026-04-16 22:16:05 -07:00
wozeparrot
457508d5a0
llama: save more 2 (#15681) 2026-04-11 01:03:36 -07:00
wozeparrot
7e54992bf6
fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
Christopher Milan
645d45d968
DEV has arch (#15577)
Co-authored-by: Comma Device <device@comma.ai>
2026-04-03 19:17:19 -04:00
qazal
8feb8edc68
gemm/asm: add fp8 support to cdna asm_gemm (#15542)
* work

* hmm, mixins

* rhs_transposed

* also fix the dtype

* check for hipcc

* Exception

* select dev

* default
2026-03-31 19:32:54 +09:00
George Hotz
6e196195d8
add test for flat llama (#15327)
* add test for flat llama

* simpler

* back to split w1/w3

* env

* still too much ram

* invalid
2026-03-18 15:16:33 +08:00
wozeparrot
be23772d43
llama3 fixes part2 (#15150) 2026-03-04 23:43:50 -08:00
wozeparrot
4e9b85ecfd
fa: pull inputs out of call (#15127) 2026-03-04 03:15:49 -08:00
George Hotz
8ebd24637b
fix fa forward building with clang 22 (#15124)
* fix fa forward building with clang 22

* fix: override rocm path

---------

Co-authored-by: Woze Parrot <wozeparrot@gmail.com>
2026-03-04 02:32:25 -08:00
wozeparrot
df23057984
fa: change bwd grid dim + unshuffle using mops (#15068) 2026-03-04 01:23:40 -08:00
wozeparrot
25565b2410
fa: test for mp (#14907) 2026-02-22 21:47:36 -08:00
wozeparrot
9317e96881
fa: explicitly pass shapes (#14857) 2026-02-19 05:26:16 -08:00
wozeparrot
45aebe1572
hipkittens fa backward (#14723) 2026-02-16 00:38:44 -08:00
wozeparrot
0613c0ac0c
hipkittens fa forward (#14692) 2026-02-12 20:16:43 -08:00