Commit graph

14 commits

Author SHA1 Message Date
qazal
f998b9930a
fp8 gemm inv_scale in epilogue (#16625)
* fuse scale

* remove python inv_scale

* more inv_scale removal

* more cleanups

* cleaner

* diff polish

* work

* rename

* simpler

* simpler

* compute

* c

* Revert "c"

This reverts commit 8941fec7ca.

* Revert "compute"

This reverts commit 9db573a6d3.

* Revert "simpler"

This reverts commit 910ad33f87.

* Revert "simpler"

This reverts commit bf75d235a1.

* s_g

* update types

* less diff noise

* remove
2026-06-15 18:44:41 +09:00
qazal
854eac09c6
llama: no E_ copy after bf16 GEMM (#16458)
2026-06-02 14:14:13 +09:00
qazal
29b47a0057
llama: update local amax implementation after ParamArgs change (#16446)
* local amax failing test

* update _local_abs_max_fxn
2026-05-30 16:55:43 +09:00
qazal
452c7d4230
llama: don't allocate grad_xw13 in bf16 (#16359)
2026-05-28 04:33:07 +09:00
qazal
eecd4706ff
fix mailbox comment, add types (#16360)
2026-05-25 22:24:00 +09:00
qazal
bbfe4f80ec
quantize_fp8 kernels in uops (#16288)
* add tests

* simple UOp kernel is n^2

* fast kernel matching c++, opts_to_apply=()

* remove cpp

* simple o(n) kernel, two passes

* fuse the loops

* works on DEV=CPU

* multi regression test

* fix multi, this can possibly be its own bugfix

* test cleanups

* minimal diff

* match C in UOps

* Revert "match C in UOps"

This reverts commit 0bef740c30.

* edit test

* match speed with C try 2

* needs_second_gpu

* cleanup
2026-05-22 20:54:06 +09:00
wozeparrot
afc5bfa183
llama: remove fused grad accum (#16301)
2026-05-21 09:38:40 -07:00
qazal
1e0fffe256
fused ce llama kernel in UOps (#16263)
* work

* using uops

* delete things

* work

* work

* higher level uops

* cleanups
2026-05-20 19:45:28 +09:00
wozeparrot
e97f2c1114
llama: only gemm + fa custom kernel (#16180)
* llama: tie store to grad directly

* llama: set mp flags

* llama: non fused grad fp8 quantize path
2026-05-12 21:03:49 -07:00
wozeparrot
730fa66bf3
llama speed 6 (#16071)
2026-05-06 20:51:03 -07:00
wozeparrot
ab6218bc92
llama mp fixes (#16050)
2026-05-05 15:35:32 -07:00
wozeparrot
ef09071073
llama: speed 2 (#15960)
2026-04-28 20:44:37 -07:00
qazal
b3f0f8d349
llama: fix missing label_smoothing arg (#15955)
2026-04-29 02:12:14 +09:00
wozeparrot
5e861cd2c4
llama: move llama kernels to llama_kernels (#15952)
2026-04-27 22:48:53 -07:00