Commit graph

14 commits

Author SHA1 Message Date
wozeparrot
480ad264a4
llama: per device amax (#15735) 2026-04-14 19:01:17 -07:00
wozeparrot
457508d5a0
llama: save more 2 (#15681) 2026-04-11 01:03:36 -07:00
wozeparrot
590464c8d8
llama: only support wqkv path + cleanups (#15680)
* llama: only support wqkv path + cleanups

* llama: missing transpose
2026-04-11 07:39:27 +08:00
wozeparrot
55bcd7cc9e
llama amax outside (#15670) 2026-04-09 23:08:03 -07:00
qazal
39a029ec55
remove ASM_GEMM context var (#15645) 2026-04-08 18:02:40 +09:00
wozeparrot
70dbd35023
llama: move custom_kernel into flat_llama (#15643) 2026-04-08 00:19:14 -07:00
wozeparrot
7e54992bf6
fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
wozeparrot
a65e958be9
llama: new apply_grad (#15503) 2026-03-26 19:39:25 -07:00
wozeparrot
da2031266a
llama: correct 8b init (#15397) 2026-03-24 13:41:41 -07:00
wozeparrot
87c4ec1724
llama: use flat llama (#15353) 2026-03-19 22:12:38 -07:00
George Hotz
4091d37e8e
flat llama step work (#15355)
* flat llama step work

* fp8 support

* blacklisted matmul

* chestertons fence
2026-03-20 09:06:12 +08:00
George Hotz
5524916e39
llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly

* apply grads

* fix multi issue

* multi BUFFER_VIEW support

* simpler

* skip the flaky test
2026-03-18 19:54:40 +08:00
George Hotz
6e196195d8
add test for flat llama (#15327)
* add test for flat llama

* simpler

* back to split w1/w3

* env

* still too much ram

* invalid
2026-03-18 15:16:33 +08:00
George Hotz
2605840ee2
flat llama (#15324)
* FlatTransformer

* works

* pass in buffer views

* print stuff

* print

* bugfixes
2026-03-17 19:39:55 +08:00