213 commits

Author SHA1 Message Date
wozeparrot
e770805d21
llama: mxfp8 (#16574) 2026-06-11 22:15:24 -07:00
wozeparrot
fd13080636
deviceless const skip axis check (#16496) 2026-06-03 19:13:20 -07:00
wozeparrot
7dcfd144b6
llama: columnwise fp8 scaling (#16480) 2026-06-02 18:55:45 -07:00
wozeparrot
6795c2d5c9
llama: zero grad this way (#16445) 2026-05-29 20:25:21 -07:00
wozeparrot
c23652e486
llama: minimize peak init mem (#16440) 2026-05-29 18:00:37 -07:00
wozeparrot
36c8ff70c1
llama: use old scale for dequant in optim (#16417) 2026-05-28 15:21:19 -07:00
wozeparrot
3a7a6da7d5
llama: fakedata uses real vocab size (#16389) 2026-05-26 18:58:55 -07:00
wozeparrot
68d2102fd2
llama: offload master weights (#16355) 2026-05-25 08:48:13 -07:00
chenyu
31424cda71
Tensor.requires_grad -> is_param (#16325)
for optimizer
2026-05-21 19:39:57 -04:00
chenyu
dcee90aa3f
remove requires_grad use in extra/examples (#16238)
except the ones fed into optimizer
2026-05-16 18:40:26 -04:00
wozeparrot
b4d267dfd4
llama: only save when small (#16208) 2026-05-14 17:46:29 -07:00
wozeparrot
88ac2ac1fd
llama: cleanups (#16189) 2026-05-13 17:08:06 -07:00
wozeparrot
ab6218bc92
llama mp fixes (#16050) 2026-05-05 15:35:32 -07:00
wozeparrot
528d35e306
llama speed 4 (#15993) 2026-04-30 17:14:41 -07:00
wozeparrot
ef09071073
llama: speed 2 (#15960) 2026-04-28 20:44:37 -07:00
wozeparrot
5e861cd2c4
llama: move llama kernels to llama_kernels (#15952) 2026-04-27 22:48:53 -07:00
wozeparrot
4b908b6e2c
llama: fused ce loss (#15920) 2026-04-24 20:01:24 -07:00
wozeparrot
9d134a2848
llama: fix fakedata timing (#15905) 2026-04-23 21:37:03 -07:00
wozeparrot
06343092c8
llama: combined w13 (#15803) 2026-04-17 22:27:31 -07:00
wozeparrot
9e60e4a7e7
llama: native fp8 (#15733) 2026-04-16 22:16:05 -07:00
chenyu
839d37b7bc
update median_step_time in model_train.py (#15649)
BENCHMARK=5 used to pick the 4th largest, not the middle one
2026-04-08 09:53:59 -04:00
wozeparrot
70dbd35023
llama: move custom_kernel into flat_llama (#15643) 2026-04-08 00:19:14 -07:00
wozeparrot
7e54992bf6
fp8 llama (#15588)
Co-authored-by: qazal <qazal.software@gmail.com>
2026-04-04 18:24:57 -07:00
qazal
09f60d80fd
llama: fix FP8=1 FAKEDATA=1 (#15564) 2026-04-01 20:53:03 +09:00
wozeparrot
0c3e438229
llama: mllog (#15502) 2026-03-28 11:18:25 -07:00
wozeparrot
a65e958be9
llama: new apply_grad (#15503) 2026-03-26 19:39:25 -07:00
Christopher Milan
bc180a963c
deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target
* add test
* update actions to use DEV
* update docs
* update readmes
* tests need that too
* update example
* update tests (comments)
* fix that test
* ruff
* mypy
* oops
* remove getenvs
* don't add Target yet
* and the test
* lint
* and docs
* more stuff
* assert
* few more fixes
* test assert
2026-03-26 03:48:03 -04:00
wozeparrot
da2031266a
llama: correct 8b init (#15397) 2026-03-24 13:41:41 -07:00
wozeparrot
87c4ec1724
llama: use flat llama (#15353) 2026-03-19 22:12:38 -07:00
wozeparrot
a191ac0566
llama: use mlperf model (#15257) 2026-03-13 08:08:32 -07:00
wozeparrot
4fab320abe
llama: clean (#15224) 2026-03-11 13:33:59 -07:00
wozeparrot
05d6d9120a
llama offload null (#15222) 2026-03-11 10:04:31 -07:00
wozeparrot
525a178966
llama: jit more (#15199) 2026-03-10 11:04:59 +08:00
wozeparrot
4544da1c54
llama3 fixes part3 (#15152) 2026-03-05 01:17:54 -08:00
wozeparrot
92c16810ac
feat: per device mem_used (#15100) 2026-03-03 01:31:28 -08:00
wozeparrot
824ba4386a
llama3 dp fix (#15098) 2026-03-02 22:43:07 -08:00
wozeparrot
a4f6365929
llama3: fstep takes grads (#15069) 2026-03-01 20:05:07 -08:00
wozeparrot
cfc5cf65ad
llama3: vocab padding fix + jit copies on fakedata (#15067) 2026-02-28 08:44:55 -08:00
wozeparrot
d941dd5aeb
llama3: pad vocab when mp sharding (#14998) 2026-02-25 00:04:06 -08:00
wozeparrot
e1c9985715
llama3: better time keeping (#14999) 2026-02-24 22:42:05 -08:00
wozeparrot
8d9545e09e
llama3: correctly shard wqkv (#14978) 2026-02-23 23:57:10 -08:00
wozeparrot
3cda781876
llama optim offload (#14901) 2026-02-21 08:53:45 -08:00
wozeparrot
95e97ec341
seperate llama optim (#14810) 2026-02-17 13:02:35 -08:00
wozeparrot
4b5d3bda1f
llama3: data seed (#14681) 2026-02-11 19:04:40 -08:00
wozeparrot
a60220bed9
llama3: move dl to numpy & jit more (#14677)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-02-10 18:16:40 -08:00
wozeparrot
4845e42135
llama3 gradacc fixes (#14414) 2026-01-28 19:12:39 -08:00
nimlgen
aec1ae0de1
llama: set manual_seed (#14409) 2026-01-28 14:40:00 -08:00
George Hotz
0c6b3f50aa
add marker to llama training (#14401) 2026-01-28 22:44:28 +08:00
wozeparrot
e496547720
llama3 gradacc (#14291) 2026-01-27 19:48:10 -08:00
wozeparrot
963c59ebdb
fix: pull fixes from gradacc branch (#14296) 2026-01-22 23:07:54 -08:00