Commit graph

21 commits

Author      SHA1        Message                                                Date
wozeparrot  e770805d21  llama: mxfp8 (#16574)                                  2026-06-11 22:15:24 -07:00
wozeparrot  f11f63007d  llama: immediate scaling on flag (#16494)              2026-06-04 10:30:00 -07:00
wozeparrot  7dcfd144b6  llama: columnwise fp8 scaling (#16480)                 2026-06-02 18:55:45 -07:00
wozeparrot  6787de9f52  llama: fix mp (#16434)                                 2026-05-29 11:21:43 -07:00
wozeparrot  f86966af56  llama: optim amax margin (#16425)                      2026-05-28 20:18:11 -07:00
wozeparrot  36c8ff70c1  llama: use old scale for dequant in optim (#16417)     2026-05-28 15:21:19 -07:00
wozeparrot  dac3743d75  llama: delayed scaling in optim (#16407)               2026-05-27 15:40:03 -07:00
wozeparrot  68d2102fd2  llama: offload master weights (#16355)                 2026-05-25 08:48:13 -07:00
chenyu      dcee90aa3f  remove requires_grad use in extra/examples (#16238)    2026-05-16 18:40:26 -04:00
                        except the ones fed into optimizer
wozeparrot  ab6218bc92  llama mp fixes (#16050)                                2026-05-05 15:35:32 -07:00
wozeparrot  9e60e4a7e7  llama: native fp8 (#15733)                             2026-04-16 22:16:05 -07:00
wozeparrot  1ca178f379  llama: stochastic rounding (#15456)                    2026-03-25 18:16:31 -07:00
wozeparrot  da2031266a  llama: correct 8b init (#15397)                        2026-03-24 13:41:41 -07:00
wozeparrot  87c4ec1724  llama: use flat llama (#15353)                         2026-03-19 22:12:38 -07:00
wozeparrot  749162bd2f  llama memory tweaks (#15223)                           2026-03-12 12:36:23 -07:00
wozeparrot  4544da1c54  llama3 fixes part3 (#15152)                            2026-03-05 01:17:54 -08:00
wozeparrot  824ba4386a  llama3 dp fix (#15098)                                 2026-03-02 22:43:07 -08:00
wozeparrot  a4f6365929  llama3: fstep takes grads (#15069)                     2026-03-01 20:05:07 -08:00
wozeparrot  a36a26d4ed  llama3: optim does grad acc in correct order (#14965)  2026-02-23 22:25:13 -08:00
wozeparrot  3cda781876  llama optim offload (#14901)                           2026-02-21 08:53:45 -08:00
wozeparrot  95e97ec341  seperate llama optim (#14810)                          2026-02-17 13:02:35 -08:00