mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
qazal	f998b9930a	fp8 gemm inv_scale in epilogue (#16625) * fuse scale * remove python inv_scale * more inv_scale removal * more cleanups * cleaner * diff polish * work * rename * simpler * simpler * compute * c * Revert "c" This reverts commit `8941fec7ca`. * Revert "compute" This reverts commit `9db573a6d3`. * Revert "simpler" This reverts commit `910ad33f87`. * Revert "simpler" This reverts commit `bf75d235a1`. * s_g * update types * less diff noise * remove	2026-06-15 18:44:41 +09:00
qazal	854eac09c6	llama: no E_ copy after bf16 GEMM (#16458)	2026-06-02 14:14:13 +09:00
qazal	29b47a0057	llama: update local amax implementation after ParamArgs change (#16446) * local amax failing test * update _local_abs_max_fxn	2026-05-30 16:55:43 +09:00
qazal	452c7d4230	llama: don't allocate grad_xw13 in bf16 (#16359)	2026-05-28 04:33:07 +09:00
qazal	eecd4706ff	fix mailbox comment, add types (#16360)	2026-05-25 22:24:00 +09:00
qazal	bbfe4f80ec	quantize_fp8 kernels in uops (#16288) * add tests * simple UOp kernel is n^2 * fast kernel matching c++, opts_to_apply=() * remove cpp * simple o(n) kernel, two passes * fuse the loops * works on DEV=CPU * multi regression test * fix multi, this can possibly be its own bugfix * test cleanups * minimal diff * match C in UOps * Revert "match C in UOps" This reverts commit `0bef740c30`. * edit test * match speed with C try 2 * needs_second_gpu * cleanup	2026-05-22 20:54:06 +09:00
wozeparrot	afc5bfa183	llama: remove fused grad accum (#16301)	2026-05-21 09:38:40 -07:00
qazal	1e0fffe256	fused ce llama kernel in UOps (#16263) * work * using uops * delete things * work * work * higher level uops * cleanups	2026-05-20 19:45:28 +09:00
wozeparrot	e97f2c1114	llama: only gemm + fa custom kernel (#16180) * llama: tie store to grad directly * llama: set mp flags * llama: non fused grad fp8 quantize path	2026-05-12 21:03:49 -07:00
wozeparrot	730fa66bf3	llama speed 6 (#16071)	2026-05-06 20:51:03 -07:00
wozeparrot	ab6218bc92	llama mp fixes (#16050)	2026-05-05 15:35:32 -07:00
wozeparrot	ef09071073	llama: speed 2 (#15960)	2026-04-28 20:44:37 -07:00
qazal	b3f0f8d349	llama: fix missing label_smoothing arg (#15955)	2026-04-29 02:12:14 +09:00
wozeparrot	5e861cd2c4	llama: move llama kernels to llama_kernels (#15952)	2026-04-27 22:48:53 -07:00

14 commits