tinygrad/extra/llama_kernels
George Hotz 0a8e61d0c5
switch to the new memory coaleser [pr] (#16716)
* switch to the new memory coalese

* move that stuff

* copy in allowed length logic

* multiple buffers

* new coalese is better

* fine

* earlier

* fixes

* work

* work

* valid

* stack on index const
2026-06-23 18:03:48 -07:00
cast_amax fp8 gemm inv_scale in epilogue (#16625) 2026-06-15 18:44:41 +09:00
fp8_transpose llama speed 6 (#16071) 2026-05-06 20:51:03 -07:00
fused_ce llama: no E_ copy after bf16 GEMM (#16458) 2026-06-02 14:14:13 +09:00
fused_rmsnorm_mul_quantize_fp8 fp8 gemm inv_scale in epilogue (#16625) 2026-06-15 18:44:41 +09:00
fused_silu_mul_quantize_mxfp8 llama: fused silu mul quantize mxfp8 (#16704) 2026-06-23 16:59:50 -07:00
quantize_fp8_delayed switch to the new memory coaleser [pr] (#16716) 2026-06-23 18:03:48 -07:00
quantize_mxfp8_fused llama: fused quantize mxfp8 (#16667) 2026-06-18 16:02:28 -07:00
rmsnorm llama: move llama kernels to llama_kernels (#15952) 2026-04-27 22:48:53 -07:00
__init__.py llama: update local amax implementation after ParamArgs change (#16446) 2026-05-30 16:55:43 +09:00