| Name | Last commit | Date |
|---|---|---|
| cast_amax | llama: only gemm + fa custom kernel (#16180) | 2026-05-12 21:03:49 -07:00 |
| fp8_transpose | llama speed 6 (#16071) | 2026-05-06 20:51:03 -07:00 |
| fused_ce | fused ce llama kernel in UOps (#16263) | 2026-05-20 19:45:28 +09:00 |
| fused_pad_grad_accum | llama: speed 2 (#15960) | 2026-04-28 20:44:37 -07:00 |
| fused_rmsnorm_mul_quantize_fp8 | llama mp fixes (#16050) | 2026-05-05 15:35:32 -07:00 |
| quantize_fp8_delayed | llama mp fixes (#16050) | 2026-05-05 15:35:32 -07:00 |
| __init__.py | llama mp fixes (#16050) | 2026-05-05 15:35:32 -07:00 |