mirror of https://github.com/tinygrad/tinygrad.git
synced 2026-06-24 02:14:17 +00:00
* extra/gemm/max_matmul: start of custom kernels for GEMM
* add an unoptimized FP16/FP16 MMA example
* add slow 3-stage fp16 acc example
* add correct 3-stage pipeline with unswizzled/flat smem input (slow)
* add acc fp16 example with 3 stages and swizzle (no bank conflicts)
* add max version of NV fp16_fp16_fp16
* fix up comments and removed unused code in max variations
* add start of no_xor example
* fix to account for UOps to Ops

| File |
|---|
| nv.fp16_fp16_fp16.2_stage.cu |
| nv.fp16_fp16_fp16.3_stage.cu |
| nv.fp16_fp16_fp16.3_stage_swizzled.cu |
| nv.fp16_fp16_fp16.max.cu |
| nv.fp16_fp16_fp16.no_xor.cu |
| nv.fp16_fp32_fp16.hcopt.cu |
| nv.fp16_fp32_fp32.2_stage_swizzled_smem_input.cu |
| nv.fp16_fp32_fp32.flat_smem_input.cu |
| nv.fp16_fp32_fp32.max.cu |
| nv.fp16_fp32_fp32.swizzled_smem_input.cu |