tinygrad/extra/gemm/max_kernels
Francis Lam 1e5d9ad8f7
extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)
* add an unoptimized FP16/FP16 MMA example

* add slow 3-stage fp16 acc example

* add correct 3-stage pipeline with unswizzled/flat smem input (slow)

* add acc fp16 example with 3 stages and swizzle (no bank conflicts)

* add max version of NV fp16_fp16_fp16

* fix up comments and remove unused code in max variations

* add start of no_xor example

* fix to account for the UOps to Ops rename
2025-03-19 15:04:57 +08:00
nv.fp16_fp16_fp16.2_stage.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
nv.fp16_fp16_fp16.3_stage.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
nv.fp16_fp16_fp16.3_stage_swizzled.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
nv.fp16_fp16_fp16.max.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
nv.fp16_fp16_fp16.no_xor.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp16.hcopt.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp32.2_stage_swizzled_smem_input.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp32.flat_smem_input.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp32.max.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp32.swizzled_smem_input.cu extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00