tinygrad

Mirror of https://github.com/tinygrad/tinygrad.git, synced 2026-06-24 02:14:17 +00:00


Francis Lam 1e5d9ad8f7 extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
* extra/gemm/max_matmul: start of custom kernels for GEMM
* add an unoptimized FP16/FP16 MMA example
* add slow 3-stage fp16 acc example
* add correct 3-stage pipeline with unswizzled/flat smem input (slow)
* add acc fp16 example with 3 stages and swizzle (no bank conflicts)
* add max version of NV fp16_fp16_fp16
* fix up comments and removed unused code in max variations
* add start of no_xor example
* fix to account for UOps to Ops
nv.fp16_fp16_fp16.2_stage.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
nv.fp16_fp16_fp16.3_stage.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
nv.fp16_fp16_fp16.3_stage_swizzled.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
nv.fp16_fp16_fp16.max.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
nv.fp16_fp16_fp16.no_xor.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp16.hcopt.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp32.2_stage_swizzled_smem_input.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp32.flat_smem_input.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp32.max.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
nv.fp16_fp32_fp32.swizzled_smem_input.cu	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
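Several of these variants differ mainly in how the shared-memory (smem) input tiles are laid out. The listing has "flat_smem_input" and "swizzled_smem_input" files, a "no_xor" file, and a commit note about a 3-stage version "with swizzle (no bank conflicts)". The following is a minimal host-side sketch of why an XOR swizzle can remove bank conflicts. It is an illustration under stated assumptions and is not code taken from these .cu files. The assumptions are as follows. There are 32 four-byte banks. Each tile row is 128 bytes (64 fp16 values), split into eight 16-byte chunks. Eight threads each read the same logical chunk column from eight consecutive rows, which is roughly the shape of one 8-row `ldmatrix` access. The helper names `flat_offset`, `swizzled_offset` and `banks_touched` are hypothetical.

```python
# Hypothetical shared-memory model, for illustration only (not from the .cu files):
# 32 banks x 4 bytes, 128-byte tile rows = 64 fp16 values = 8 chunks of 16 bytes.
NUM_BANKS = 32
BANK_BYTES = 4
ROW_BYTES = 128
CHUNK_BYTES = 16


def flat_offset(row, chunk):
    """Byte offset of logical (row, chunk) in a plain row-major layout."""
    return row * ROW_BYTES + chunk * CHUNK_BYTES


def swizzled_offset(row, chunk):
    """Byte offset with an XOR swizzle: physical chunk = chunk ^ (row % 8)."""
    return row * ROW_BYTES + (chunk ^ (row % 8)) * CHUNK_BYTES


def banks_touched(offset_fn, chunk):
    """Set of banks hit when rows 0..7 each read 16 bytes at logical column `chunk`."""
    banks = set()
    for row in range(8):
        base = offset_fn(row, chunk)
        for b in range(0, CHUNK_BYTES, BANK_BYTES):
            banks.add(((base + b) // BANK_BYTES) % NUM_BANKS)
    return banks


if __name__ == "__main__":
    for chunk in range(8):
        flat = len(banks_touched(flat_offset, chunk))
        swz = len(banks_touched(swizzled_offset, chunk))
        # 32 distinct banks means conflict-free; 4 means all 8 rows share the same 4 banks.
        print(f"chunk {chunk}: flat {flat:2d} banks, swizzled {swz:2d} banks")
```

In the flat layout, every row's copy of a given chunk lands in the same four banks. The eight 16-byte reads therefore serialize (an 8-way conflict). XORing the chunk index with the row index permutes the chunks within each row, so the same eight reads cover all 32 banks. Because XOR with a fixed row value is a bijection on 0..7, each row still stores all eight of its chunks exactly once.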