tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Francis Lam 1e5d9ad8f7 extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
* extra/gemm/max_matmul: start of custom kernels for GEMM
* add an unoptimized FP16/FP16 MMA example
* add slow 3-stage fp16 acc example
* add correct 3-stage pipeline with unswizzled/flat smem input (slow)
* add acc fp16 example with 3 stages and swizzle (no bank conflicts)
* add max version of NV fp16_fp16_fp16
* fix up comments and removed unused code in max variations
* add start of no_xor example
* fix to account for UOps to Ops
max_kernels	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
.gitignore	fast amd gemm (#9318)	2025-03-03 12:01:14 +08:00
amd_matmul.py	fast amd gemm (#9318)	2025-03-03 12:01:14 +08:00
amx.py	rename allocator methods to not conflict [pr] (#7788)	2024-11-20 00:10:29 +08:00
cuda_matmul.py	rename allocator methods to not conflict [pr] (#7788)	2024-11-20 00:10:29 +08:00
fuzz_matmul.py	acc_dtype -> dtype (#9402)	2025-03-10 16:05:30 -04:00
gemm.c	only 62 gflops (#2629)	2023-12-05 13:28:24 -08:00
gemm.py	only 62 gflops (#2629)	2023-12-05 13:28:24 -08:00
hip_matmul.py	rename allocator methods to not conflict [pr] (#7788)	2024-11-20 00:10:29 +08:00
intel_xmx.py	Intel XMX Tensor Core Support (#5622)	2024-08-16 09:19:21 -07:00
jax_pmatmul.py	jax parallel matmul example	2023-11-28 13:48:11 -08:00
kernel8_batched_gmem.s	fast amd gemm (#9318)	2025-03-03 12:01:14 +08:00
max_matmul.py	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926)	2025-03-19 15:04:57 +08:00
metal_conv.py	create engine folder and move code (#3948)	2024-03-26 20:38:03 -07:00
metal_matmul.py	rename allocator methods to not conflict [pr] (#7788)	2024-11-20 00:10:29 +08:00
metal_matvec.py	rename allocator methods to not conflict [pr] (#7788)	2024-11-20 00:10:29 +08:00
mlx_matmul.py	mlx benchmark, a lil slower than tg	2023-12-05 19:00:43 -08:00
real_pmatmul.py	pmatmul example + GB/s bugfix [run_process_replay] (#5974)	2024-08-07 22:32:11 -07:00
simple_conv.py	acc_dtype -> dtype (#9402)	2025-03-10 16:05:30 -04:00
simple_matmul.py	redo simple_matmul change (#9450)	2025-03-14 17:53:52 -04:00
simple_matvec.py	acc_dtype -> dtype (#9402)	2025-03-10 16:05:30 -04:00
tf_gemm.py	Add tensorflow GEMM benchmark script (#1000)	2023-06-18 10:57:45 -07:00
tinygrad_nv_matmul.py	remove graph [pr] (#7085)	2024-10-16 11:40:07 +08:00
torch_gemm.py	speed docs + upgrades [pr] (#8964)	2025-02-08 17:28:52 +08:00
triton_nv_matmul.py	BufferSpec and ProgramSpec [pr] (#7814)	2024-11-21 12:18:05 +08:00
tvm_gemm.py	CLANG -> CPU (#9189)	2025-02-20 18:03:09 -05:00