tinygrad/extra/gemm
George Hotz c62dea6881
ai slop flash attention (it works) (#15401)
* ai slop flash attention (it works)

* speed up, 2 TFLOPS + 7 GB/s

* simpler

* simpler

* optimize

* faster

* warp shuffle

* sqtt: link dispatch to exec (#15396)

* sqtt packet linking infra

python

* javascript

* ~doubly linked list

* ui works

* work

* exec can also highlight the pc, coloring work

* more work

* rm sqtt/model.py, doesn't need to be upstreamed

* viz: no context enters in cli, update llama profile (#15404)

* removed unused named arg in rules [pr] (#15414)

* viz: sqtt printer in viz/cli.py (#15411)

* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long

* remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416)

Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb

Co-authored-by: Amp <amp@ampcode.com>

* works

* cnt=20

* revert that

* uop slice tests

* simpler

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: gg <ggordbegli@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
2026-03-23 16:15:10 +08:00
amd_seb kernel4 and 5 in uops (#11411) 2025-07-28 19:35:48 -07:00
max_kernels extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) 2025-03-19 15:04:57 +08:00
.gitignore mi350x 1tflop bf16 gemm in extra (#13702) 2025-12-28 21:45:42 +09:00
amd_asm_matmul.py minimal vec in amd_copy_matmul (#15398) 2026-03-21 14:57:21 +08:00
amd_copy_matmul.py add SHAPED_WMMA (#15400) 2026-03-21 16:16:03 +08:00
amd_flash_attention.py ai slop flash attention (it works) (#15401) 2026-03-23 16:15:10 +08:00
amd_matmul.py remove ScheduleItem and merge it with ExecItem (#13759) 2025-12-19 17:04:24 -04:00
amd_uop_matmul.py add wmma support to amd_copy_matmul (#15384) 2026-03-20 19:02:19 +08:00
amx.py rename allocator methods to not conflict [pr] (#7788) 2024-11-20 00:10:29 +08:00
cdna_asm_gemm.py dtypes.index -> dtypes.weakint (#15377) 2026-03-20 01:08:46 -04:00
cuda_matmul.py rename allocator methods to not conflict [pr] (#7788) 2024-11-20 00:10:29 +08:00
fuzz_matmul.py acc_dtype -> dtype (#9402) 2025-03-10 16:05:30 -04:00
gemm.c only 62 gflops (#2629) 2023-12-05 13:28:24 -08:00
gemm.py only 62 gflops (#2629) 2023-12-05 13:28:24 -08:00
halide_gemm.py add halide example (#10980) 2025-06-26 16:14:57 -07:00
hip_matmul.py rename allocator methods to not conflict [pr] (#7788) 2024-11-20 00:10:29 +08:00
intel_xmx.py Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535) 2026-02-04 11:31:11 -05:00
max_matmul.py fix: make max_matmul run again (#13085) 2025-11-03 18:09:09 -08:00
metal_conv.py create engine folder and move code (#3948) 2024-03-26 20:38:03 -07:00
metal_matmul.py compile fixes (#10442) 2025-06-06 18:38:37 -04:00
metal_matvec.py compile fixes (#10442) 2025-06-06 18:38:37 -04:00
metal_uop_matmul.py matmul example on metal showing off tensor core (#13033) 2025-10-31 19:40:36 +08:00
mi350x_uop_matmul.py index slicing + allclose (#13071) 2025-11-03 13:01:48 +08:00
mi350x_uop_matmul_2.py more mi350x matmul work (#13138) 2025-11-13 09:09:28 -08:00
real_pmatmul.py pmatmul example + GB/s bugfix [run_process_replay] (#5974) 2024-08-07 22:32:11 -07:00
simple_conv.py acc_dtype -> dtype (#9402) 2025-03-10 16:05:30 -04:00
simple_matmul.py remove ScheduleItem and merge it with ExecItem (#13759) 2025-12-19 17:04:24 -04:00
simple_matvec.py acc_dtype -> dtype (#9402) 2025-03-10 16:05:30 -04:00
tinygrad_nv_matmul.py remove ScheduleItem and merge it with ExecItem (#13759) 2025-12-19 17:04:24 -04:00
torch_gemm.py work from benchmarking tinybox red v2 (#13264) 2025-11-13 16:38:40 -08:00
triton_nv_matmul.py Buffer.as_buffer -> Buffer.as_memoryview [pr] (#14535) 2026-02-04 11:31:11 -05:00
tvm_gemm.py move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00