tinygrad/tinygrad
qazal 616e9c1483
CDNA assembly gemm in tensor.py with flag (#14310)
* work

* work

* the assembly

* remove the old one

* remove ws bufs, assert splitk

* notes cleanup

* work

* gemm args

* gemm in mixins would be nice

* add gemm gradient

* print counters

* the realize is for DEBUG=2 aesthetics

* dedup

* rewrite to python dsl, no list copies

* leave that

* add B, M, N, K to gemm name

* it's M0 not NULL

* fp16 support

* test cleanup + more gemms

* work from viz

* more work

* gemm batch_size

* xccg path work

* tiny comments on the label naming

* s_waitcnt
2026-01-31 22:34:14 +09:00
apps return types for all math.py function (#14413) 2026-01-28 20:10:11 -05:00
codegen add few more types [pr] (#14425) 2026-01-29 14:04:09 -05:00
engine correct var_vals schedule filter (#14451) 2026-01-30 17:10:07 -05:00
mixin return types for all math.py function (#14413) 2026-01-28 20:10:11 -05:00
nn fix USE_ATOMICS for non float dtypes and make it the default (#14444) 2026-01-31 09:44:16 +08:00
renderer return types for all math.py function (#14413) 2026-01-28 20:10:11 -05:00
runtime device: call free for external_ptr (#14448) 2026-01-30 23:53:17 +03:00
schedule tighter late_buffer_view match [pr] (#14456) 2026-01-31 07:28:26 -05:00
uop CDNA assembly gemm in tensor.py with flag (#14310) 2026-01-31 22:34:14 +09:00
viz CDNA assembly gemm in tensor.py with flag (#14310) 2026-01-31 22:34:14 +09:00
__init__.py move files into uop dir (#10399) 2025-05-18 11:38:28 -07:00
device.py device: call free for external_ptr (#14448) 2026-01-30 23:53:17 +03:00
dtype.py add InvalidType to ConstType [pr] (#14373) 2026-01-27 14:09:34 -05:00
gradient.py add call/param UOps (#14433) 2026-01-30 14:51:45 +08:00
helpers.py CDNA assembly gemm in tensor.py with flag (#14310) 2026-01-31 22:34:14 +09:00
py.typed add a single py.typed (#6083) 2024-08-14 17:31:46 -07:00
tensor.py CDNA assembly gemm in tensor.py with flag (#14310) 2026-01-31 22:34:14 +09:00