tinygrad/extra/gemm
George Hotz 5472a14544
openpilot compile2 (#1977)
* start compile2

* tweak

* why are there two more kernels?

* minor cleanups

* don't break onnx tests

* add __metadata__ support to safetensors

* no early realize in onnx

* cleanups

* bugfix

* clean up image type, add optimize

* opt to match old

* try that

* opt work

* run compile2

* optimizer

* prt more

* prerealize

* imp

* NOLOCALS works

* no locals means no locals

* support fractional globals

* all locals welcome

* int that

* cleanups

* show gemv regression

* clean up diff

* use idx for the cond

* nolocals

---------

Co-authored-by: Comma Device <device@comma.ai>
2023-10-15 20:39:46 -07:00
File              Last commit                                                   Date
.gitignore        updates from the chonker branch                               2022-11-07 21:12:08 -08:00
amx.py            fixes (#1893)                                                 2023-09-22 07:20:27 +08:00
cuda_matmul.py    FLOAT16 off works                                             2023-04-19 15:34:56 -07:00
gemm.c            Revert "update editorconfig, enforce via CI (#1343)" (#1380)  2023-07-31 10:35:50 -07:00
gemm.py           fixes (#1893)                                                 2023-09-22 07:20:27 +08:00
gemv_845.py       openpilot compile2 (#1977)                                    2023-10-15 20:39:46 -07:00
hip_matmul.py     fast HIP gemm -> 100 TFLOPS (#1476)                           2023-08-09 06:54:15 -07:00
metal_conv.py     move device to ops (#1646)                                    2023-08-23 08:30:17 -07:00
metal_matmul.py   good stuff from tensor cores branch (#1199)                   2023-07-08 16:58:26 -07:00
metal_matvec.py   optimizer: add matvec optimizations (#1972)                   2023-10-04 14:16:27 -07:00
simple_matmul.py  wmma: clean up to make WMMA arg order consistent (#2014)      2023-10-07 17:45:40 -07:00
tf_gemm.py        Add tensorflow GEMM benchmark script (#1000)                  2023-06-18 10:57:45 -07:00
torch_gemm.py     faster RDNA assembly backend (#990)                           2023-06-16 12:06:38 -07:00
tvm_gemm.py       fix tvm gemm example                                          2023-10-08 05:57:41 -07:00