tinygrad/extra/gemm
nimlgen 81a4a9623c
add qcom dsp runtime (#6112)
* calling qualcomm dsp from python

* include so files

* add include file

* adsprpc.py

* running with adsprpc

* work

* 32-bit support in elf

* compilation works

* ion

* msm_ion

* working DSP backend

* getting 500 MFLOPS on matmul

* beam works with timing

* move to autogen

* disasm

* progress

* simple tests pass

* qcom_dsp

* more dsp autogen

* progress

* some progress

* works w/o lib

* checkpoint

* no lib

* ugh, better

* cleaner, but with lib. test good, but with the hack

* remove autogens

* small

* push

* simpler

* revert this

* run_3

* simpler

* android

* handle

* run it

* why?

* run2

* to gen

* cc

* cleaner

* elf

* part of autogen

* comment

* no lib

* autogen

* linter

* bug reproducer

* cleaner

* this repro is almost empty and doesn't work!!!!

* with this test_ops passes, no crashes anymore

* cleaner

* linter

* renames

* shorter

* remove contextlib

* ugh

* myoy

* cleaner

* cleaner

* remove import

* conn

* import

* revert this

* remove heavy .so

* shorter alloc

* not true anymore

---------

Co-authored-by: Comma Device <device@comma.ai>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <george@comma.ai>
2024-09-13 21:01:33 +03:00
.gitignore updates from the chonker branch 2022-11-07 21:12:08 -08:00
amx.py update amx gemm (#3991) 2024-03-29 11:45:03 -04:00
cuda_matmul.py fix 'Import Error: cannot import name compile_cuda from tinygrad.runtime.ops_cuda' error in extra/gemm/cuda_matmul.py (#3531) 2024-02-28 17:15:32 -08:00
fuzz_matmul.py wmma: widen TC usage in search by using PADTO on TC axes when possible (#4216) 2024-04-22 16:50:31 -04:00
gemm.c only 62 gflops (#2629) 2023-12-05 13:28:24 -08:00
gemm.py only 62 gflops (#2629) 2023-12-05 13:28:24 -08:00
hip_matmul.py retire hsa (#4885) 2024-06-09 11:33:03 +03:00
intel_xmx.py Intel XMX Tensor Core Support (#5622) 2024-08-16 09:19:21 -07:00
jax_pmatmul.py jax parallel matmul example 2023-11-28 13:48:11 -08:00
metal_conv.py create engine folder and move code (#3948) 2024-03-26 20:38:03 -07:00
metal_matmul.py create engine folder and move code (#3948) 2024-03-26 20:38:03 -07:00
metal_matvec.py move GlobalCounter to helpers (#4002) 2024-03-30 00:30:30 -04:00
mlx_matmul.py mlx benchmark, a lil slower than tg 2023-12-05 19:00:43 -08:00
real_pmatmul.py pmatmul example + GB/s bugfix [run_process_replay] (#5974) 2024-08-07 22:32:11 -07:00
simple_conv.py wmma: refactor to remove wmma_func and create TC funcs as needed (#3945) 2024-03-27 16:43:09 -04:00
simple_matmul.py add qcom dsp runtime (#6112) 2024-09-13 21:01:33 +03:00
simple_matvec.py extra/gemm/simple_matvec: add simple_matvec.py (#4021) 2024-03-31 16:38:52 -04:00
tf_gemm.py Add tensorflow GEMM benchmark script (#1000) 2023-06-18 10:57:45 -07:00
tinygrad_nv_matmul.py work to make GEMV fast (#5824) 2024-07-30 17:41:40 -07:00
torch_gemm.py faster RDNA assembly backend (#990) 2023-06-16 12:06:38 -07:00
triton_nv_matmul.py extra/gemm/triton_nv_matmul: fix Program arguments (#6212) 2024-08-20 14:05:38 -07:00
tvm_gemm.py lowerer is kernel [run_process_replay] (#5437) 2024-07-12 18:50:55 -07:00