tinygrad/tinygrad
David Hou aebaab011f
faster wino compile by catting consts across data expand dim (#3293)
* PoC faster wino compile by catting consts across data expand dim

* fix fusions

* faster + golf it

* noqa 501

* implicit broadcast

* Revert "implicit broadcast"

This reverts commit 5915a9083d045ec1e6be84dcb492333325d48666.

* shorter

* shorter

* oops

* 216 upcasts is probably fine

* wino kernel count test

* test winograd number of sts

* specify device for apply_matrix mat elements
2024-02-02 03:47:45 -05:00
codegen wmma: add HIP FP16 to FP16 tensor core (#3287) 2024-01-31 23:00:51 -05:00
features limit group_for_reduce bufs to 32kb (#3299) 2024-02-02 03:13:12 -05:00
nn sharding for llama (#3151) 2024-01-16 19:28:00 -08:00
renderer wmma: add HIP FP16 to FP16 tensor core (#3287) 2024-01-31 23:00:51 -05:00
runtime Simple RDNA3 emulator (#2974) 2024-01-30 10:39:28 -08:00
shape add canonicalization to View.create (#3280) 2024-01-30 10:26:48 -08:00
__init__.py move dtypes to dtype.py (#2964) 2024-01-01 14:58:48 -08:00
device.py type annotation for Compiler.cachekey and minor cleanup (#3298) 2024-02-01 21:31:21 -05:00
dtype.py const cleanup with dtype.Scalar (#3257) 2024-01-26 21:16:22 -05:00
graph.py remove unused expr node (#3170) 2024-01-18 14:18:43 -08:00
helpers.py hip mutex signal (#3234) 2024-01-24 13:23:09 -08:00
jit.py fix jit realize issue (#3258) 2024-01-26 18:27:35 -08:00
lazy.py fix pad 0 size (#3277) 2024-01-30 08:58:10 -08:00
mlops.py Fix backward fn for < and == (#3037) 2024-01-14 20:39:52 -08:00
ops.py hip events work (#3229) 2024-01-24 11:49:53 -08:00
realize.py minor hip cleanups (#3237) 2024-01-24 15:13:38 -08:00
tensor.py faster wino compile by catting consts across data expand dim (#3293) 2024-02-02 03:47:45 -05:00