tinygrad/tinygrad
Timmy 4592fc8fe7
Multireduce Kernels - prereq refactor (#4173)
* refactor rendering a reduceop into its own function (will help for kernels with multiple reduceops)

* linters

* addressing concerns
2024-04-14 20:16:54 -04:00
codegen Multireduce Kernels - prereq refactor (#4173) 2024-04-14 20:16:54 -04:00
engine optionally use a copy kernel instead of SDMA (#4116) 2024-04-12 23:10:41 -07:00
features move sum acc_dtype into lazy so it applies to backward (#4149) 2024-04-11 14:43:56 -04:00
nn Resnet fp16 training with fp32 master weight copy (#4144) 2024-04-14 11:25:08 -04:00
renderer Update ssa input order and annotate types in cstyle and assembly (#4117) 2024-04-09 13:10:29 -04:00
runtime hotfix: CUDA_P2P works (#4155) 2024-04-12 18:20:12 +03:00
shape assert if expr_idxs return might be outside of int32 (#4157) 2024-04-12 14:18:35 -04:00
__init__.py spend 5 lines to bring mnist into the repo (#4122) 2024-04-09 19:24:57 -07:00
buffer.py use Buffer.ensure_allocated in search _ensure_buffer_alloc (#4132) 2024-04-10 13:11:50 -04:00
device.py multitensor shouldn't recompile (#4164) 2024-04-13 00:03:48 -07:00
dtype.py rename Scalar to ConstType and cast_scalar to as_const (#3946) 2024-03-26 22:39:58 -04:00
function.py move sum acc_dtype into lazy so it applies to backward (#4149) 2024-04-11 14:43:56 -04:00
helpers.py PADTO SUM if parents of sum are all zero-preserving (#4140) 2024-04-10 22:16:12 -04:00
lazy.py optionally use a copy kernel instead of SDMA (#4116) 2024-04-12 23:10:41 -07:00
ops.py optionally use a copy kernel instead of SDMA (#4116) 2024-04-12 23:10:41 -07:00
tensor.py cleanup lbs (#4163) 2024-04-12 22:32:16 -07:00