mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
George Hotz	b245f1307e	add exp2 (#2192)	2023-10-31 17:48:42 -07:00
qazal	e2428b63a6	external (#2191)	2023-10-31 13:57:24 -07:00
Elias Wahl	7e8c5f1a0f	Modernize setup.py (#2187) * Added pyproject.toml * Pin onnx	2023-10-31 13:55:45 -07:00
nimlgen	8c07c73a9b	Fix cl map buffer (#2190) * fix gpu enqueue_map_buffer out of space * add test	2023-10-31 12:02:46 -07:00
George Hotz	c59ea32f90	prevent over unrolling in optimzer	2023-10-31 11:45:18 -07:00
George Hotz	5aaa8a0cc1	fix shape	2023-10-31 11:36:19 -07:00
George Hotz	a27c9f9de5	openpilot compile2 (#2189) * try compile2 * pass to thneed * fix tanh onnx	2023-10-31 11:08:58 -07:00
qazal	be5f185ac0	Higher test coverage for dtypes (#2156 ) * refactor unit tests for dtypes * add missing dtypes in llvmir.py and lib.py * skip torch tests * webgpu * cleaner skips * fix llvm bool casting issue using compare * llvm 100% passing * llvm segfault * TEMP decrease timeout mins to 11 debug * add bf16 to setup * skip half tests in cuda cpu * check for CUDACPU insetad * add int16 to triton dtypes * u16 for triton * remove debug - diff is still hard to read * derive from base class TestDType * enhance test_upcast and downcast by running on every possible version * dummy commit to rerun the flakey test * skip the correct tests for CUDA * bf16 should be skipped in the common TestDType cases * re-enable bf16 * more consistent structure * tiny changes to is_dtype_supported 1 * tiny changes 2 add reason * fuzz * fuzzer p2 * run fp32 twice * remove duplicate fp32 run * clang: use stdbool * skip triton on bool casts * merge and resolve conflicts	2023-10-30 22:38:42 -07:00
forcefieldsovereign	f294bdd681	fixed imports (#2185)	2023-10-30 22:07:17 -07:00
Akshay Kashyap	018bd29e37	Enable Multi-Output Export (#2179) * Enable Multi-Output Export * Add test * Update examples and lint * fix padding * test ops * dummy commit to rerun test * revert cuda lint * Enforce tuple/list of tensors * subscripted generics * put back webgpu test * Re-enable WebGPU Efficientnet test	2023-10-30 18:42:26 -07:00
qazal	a7439af786	Fix llvm int->bool cast (#2164) * add to ir * add test case * minimize diff * todo * enable fast math * added both False and True case	2023-10-30 15:28:23 -07:00
George Hotz	94cf652b6b	don't use locals applies to GROUP also	2023-10-30 13:56:43 -07:00
George Hotz	5cc536bcc0	don't use locals applies to LASTLOCAL	2023-10-30 13:53:42 -07:00
chenyu	3c88af5071	use unique table name for each disk_cache test (#2184)	2023-10-30 13:49:49 -07:00
George Hotz	608e3ee800	fix no locals search and search both (#2171) * fix no locals search and search both * pretty print * nolocals default no other search	2023-10-30 10:22:50 -07:00
George Hotz	194e4ad6f8	Revert "optimizer: simplify GROUP and LOCAL to have one of each (#2162)" (#2182) This reverts commit `8cf0bb9351`.	2023-10-30 10:22:26 -07:00
Ahmed Harmouche	95f7183c3a	Reenable global, local limiting (#2095)	2023-10-30 10:17:23 -07:00
chenyu	8548b20b23	fix codellama params and repeat_kv (#2181)	2023-10-30 10:16:26 -07:00
George Hotz	c7f4dd6cb0	CACHELEVEL for smaller caches	2023-10-28 07:26:03 -10:00
chenyu	6c58bf3e9c	in time_linearizer, allocate a scratch buffer if output buffer is also input (#2152) * in time_linearizer, allocate a scratch buffer if output buffer is also input * move scratch buffer creation outside search	2023-10-28 07:17:41 -10:00
Yixiang Gao	902f00b095	adding cuda TC headers (#2165) * split cuda to renderer and add headers for tc * fix TritonRenderer * remove unused import	2023-10-27 14:25:59 -10:00
David Hou	7f4f925385	fix hip del on compile fail (#2163) * fix hip del on compile fail * the test doesn't actually work	2023-10-27 11:38:07 -10:00
Francis Lam	8cf0bb9351	optimizer: simplify GROUP and LOCAL to have one of each (#2162 ) * optimizer: simplify GROUP and LOCAL to have one of each Now that tensor cores only use LASTLOCAL, we can simplify to use only that op everywhere. The only use of GROUP is in matvec hand-coded opts and it doesn't make a performance difference so switching to use only the top behavior. Also adds additional asserts to prevent tensor core dims from being altered which causes bad kernels to be generated. * search: remove duplicated actions	2023-10-27 11:37:44 -10:00
George Hotz	e0201922e3	Q network for pruning BEAM / uops deduping / BEAM_ESTIMATE (#2142 ) * stable diffusion < 324ms * revert swap action * fix tests due to more sum splitting * REDUCEOP_SPLIT_THRESHOLD env var * added from unaligned np test (#2134) * align cpu buffer before copy into cl buffer (#2135) * remove shelve from handcode_resnet50_opt.py (#2139) * Add dictionary keys to reduce db size (#2131) * work * ignore beam cache * dictionary keys are generic * minor db cleanups * fix baseline and extract dataset * fix training * log likelihood * more lin to feats * sts * training policynet * net sort of works * dedup * refactor, stupid new actions * fix uops deduping * BEAM_ESTIMATE --------- Co-authored-by: chenyu <chenyu@fastmail.com> Co-authored-by: imaolo <56898718+imaolo@users.noreply.github.com>	2023-10-27 10:53:06 -10:00
will	bc0829b677	Fix llama json loading (#2160)	2023-10-27 10:21:56 -10:00
nimlgen	8d41b3eb3f	beam=16 makes gpt2 gpu-time < 5ms on 3090 (#2154)	2023-10-27 10:21:27 -10:00
nimlgen	5204864eca	init cudagraph (#2153) * init cudagraph * linter happy * print warning when cuda graph creation failed	2023-10-27 16:19:50 -04:00
chenyu	9215bccb41	Tensor.uniform set default to standard uniform (#2158) * Tensor.uniform set default to standard uniform * clean up test to reuse function	2023-10-27 16:15:30 -04:00
Roelof van Dijk	36ab04ae35	perf: lazyop as dataclass (#1603 ) * perf: lazyop as dataclass fix: linter fix: restore eq * use builtin methods, buffers to property to allow freezing * fix: reduce diff * fix: can't freeze due to KOPT tests, mypy * fix: explicit hash * can freeze if tests are fixed * fix: typo --------- Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-10-25 17:54:30 -04:00
chenyu	0ca0e9ee5e	exclude ast with variables from beam search (#2140) * exclude ast with variables from beam search * test that * add to CI	2023-10-25 16:35:29 -04:00
Szymon Ożóg	a52b420fb3	switch ocelot back to main repo (#2147) * return to ocelot main branch * cd before checkout	2023-10-25 15:14:26 -04:00
George Hotz	12dd165d38	add WINO/HALF/HIP to AMD benchmark	2023-10-25 13:22:45 -04:00
Francis Lam	bf3490cdf9	wmma: refactor tensor cores using existing local dims (#2097) * wmma: refactor tensor cores using existing local dims * optimizer: fix bad rebase and break after one late local --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2023-10-25 13:10:46 -04:00
wozeparrot	c29653605e	hip multigpu training (#1878 ) * feat: move to hip * feat: special path for RawBufferTransfer * feat: initial rawbuffertransfer * feat: hip ipc * feat: working hip ipc * feat: need to base device without args * feat: close mem handle * feat: modified test * feat: more multihip stuff * clean: cleanup * feat: cleaner * feat: don't crash * feat: test more * clean: way cleaner hip wrapper * feat: barrier * feat: barrier * feat: this breaks stuff * feat: we can use empty here * feat: maybe fix tests * feat: maybe fix tests again? * fix: probably fix tests * feat: no waiting here * feat: wait here * feat: much larger test * feat: need to sync here * feat: make this async * feat: no waiting! * feat: cut here * feat: sync copy * feat: random imports * feat: much cleaner world * feat: restore this * feat: restore this * clean: cleanup * feat: set this	2023-10-24 17:35:53 -04:00
nimlgen	2e89fd264f	Refactor hipgraph (#2141) * refactor hip graph * linter happy * happy liner	2023-10-24 15:45:56 -04:00
nimlgen	e21bf776c8	fix debug=1 llama/gpt2 timings (#2143)	2023-10-24 15:45:00 -04:00
chenyu	4444e6d4b3	stable diffusion < 324ms (#2129) * stable diffusion < 324ms * revert swap action * fix tests due to more sum splitting * REDUCEOP_SPLIT_THRESHOLD env var	2023-10-24 14:56:12 -04:00
George Hotz	cea2bc7964	Add dictionary keys to reduce db size (#2131) * work * ignore beam cache * dictionary keys are generic * minor db cleanups * fix baseline and extract dataset * fix training * log likelihood	2023-10-24 10:49:22 -04:00
chenyu	d5e2fdea22	remove shelve from handcode_resnet50_opt.py (#2139)	2023-10-24 10:37:30 -04:00
imaolo	228b310478	align cpu buffer before copy into cl buffer (#2135)	2023-10-23 21:04:35 -04:00
imaolo	6ee0435263	added from unaligned np test (#2134)	2023-10-23 11:38:57 -04:00
George Hotz	3c56c181f6	string formatting 25 -> 30 to fit	2023-10-22 10:57:34 -07:00
George Hotz	6dc8eb5bfd	universal disk cache (#2130) * caching infra for tinygrad * nons tr key * fix linter * no shelve in beam search * beam search caching * check tensor cores with beam too * pretty print * LATEBEAM in stable diffusion	2023-10-22 10:56:57 -07:00
Francis Lam	ace6b2a151	optimizer: add test for correctness of opts (#2124) * optimizer: add test for correctness of opts Also added OptOps.UPCASTMID to constrain valid axes for opts with group_for_reduce. * llvm: fix LinearizerOptions to correctly not has_shared * optimizer: remove premature test scaffold for TC opts * search: fix the action space	2023-10-22 08:02:22 -07:00
George Hotz	abeba8f1fc	optimization: get actions in CI (#2125) * get actions in CI * actually run the test * pythonpath	2023-10-20 12:22:01 -07:00
qazal	14625721e9	minor triton casting refactor (#2118) * minor refactor * render_cast taking an x like cstyle * fix fmt strings * tl.where * fix alu render * use dtype * newline eof * better diff	2023-10-20 12:11:55 -07:00
George Hotz	cb508e6923	uops graphing + phi (#2120) * uops graphing * add_phi_node * less phi nodes * where graph uops should live * naming * move it to external * fix triton yolo * fix clang and preserve behavior	2023-10-19 22:26:28 -07:00
20kdc	bedd028061	waifu2x vgg7: testcase, auto-RGBA->RGB, function to grab pretrained models, training "fix" (#2117)	2023-10-19 22:07:15 -07:00
Szymon Ożóg	e0b2bf46b4	Improve triton generated code quality (#2119)	2023-10-19 22:06:19 -07:00
qazal	36d4001b4f	add test coverage for search (#2104) * add test coverage for search * only in compiled backends * dont use device.default in decorator * time_til is the other way around xd	2023-10-19 17:06:47 -07:00

11,106 commits