mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
geohotstan	9268a8b154	remove MULACC (#3459 ) * init * removed mulacc * is uoptimize the problem? * lol hax make work temporarily fix l8er * revert extra/ changes * clean up * flaky metal tests? * add back mulacc for metal * revert last commit * try skipping linearizer_failure tests * skip flammit tests... cuz tests all work locally * try narrow down exact linearizer failure test * try 2 * try 4 * generated code is the exact same wtf why CI fails * code for 15 and 17 are exact same with or without mulacc, this should pass * try only 1 failure * try garbage collecting lol... * try del variables lol * try gcing after del lol... * is diskcache the problem??? * try disabling opts cache idk * try remove hack * try disable github metal cache... * try CACHELEVEL=0 :D idk anymore * try increase newCommandQueueWithMaxCommandBufferCount_, im almost out of ideas... * revert * actually not a HACK * oops	2024-02-29 07:40:40 -05:00
qazal	94fc0fd546	uop the float4 acc upcast in group_for_reduce kernels (#3466 ) * simplest one * but i can trust this will be cached correctly * wait that was wrong too * cleanup * test_reduce_upcast for single reduce case * a late accumulator always outputs to gds lint	2024-02-28 17:33:47 -08:00
George Hotz	48918fa75a	fix disktensor offset issue (#3532 )	2024-02-28 17:22:17 -08:00
David Friehs	275971e616	fix: align .split, .chunk and .unsqueeze with torch, add fuzz tests (#3505 ) this fixes .split where self.shape[dim] is not perfectly divisible by sizes - .chunk is always the wrong choice here: - tensor((5,)).split(4) should result in (tensor((4,)), tensor((1,))) was (tensor((3,)), tensor((2,))) this also fixes issues in .split and .chunk where tensors with shape[dim]==0 lead to empty tuples/lists when the tensor itself should have been returned instead because tinygrad is expected to fail in all cases where torch fails tinygrad will now be strict regarding sizes having to sum up to passed dimension in .split, num having to be non-null for .chunk and only allowing valid dims in .unsqueeze	2024-02-28 17:06:39 -08:00
chenyu	0c6846f9fc	failed test case for disk tensor assign into dtype int64 (#3527 ) failed case for #3510, mark as expectedFailure for now	2024-02-28 17:52:21 -05:00
chenyu	d89e3c4e08	enable METAL tests now runner is M1 and no fast-math (#3523 )	2024-02-28 14:14:23 -05:00
chenyu	1136e2a82a	`skipIf(not(` -> `skipUnless(` in test_linearizer_failures (#3519 ) if these behaves weirdly in CI might need to disable them in CI	2024-02-28 13:48:47 -05:00
chenyu	2127c1c6c2	test for the split reduce kernel (#3515 ) somehow this was not tested	2024-02-27 21:29:25 -05:00
chenyu	88939c3347	fix Node.max can be symbolic (#3514 ) Also made sure taking max twice can get int.	2024-02-27 17:21:31 -05:00
chenyu	969b57f0fe	enable symbolic_ops and jits test of two vars (#3513 )	2024-02-27 11:17:46 -05:00
Francis Lam	11da65bccd	test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option (#3455 ) * test/external/fuzz_linearizer: add a FUZZ_MAX_SIZE option this allows us to limit the size of the kernel and reduce running times by avoiding ones that take a long time * fix spacing and re-order to put parameters together	2024-02-27 07:34:59 -05:00
qazal	a29cd6d464	run f64 increased precision tests on remu (#3509 ) * run the test in CI * temp: use the pre-release * Revert "temp: use the pre-release" This reverts commit `28e8571421`.	2024-02-26 18:01:07 -05:00
chenyu	61605ccc69	Remove special case of SumNode div SumNode (#3502 )	2024-02-26 09:42:06 -05:00
Francis Lam	39d75f0d58	test_linearizer_failures: add more METAL examples (#3495 ) these were obtained from running fuzz_linearizer on METAL	2024-02-26 10:19:05 +01:00
chenyu	b154089884	float64 function support for HIP (#3492 ) * float64 function support for HIP * not CI	2024-02-24 09:46:20 -05:00
chenyu	35aff8b0c2	properly exclude PYTHON backend and support of half (#3491 ) should be able to run in CI with python 3.12	2024-02-24 09:22:06 -05:00
David Friehs	2fe98b64bb	fix Tensor.split not passing dim to Tensor.chunk (#3490 )	2024-02-24 07:53:11 -05:00
Carson Radtke	15df9406d6	fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test (#3487 ) * fix exec_alu(UnaryOps.SQRT, <...>, (0,)) + add test * sqrt(0) != nan * fix tabs	2024-02-23 18:28:00 +01:00
David Hou	5cfcc2a8d7	support MLB reshaping on-axis for evenly sharded (#3484 ) * support MLB reshaping on-axis for evenly sharded * update test * not -> !=	2024-02-23 07:51:36 -05:00
David Hou	f513c37e64	support same uidx in multiple shape positions (#3205 ) * support same uidx in multiple shape positions * rename var * update comment * add contiguous index check to global_store too * update comment * small change * is this better? * smh * smaller change? * get rid of more changes * get rid of more changes * is this even making anything better * comment * fix test --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-02-21 19:37:03 +01:00
chenyu	1eb24af63b	fix softmax and log_softmax for 0d tensor (#3463 ) matched torch to take axis \in [-1, 0] and used axis=None internally	2024-02-21 11:30:30 -05:00
George Hotz	871ba73e65	_reduce_op is axis based now (#3462 ) * _reduce_op is axis based now * axis_ * update lin failures * disable that * fix shape	2024-02-21 16:36:31 +01:00
chenyu	0d326a48b8	fix LtNode simplification when lhs and rhs contain same variables (#3451 ) * fix LtNode simplification when lhs and rhs contain same variables `(Variable("a", 1, 5) < Variable("a", 1, 5))` should eval to `NumNode(0)` * fix with less perf impact	2024-02-20 09:06:55 -05:00
George Hotz	1b6e890ef2	uops flop counter (#3373 ) * factor out winograd functions * test counter * uops flop counter * more correct * ish * correct * cleanup * tests for uops flop counter * tests still fail * fix symbolic uops flop cnt * fix symbolic uops flop cnt * hmm, it's an alu * uops alu resolve * relax that	2024-02-20 09:36:30 +01:00
Patrick Tsai	9dd64b1f5f	Fix python cast uint/int overflow (#3448 ) * Fix numpy uint/int overflow * lol * Works * Update * Move overflow test to float64/float32 * One line * Update * One more --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com>	2024-02-20 09:20:43 +01:00
chenyu	86efdf0b34	remove create_rednode (#3444 ) handle Node collapsing into NumNode similar to OpNode	2024-02-18 21:08:19 -05:00
chenyu	2da734920e	use __getnewargs__ to fix unpickling Variable (#3441 ) it's recommended to use __getnewargs__ to update the args of classes that use __new__ when unpickling. It's preferred because it does not change the __new__ behavior.	2024-02-18 10:28:37 -05:00
zku	2d702ca073	If feasible, do not truncate float64 down to float32 in cstyle renderer (#3420 ) * do not truncate float64 precision * use l suffix to try avoid overload confusion * long line, ruff bloats the function otherwise * fmt * remove long double suffix (l), it's sufficient to have the float32 (f) suffix to avoid function overload ambigouity; add test showcasing rtol=1e-12 precision increase, the test fails without the renderer changes * use more reasonable test values, same as test_int_to_float_unary_func * disable test for CUDACPU, does not support half and segfaults on some operations per dtypes_alu test * disable test for HIP, renderer does not support f64 precision * do not use noqa E501, break up condition	2024-02-16 10:08:59 +01:00
chenyu	30f26279c5	add back "CPU" in test_onnx_backend supports_device (#3426 ) the onnx tests were all skipped.	2024-02-16 00:49:30 -05:00
xarkes	28a8b72024	Remove Interpreted device & remaining CPU/TORCH ref (#3423 ) * Remove Interpreted device & remaining CPU/TORCH ref * Oops * supports_device was useful * Fix doc wording --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-02-16 00:30:21 -05:00
geohotstan	5eb4c902f6	correct division dtype casting (#3405 ) * 新年快乐 * fix: exclude floordiv onnx tests * fix: less weird if statements in div * 龙年大吉 * fix: tempfix onnx div * fix: use reference impl for div	2024-02-15 19:34:40 -05:00
qazal	e1a57fe58a	test the behavior, not the implementation (#3419 )	2024-02-15 17:23:42 +01:00
George Hotz	b1c0d8c99d	remove cpu and torch backends (#3399 ) * remove cpu and torch backends * don't copy to cpu * use clang instead of cpu * multitensor gathers on the first device * clang is cpu + use default * fixup * bugfix	2024-02-15 16:55:39 +01:00
Obada Khalili	18bb6a22e0	make tensors sizes smaller in maxpool2d tests (#3417 )	2024-02-15 15:53:52 +01:00
qazal	7919a1e6ec	dtypes: delete the float cast in realize.py (#3401 ) * remove float cast * cast scalars to the correct value in creation time * cast scalar in the correct place * wrong, use y_dtype * make consts have a unique cache key * add cast_scalar back * test_load_cache_const_bufs * add bool dtype * test_const_dtype * fix linters	2024-02-15 14:20:30 +01:00
George Hotz	93eceef727	remove cpu prereqs (#3410 )	2024-02-15 13:45:06 +01:00
George Hotz	a40df14fef	ops_ext to replace cpu import (#3409 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test * fix jit issue	2024-02-15 13:03:42 +01:00
George Hotz	ede4fd4705	hotfix: test_jit_copyin	2024-02-15 12:37:53 +01:00
George Hotz	6356474d6d	Revert "ops_ext to replace cpu import (#3406 )" (#3408 ) This reverts commit `91eb93f85a`.	2024-02-15 12:16:10 +01:00
George Hotz	91eb93f85a	ops_ext to replace cpu import (#3406 ) * ops_ext to replace cpu import * don't allow zero copy with as buffer * memoryview(bytearray * reenable test	2024-02-15 12:14:58 +01:00
qazal	27f4de2ce4	delete half_prekernel (#3388 ) * generic rendering of half and bf16 hotfix * fix uops + regression test * fix the test for metal's half4 * uop.uop fixup * mypy with --strict-equality, fix ops_gpu	2024-02-14 15:40:48 +01:00
chenyu	078a2603d5	set metal fast math default to 0 (disabled) (#3370 ) * set metal fast math default to 0 (disabled) It's a correctness fix because we use inf and nan. Let's see how slow it is * skip failed onnx tests * tmp DISABLE_COMPILER_CACHE=1 in metal benchmark * Revert "tmp DISABLE_COMPILER_CACHE=1 in metal benchmark" This reverts commit `22267df380`.	2024-02-14 11:42:33 +01:00
Francis Lam	668324d92b	wmma: protect TC locals from modification and use only LOCAL (#3379 ) also remove unnecesssary upcast_dim from tensor_core and calculate it from the dimensions and thread sizes	2024-02-13 10:19:35 +01:00
Francis Lam	f1ad01fd91	test_linearizer_failures: add new linearizer compile failure on METAL (#3380 )	2024-02-12 20:28:34 -05:00
George Hotz	2e60012bcf	move create schedule and delete old API (#3377 ) * move create schedule and delete old API * fix test multitensor	2024-02-12 18:10:45 +01:00
George Hotz	41efaa848c	move graph.py and jit.py into features (#3376 ) * move graph.py into features * move jit into features * fix quickstart	2024-02-12 17:34:34 +01:00
George Hotz	0f6cde243d	import from wino_cleanup (#3374 )	2024-02-12 16:26:50 +01:00
Jyotirmaya Mahanta	b6a2600c86	fix merging condition in merge_dims (#3363 ) * fix merging condition in merge_dims * add tests * set contiguous after mask is canonicalized * minor fix	2024-02-12 11:50:26 +01:00
qazal	c8fd66a131	Run RDNA3 tensor core tests in CI (#3367 ) * add test_linearizer * skip test_padto_matmul	2024-02-11 19:54:06 -05:00
chenyu	f798b60338	add METAL_FAST_MATH env var to disable metal fast math (#3369 ) * env var METAL_FAST_MATH to disable fastmath for metal use this to test impact of fast math. might need to disable compiler cache with DISABLE_COMPILER_CACHE * failed onnx test with fast math METAL_FAST_MATH=0 DISABLE_COMPILER_CACHE=1 NOOPT=1 python -m pytest -n=auto test/external/external_test_onnx_backend.py -k test_MaxPool3d_stride_padding_cpu	2024-02-11 04:26:09 -05:00

... 64 65 66 67 68 ...

4,667 commits