mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
chenyu	90e55a9fd1	fix buf_index not found case in _apply_tc_opt (#3739 ) ValueError if src.src[0] is not a LOAD. Replaced with returning None in _apply_tc_opt and test to make sure the net output is KernelOptError.	2024-03-14 14:27:05 -04:00
nimlgen	6bf11a2ce3	fix incorrect direct store with gep (#3735 ) * fix incorrect direct store with gep * better comment * phi as well * dtype check there * mypy happy? * not used * renames * phi in phi	2024-03-14 20:58:50 +03:00
qazal	00c56db1a4	Fix JITItem count assert for HSAGraph (#3734 ) * exclude HSA graph * cant import HSAGraph directly	2024-03-14 14:12:35 +03:00
qazal	43953c0ba9	skip grouped store for unmatching upcasts (#3723) * skip if upcasts don't match * outputs match now * this ast is hardcoded --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-14 01:18:31 -04:00
David Hou	199f7c4342	MLPerf Resnet (cleaned up) (#3573 ) * this is a lot of stuff TEST_TRAIN env for less data don't diskcache get_train_files debug message no lr_scaler for fp32 comment, typo type stuff don't destructure proc make batchnorm parameters float make batchnorm parameters float resnet18, checkpointing hack up checkpointing to keep the names in there oops wandb_resume lower lr eval/ckpt use e+1 lars report top_1_acc some wandb stuff split fw and bw steps to save memory oops save model when reach target formatting make sgd hparams consistent just always write the cats tag... pass X and Y into backward_step to trigger input replace shuffle eval set to fix batchnorm eval dataset is sorted by class, so the means and variances are all wrong small cleanup hack restore only one copy of each tensor do bufs from lin after cache check (lru should handle it fine) record epoch in wandb more digits for topk in eval more env vars small cleanup cleanup hack tricks cleanup hack tricks don't save ckpt for testeval cleanup diskcache train file glob clean up a little device_str SCE into tensor small small log_softmax out of resnet.py oops hack :( comments HeNormal, track gradient norm oops log SYNCBN to wandb real truncnorm less samples for truncated normal custom init for Linear log layer stats small Revert "small" This reverts commit `988f4c1cf3`. Revert "log layer stats" This reverts commit `9d98224585`. rename BNSYNC to SYNCBN to be consistent with cifar optional TRACK_NORMS fix label smoothing :/ lars skip list only weight decay if not in skip list comment default 0 TRACK_NORMS don't allocate beam scratch buffers if in cache clean up data pipeline, unsplit train/test, put back a hack remove print run test_indexing on remu (#3404) * emulated ops_hip infra * add int4 * include test_indexing in remu * Revert "Merge branch 'remu-dev-mac'" This reverts commit `6870457e57`, reversing changes made to `3c4c8c9e16`. 
fix bad seeding UnsyncBatchNorm2d but with synced trainable weights label downsample batchnorm in Bottleneck :/ :/ i mean... it runs... its hits the acc... its fast... new unsyncbatchnorm for resnet small fix don't do assign buffer reuse for axis change * remove changes * remove changes * move LARS out of tinygrad/ * rand_truncn rename * whitespace * stray whitespace * no more gnorms * delete some dataloading stuff * remove comment * clean up train script * small comments * move checkpointing stuff to mlperf helpers * if WANDB * small comments * remove whitespace change * new unsynced bn * clean up prints / loop vars * whitespace * undo nn changes * clean up loops * rearrange getenvs * cpu_count() * PolynomialLR whitespace * move he_normal out * cap warmup in polylr * rearrange wandb log * realize both x and y in data_get * use double quotes * combine prints in ckpts resume * take UBN from cifar * running_var * whitespace * whitespace * typo * if instead of ternary for resnet downsample * clean up dataloader cleanup a little? * separate rng for shuffle * clean up imports in model_train * clean up imports * don't realize copyin in data_get * remove TESTEVAL (train dataloader didn't get freed every loop) * adjust wandb_config entries a little * clean up wandb config dict * reduce lines * whitespace * shorter lines * put shm unlink back, but it doesn't seem to do anything * don't pass seed per task * monkeypatch batchnorm * the reseed was wrong * add epoch number to desc * don't unsyncedbatchnorm is syncbn=1 * put back downsample name * eval every epoch * Revert "the reseed was wrong" This reverts commit 3440a07dff3f40e8a8d156ca3f1938558a59249f. * cast lr in onecycle * support fp16 * cut off kernel if expand after reduce * test polynomial lr * move polynomiallr to examples/mlperf * working PolynomialDecayWithWarmup + tests....... 
add lars_util.py, oops * keep lars_util.py as intact as possible, simplify our interface * no more half * polylr and lars were merged * undo search change * override Linear init * remove half stuff from model_train * update scheduler init with new args * don't divide by input mean * mistake in resnet.py * restore whitespace in resnet.py * add test_data_parallel_resnet_train_step * move initializers out of resnet.py * unused imports * log_softmax to model output in test to fix precision flakiness * log_softmax to model output in test to fix precision flakiness * oops, don't realize here * is None * realize initializations in order for determinism * BENCHMARK flag for number of steps * add resnet to bechmark.yml * return instead of break * missing return * cpu_count, rearrange benchmark.yml * unused variable * disable tqdm if BENCHMARK * getenv WARMUP_EPOCHS * unlink disktensor shm file if exists * terminate instead of join * properly shut down queues * use hip in benchmark for now --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-14 00:53:41 -04:00
George Hotz	56b914fc8c	hotfix: test_assign_contiguous	2024-03-13 17:49:54 -07:00
chenyu	4d6ec41adb	failed test cases for bf16 Tensor.full (#3729 ) fixable with float const then cast to bf16. cast folding with bitcast is incorrectly skipped	2024-03-13 20:46:45 -04:00
George Hotz	838afbc351	assign tests (#3728 )	2024-03-13 17:04:55 -07:00
chenyu	6793db169b	bfloat16 tensor creation from list and numpy (#3724 )	2024-03-13 18:44:05 -04:00
qazal	337cd53444	multioutput ScheduleItem (#3699 ) * refactor realize.py * update docs * update test_sched * update runners and devices * update openpilot and unit tests * cleanup runner lowering * update more tests	2024-03-13 08:59:38 -07:00
nimlgen	08064a0e29	add SEED env to fuzz_linearizer (#3713 ) * add SEED env to test/external/fuzz_linearizer.py * found some * more platforms	2024-03-13 18:08:42 +03:00
chenyu	e1b2a82d89	fix st.real_size can be negative if valid is always false (#3708) two followups after this. (1) if a buffer is never accessed in kernel, it can be removed from input (2) real_size can be smaller conditional on valid being true (the old validhack stuff)	2024-03-12 20:34:07 -04:00
Francis Lam	b6e2495fdd	kernel: limit shared memory usage when adding opts (#3705 ) * kernel: limit shared memory usage when adding opts * search: remove unnecessary limit on search space apply_opt will do the more correct check	2024-03-12 17:06:21 -04:00
George Hotz	2024b24f35	add some graph tests (#3702 ) * add some graph tests * PatternMatcher class * speedup * const cast test * fix tests * itertools chain	2024-03-12 09:49:47 -07:00
chenyu	f599c6e7f4	test output dtypes match in test_ops (#3703) need to cast some torch output to int32 because torch default returns int64 for index related function close #2797	2024-03-12 12:44:40 -04:00
chenyu	02ca067bdf	use default_float.np to construct test data in test_ops (#3701 ) first step of #2797	2024-03-12 11:58:20 -04:00
Patrick Tsai	971d7f5d7c	O(n) arange attempt (#3530 ) * It works? * Clamp correctly * Refactor * Make code better * Undo some stuff * First step to trying to make floats work * Floats work in Python op but not metal because int div is different Python integerdivision was implemented as // which rounds towards negative infinity, but C integer division rounds towards 0 so there is an off-by-1 division error * arange does cumsum with ints and then multiplies by step This is so loop optimization can remain int only * Undo a lot of symbolic changes * Final check * Cleanup * There can be multiple phis * Fix multiple phi op removal * const sets dtype correctly * Fix bugs * Fix a couple bugs and add loop vars to resolve * missed one * Don't trim too many ops * Fix symbolic test * Use ones instead of full * Delete test * Lint passes * max node error * Small updates to loop logic * Remove unnecessary changes * We are getting somewhere * Simple case * Fix * rm, prn * Better * If NumNode doesn't work then continue * clamp is needed for arange(256) * Move everything into the optim fn * Replace correctly * Order optimizations better * Delete * mypy * Test for simplification * Rename * Fix test * update test description * Undo more * Cleanup * No replaced_ops map * Fix lint * AssertionError * back again * Reinstate assertion * Return true and make diff not as big * Bigger range for test * Change cumsum impl * fix bug * make big cumsum work * lint * Undo cumsum 2-stage removal * No while helper * optional min/max clamping * floats work * rm giant arange test * fix python cast None * Check phi parents * one phi allowed per where * Fix one phi per where * Rework iteration * Delete assertions * convert to int * Try mul -1 instead of neg for hip..? 
* Remove one phi per where requirements * one accum only * Lint * should simplify a loop at a time * Don't get rid of loop explcitly * Need to iterate backwards * lint * unary neg * Make optim work for onnx and sum_pad_collapse * Better message * filter alu ops correctly * Fix the limiter * lint and simplify * Add it back * off by one error * test wheres and phis * test max ops and non-if stuff * <= * cast_scalar * Oops * Change test * Pass loop uops instead of a modified map * Cut param transfer between linearizer and uops * Fix issues * Fix lint * fix efficientnet python 3.8 invalid syntax * distinct vars in seen_vars * accurate var names --------- Co-authored-by: Patrick Tsai <patosai@users.noreply.github.com> Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-03-11 16:09:20 -07:00
qazal	aec4c4f01b	linearizer ast as a tuple of lazyops (#3689 ) * multi store op linearizer * currently we do only one output per kernel * named opts	2024-03-11 15:39:04 -07:00
Skosh	e8c350fdac	fix: make Tensor.rand produce correct values for float16 (#3654 ) * fix: make Tensor.rand produce correct values for float16 Due to precision loss when casting to float16, the data distribution created by custom_random isnt correctly in the interval ]0, 1[, but instead in the interval ]0, 1], which causes the Tensor.randn to incorrectly generate values of infinity. The solution uses a scaling value to make sure the values stay under 1, when using half precision. Closes #3611 * update implementation to truncate to closest f16 value to 1 * chore: fix whitespace * test larger distribution --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-10 18:48:00 -04:00
George Hotz	44a67bf783	constant folding (#3675 ) * constant fold * bool math * fix ptx	2024-03-10 14:47:24 -07:00
George Hotz	25aede6fd9	truncate for exec_alu (#3674 )	2024-03-10 14:19:04 -07:00
Francis Lata	957ae9b594	Fix Tensor's __repr__ for printing out grad (#3673 ) * update check for Tensor's __repr__ with grad * add test for repr with grad bugfix	2024-03-10 17:04:29 -04:00
George Hotz	69ca7f7bf9	changes for teenygrad (#3665 ) * changes for teenygrad * upd * simpler test	2024-03-09 15:30:34 -08:00
Maximilian Wolf	8ae85b2cf5	add inference_mode context manager with decorator support (#3621 ) * add inference_mode context manager with decorator support * change val to mode for train and inference_mode * fix wrong rename	2024-03-09 08:38:26 -08:00
Obada Khalili	b5cbf1792a	Fix `Tensor.cumsum` when axis of length 0 is selected (#3473) * fix Tensor.cumsum when axis of length 0 is selected * add cumsum regression test * define padding left size in a separate line	2024-03-09 08:26:41 -08:00
chenyu	915f98791c	use custom KernelOptError in kernel opt (#3661 ) be more specific about invalid kernel opt, used that in test_linearizer_failures. make BEAM kernel search work even with assertion disabled. `BEAM=2 python3 -O examples/llama.py --temperature=0 --count=10 --prompt="Hello." --timing`	2024-03-08 15:36:16 -05:00
George Hotz	ac02e7347d	ptx timing vs cuda timing (#3659 )	2024-03-08 10:17:49 -08:00
chenyu	e25879d50e	don't get new var_val for the same ast in fuzz_linearizer (#3657 ) fixed result comparison for kernels with variables	2024-03-08 09:49:24 -05:00
chenyu	1130c73844	add FUZZ_NTH to fuzz_linearizer (#3656 ) * add FUZZ_NTH to fuzz_linearizer also update tests in test_linearizer_failures to not just run on METAL * update failures for HIP/HSA * test_failure_21 LLVM PADTO	2024-03-08 09:16:49 -05:00
David Hou	9f66dcf718	PolynomialDecayWithWarmup + tests (#3649 ) * working PolynomialDecayWithWarmup + tests....... add lars_util.py, oops * keep lars_util.py as intact as possible, simplify our interface * whitespace * clean up * clean up * asserts * test polylr for full resnet training run * add comment * rename * fix do_optim * don't cast lr * info * calculate from train_files * skip it	2024-03-07 18:53:36 -05:00
chenyu	57df8e8d82	update fuzz_linearizer (#3648 ) included non-reduce kernel and kernel with variables. green msg when everything passed it's possible that creating rawbufs failed due to memory error, included that in failure cases	2024-03-07 18:41:22 -05:00
chenyu	b282a45e39	fix direct store float4 with same vin (#3652 ) In a kernel that stores expanded value, the vin of float4 can come from same source, and we only remove once in that case.	2024-03-07 18:11:50 -05:00
Zaffer	1853ec9a02	add tests for bfloat16 on HIP (#3638 ) * Fix bug in login functionality * Remove HSA backend test and add bfloat16 dtype tests that run in CI * Skip tests on HIPCPU * skip tests causing segfault on LLVM backend * Exclude bfloat16 tests causing segfaults in LLVM backend * move bf16 cast tests to only test on HIP	2024-03-07 10:45:36 -08:00
chenyu	906cc3a69b	cleanup tests Device[Device.DEFAULT] is always Compiled (#3645 )	2024-03-07 11:15:42 -05:00
qazal	bdd62c7fd8	make the bf16 include dynamic (#3642 ) * dynamic prefix * add common ones above these are common dtypes aesthetics * regression test fuzz it test * run in CI * use .append * faster	2024-03-07 10:31:35 -05:00
chenyu	4552248c84	fix Tensor.to preserves grad.data (#3636 )	2024-03-06 21:44:49 -05:00
chenyu	d33311ebe0	remove parens of ALU if it has associative property (#3635 ) need to remove SUB since it's possible to have (const - (const - const)) in test/test_ops.py::TestOps::test_cos, in which case cannot remove the parens of children	2024-03-06 21:12:11 -05:00
chenyu	fe6b6e38c1	remove parentheses of GEP if it's from SSA (#3634 ) fixed some bracket nesting level exceeded maximum of 256 errors	2024-03-06 20:22:46 -05:00
David Hou	0afaf70d57	lars optimizer + tests (#3631 ) * lars optimizer + tests * fix skip list! * use id to compare in skip list * go back to using set * Tensor(bool) * Tensor(bool) is and * don't lint external/mlperf_resnet * whitespace * add external_test_optim to opencl tests * give mlperf task a name * mlperf under onnx * remove track_gnorm * contiguous instead of realize * assert momentum and weight decay positive --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2024-03-06 18:11:01 -05:00
chenyu	b2e92d44fa	skip METAL sin test in test_dtype_alu (#3633 ) revert this part of #3629. this is flaky	2024-03-06 17:29:19 -05:00
chenyu	8f10bfa2ff	ban __bool__ on Tensor (#3632 ) * ban __bool__ on Tensor avoid misuse * test case * fix tests * fix more tests	2024-03-06 17:12:35 -05:00
George Hotz	81baf3eed3	bring ptx back (#3623 ) * bring ptx back * ptx back * fix define var * fix a few bugs * bugfixes * fixes * fix llvm bug * fix test bug	2024-03-06 13:34:21 -08:00
chenyu	c270d54c32	update test_dtype_alu for METAL (#3629 )	2024-03-06 14:55:19 -05:00
qazal	abc5f3a6a0	hip bf16 hotfix (#3630 ) * hip bf16 * remu dev mac * Revert "remu dev mac" This reverts commit 465069a0dc3c7f2045f3348b312a1dcbf1587acd. * skip disk tests in CI * bring float8 back	2024-03-06 11:42:30 -08:00
chenyu	bc2a13a5f7	test case to show clang and python doing math in double (#3628 )	2024-03-06 13:49:03 -05:00
Elias Wahl	a1507c7fd4	Fix Tensor.dropout() with multigpu (#3619) * Tensor.rand with multilazybuffer * remove recursive + test * whitespace * another whitespace. Sorry * remove else * Canonicalize multidevice tuple + Remove src	2024-03-05 18:26:21 -05:00
George Hotz	8500265561	this mem fault still happening (#3620 ) * this mem fault still happening * smaller * that print doesn't work * overflows test * hip doesn't uses_ptr_arithmetic * only with locals * test overflow new name * it's not ptr arith * simpler * simple repro * old compiler * simpler * put that back	2024-03-05 10:39:32 -08:00
George Hotz	f500be1313	out of bounds access caused by launch bounds (#3615 ) * lin overflow * remove launch bounds * remove launch bounds infra * oops, fix bufs type	2024-03-05 06:34:00 -08:00
qazal	eb83e2d3a0	decouple buffer mutability from cstyle (#3617 ) * buffer mutability as an arg * update test_uops	2024-03-05 06:20:59 -08:00
chenyu	3275260c98	Revert "test: add failing bfloat16 test case for metal backend (#3481 )" (#3618 ) This reverts commit `1e12a2ae80`.	2024-03-05 09:08:42 -05:00
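
The "O(n) arange attempt" entry (#3530) in the log above attributes an off-by-one error to a mismatch between Python's `//`, which rounds toward negative infinity, and C integer division, which rounds toward zero. The sketch below illustrates that difference in plain Python. The helper `c_style_div` is a hypothetical name used only for illustration and is not taken from tinygrad.

```python
def c_style_div(a: int, b: int) -> int:
    """Integer division that truncates toward zero, as C does (hypothetical helper)."""
    q = abs(a) // abs(b)                       # magnitude of the quotient
    return q if (a >= 0) == (b >= 0) else -q   # negate when the signs differ


# Python floor division rounds toward negative infinity.
floor_q = -7 // 2             # -4
# C-style division rounds toward zero.
trunc_q = c_style_div(-7, 2)  # -3

# The two results agree whenever the operands share a sign.
same_sign = (7 // 2, c_style_div(7, 2))  # (3, 3)
```

The results differ by one only when the operands have opposite signs and the division is inexact, which matches the off-by-one error the commit message describes.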


5,694 commits
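
The `Tensor.rand` float16 entry (#3654) in the log above reports that casting to float16 can push a value from the open interval ]0, 1[ onto exactly 1, and that the fix truncates to the closest float16 value to 1. The sketch below reproduces the rounding effect with the standard library only. `to_f16` and the `min` clamp are illustrative assumptions, not tinygrad's implementation.

```python
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Largest float16 strictly below 1.0: eleven significant bits, all set.
LARGEST_F16_BELOW_ONE = 1.0 - 2.0 ** -11   # 0.99951171875

sample = 0.99999           # strictly inside ]0, 1[ at higher precision
rounded = to_f16(sample)   # rounds to exactly 1.0 in float16

# Assumed remedy for illustration: clamp before the cast so the result stays below 1.
clamped = to_f16(min(sample, LARGEST_F16_BELOW_ONE))
```

Here `rounded` lands on the closed upper bound, while `clamped` stays at the largest representable half-precision value below 1.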