mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
chenyu	f5090192c8	reorder AMD tensor core benchmark test (#13860 ) * reorder AMD tensor core benchmark test * disable that	2025-12-28 12:29:51 -05:00
George Hotz	4702da41d5	hotfix: mkdir for extra/disassemblers	2025-12-19 17:18:37 -04:00
Christopher Milan	97103831c5	Revert "remove image from BufferSpec (#13636 )" (#13761 ) This reverts commit `2571a1eb47`.	2025-12-19 13:54:36 -05:00
Christopher Milan	2571a1eb47	remove image from BufferSpec (#13636 ) * remove image from BufferSpec * cl tiny_gemm (64) works * mypy * padding * openpilot CL * reshape properly * remove extra qcom checks * pad output * mypy * update compile test * move undo * TestImageCopy valid images * TestImageRealization valid images * TestImageDType valid images * cleanups * test_renderer_failures * ruff * mypy * simplify ops_qcom * bump step time	2025-12-19 13:41:20 -05:00
George Hotz	4b741e893f	remove REMOTE=1 (#13722 ) * remove REMOTE=1 * leave ibverbs	2025-12-16 15:58:10 -04:00
George Hotz	7589c897b2	split usbgpu tests into their own benchmark [pr] (#13711 )	2025-12-15 21:42:40 -04:00
qazal	6bafd90248	remove unused process replay input [pr] (#13712 )	2025-12-16 09:29:35 +08:00
nimlgen	cbae33003d	ci: add usb4 (#13643 ) * ci: add usb4 * debug=3 * undef * revert	2025-12-11 19:41:41 +03:00
chenyu	2471b49e45	minor bert / llama change from grad acc branch (#13622 ) * minor bert / llama change from grad acc branch * revert those	2025-12-08 16:04:14 -05:00
chenyu	ac1227575f	IMAGE=1 driving_vision in benchmark (#13587 )	2025-12-05 10:20:54 -05:00
chenyu	8902781dc1	enable more benchmarks (#13540 ) * enable more benchmarks * disable some * adjust ASSERT_MIN_STEP_TIME * mac NOCLANG=1	2025-12-02 20:31:14 -05:00
nimlgen	455dd88236	nv: minimal hevc (#13502 ) * nv: minimal hevc * validate * not needed * tralin * var * cpu * fxi * desc * move * cleanup	2025-11-30 16:46:55 +03:00
wozeparrot	1f648bb1ba	feat: reenable mobilenetv2 dsp (#13320 )	2025-11-21 15:21:49 -08:00
chenyu	6372c95094	disable benchmark MobileNetV2 on DSP (#13305 ) failed on tinyc2	2025-11-16 09:42:52 -05:00
Harald Schäfer	3af231904e	openpilot compile tests: assert pre-rangify speeds (#12775 ) * assert pre-rangify speeds * typo --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-11-13 09:39:06 -08:00
chenyu	3f939f3d3c	update pm_simplify_valid (#13241 ) * update pm_simplify_valid fixed openpilot conv regression * IMAGE training is broken	2025-11-12 19:40:02 -05:00
George Hotz	ab9fa964d8	DISABLE_COMPILER_CACHE -> CCACHE (#13234 ) * DISABLE_COMPILER_CACHE -> CCACHE * Fix cachekey assignment in Compiler constructor	2025-11-12 15:07:09 -08:00
chenyu	23b90945c3	add a benchmark for openpilot vision with DEBUG=2 (#13219 ) see per kernel speed, also disable the jobs for 0.9.9	2025-11-11 14:41:52 -05:00
chenyu	6c48c87e51	improved ASSERT_MIN_STEP_TIME (#13182 ) * improved ASSERT_MIN_STEP_TIME getting close, current time +1ms then round up * relax	2025-11-09 16:41:12 -05:00
George Hotz	42b34cf83d	bottom up linearizer (#13133 ) * bottom up linearizer * late stores * more complete * remove broken heuristic * upcast size * opt * more conservative * it needs that * disable opencl half on QCOM * fix * make that a real test * cpu test okay * ptx skip * end is after the range	2025-11-06 15:30:32 -08:00
chenyu	54141e9cb9	DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096 )	2025-11-04 11:28:18 -05:00
George Hotz	5eb87ab131	hotfix: bump cifar time to 350	2025-10-30 17:29:20 +08:00
b1tg	bb307b9e81	fix fp8 vectorization (#12977 ) * fix fp8 vectorization * add fp8 tc to benchmark	2025-10-28 13:55:30 -04:00
b1tg	45e2f916a3	add quantize fp8 in llama3 (#12893 ) * add quantize fp8 in llama3 * don't truncate fp8 alu result * cast to float32 before matmul * --model weights/LLaMA-3/8B-SF-DPO/ --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-10-27 10:22:57 -04:00
wozeparrot	6e00dec95d	feat: pin openpilot 0.10.1 models (#12878 )	2025-10-22 14:57:54 -07:00
chenyu	f0831c8c30	add 0.10.0 to comma benchmark (#12875 ) * add 0.10.0 to comma benchmark disabled the 0.10.1 ones which are pinned to master. it does not work because benchmark uses the cached old version * that's pinned	2025-10-22 15:18:21 -04:00
George Hotz	726988fa4b	late ifs try 2 (#12865 ) * late ifs try 2 * fix image * fix that test * panic * ptx fixups * preserve toposort * those pass locally * Revert "those pass locally" This reverts commit `063409f828`. * no ls * make that explicit	2025-10-22 18:49:27 +08:00
chenyu	6d86e962c7	update ASSERT_MIN_STEP_TIME (#12857 ) 0.10.1 driving_policy is good now, still need driving_vision and dmonitoring to be fast	2025-10-21 22:46:07 -04:00
wozeparrot	62e7b8b870	feat: just use compile3 (#12849 )	2025-10-21 07:56:50 -07:00
wozeparrot	990e8b97ee	feat: log openpilot 0.10.1 times (#12816 )	2025-10-20 18:30:34 -07:00
chenyu	350a4754a9	Update openpilot models (#12780 ) * Update openpilot models * Update slower model * fix that --------- Co-authored-by: Bruce Wayne <harald.the.engineer@gmail.com>	2025-10-18 20:32:35 -04:00
Harald Schäfer	addc54b96c	Simplify openpilot compile3.py (#12748 ) * Simpler compile3 * tests * remove default args * onnx file is still fp16 * self-test FP16 too * allow test disable * absurd tolerance * Just do latest * Try simplest * use later models * kernel count not relevant if speed is good * dead improts * Revert "dead improts" This reverts commit `f68c2cd15d`. * Revert "kernel count not relevant if speed is good" This reverts commit `0955ca4ee0`. * add back kernal count check on latest model	2025-10-18 10:12:22 -04:00
chenyu	285534ce64	delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744 ) does nothing now	2025-10-16 14:11:33 -04:00
chenyu	53478c741d	relax ASSERT_MIN_STEP_TIME for space lab policy (#12742 )	2025-10-16 11:40:36 -04:00
chenyu	b8cf35fb77	print macOS version in CI (#12705 )	2025-10-15 15:05:33 -04:00
chenyu	89df6f611d	reenable sdxl mac benchmark (#12680 ) also updated faster sd step times	2025-10-14 17:36:17 -04:00
Sieds Lykles	e625c27598	update min step times openpilot (#12600 )	2025-10-10 11:24:27 +02:00
chenyu	be05028419	move ASSERT_MIN_STEP_TIME to compile3 (#12535 ) threshold is current time +20%	2025-10-08 22:16:59 -04:00
chenyu	5986d656a2	tighter ASSERT_MIN_STEP_TIME (#12531 ) set to about 1.2x of actual time now	2025-10-08 21:22:54 -04:00
George Hotz	3b0b3a2e64	fast RANGEIFY (#12504 ) * rtoposort is fast, can replace rangeify with this * fast rangeify * work * fast rangeify works for mnist * should work * progress * pad fix * FAST * tests passing * don't delete those shape ops * put in rangeify map * ending ranges fix * tests * mstack/mselect no hacks * move to indexing.py * touch up tests + add comments * disable failing test * actually make the file readable * failing * error	2025-10-08 19:38:06 +08:00
chenyu	eb3bc277b3	remove ASSERT_MIN_STEP_TIME in external_benchmark_openpilot (#12495 ) should add for compile3 and compile 3 only	2025-10-07 22:13:42 -04:00
chenyu	fe774a4319	more skip WINO on benchmark (#12482 )	2025-10-07 03:43:51 -04:00
chenyu	8ad5f9e74f	skip slow benchmarks (#12481 ) * skip slow benchmarks padded tc is already slow, rest are slow with rangeify (correct if run locally) * relax more	2025-10-07 03:28:56 -04:00
Sieds Lykles	e74be4a140	UOp.factor and add chain sorting (#12413 ) * add ordering * fix some tests * fix more tests * shorten comment * update test * add rule and test * add rule and test * remove check * use fold_divmod_congruence instead of simplify * adjust tests * shorten line * new algo * add test * add function to un-nest the div * add UOp.factor * test UOp.factor * uop_given_valid tries to factor simplex expression * shorten line * symbolic_flat is back * change that back * fix those new tests * new rule for ordering * factor multiple factors * no symbolic_flat * symbolic_flat to there * move that back * fix imports * merge correctly * linter happy * add rule * add a test * cleanup * revert that for now * UOp.factor returns self instead of None * try all_candidates * remove or_else * post index symbolic * add test * maket this closer to the original * increase mac hlb_cifar min step time * add some ordering tests * cleanup * increase pytest timeout time * check dtype	2025-10-04 06:05:38 +02:00
chenyu	494bb12500	skip slow cifar bf16 on red benchmark (#12213 ) very slow to compile the fake bf16	2025-09-16 14:55:01 -04:00
chenyu	419e997187	increase benchmark timeout (#12212 ) account for compile cache, and it's annoying that job died due to timeout also messes the machine	2025-09-16 14:09:02 -04:00
nimlgen	fb96394ff5	auto-select available compilers (#12094 ) * device: auto select compilers * fix * metal+opencl * nv/cuda * test without ptx * ptx * fix tests * fix * fix test * rename * test + cleaner * xx * ops * better test * win? * um? * types * debug * win?? * sep rung * wtf? * debug * skip win * revert this * types	2025-09-10 19:52:01 +03:00
Sieds Lykles	5b73076e48	assert benchmark times (#12042 ) * assert jitted times in openpilot * better error * better error * add ASSERT_MIN_STEP_TIME to more models * t is step_times * update benchmark times * update times	2025-09-09 23:40:02 +02:00
nimlgen	1c6c42715f	unify cpu and llvm (#11982 ) * try unify cpu and llvm * fixes * fix * ops * no llvm * fix * rm * lvmm is ot * oops * override * no llvm * ignore * skip llvm * ooops	2025-09-09 13:54:44 +03:00
George Hotz	433581f8ed	make POSTOPT=2 the default (#12034 ) * make POSTOPT=2 the default * more matching tc * fix winograd * fix that test * add matvec to Scheduler * flip tc sort order * similar speed * fix beam on image * disable slow tests * slow	2025-09-05 14:34:05 -07:00

293 commits