mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
chenyu	b232c60def	benchmark openpilot 0.9.9 (#11575) * benchmark openpilot 0.9.9 not sure what to do with the 0.9.7 ones with IMAGE=2 and validate * name	2025-08-08 01:26:14 -04:00
chenyu	702e38dc19	remove FUSE_ARANGE_UINT (#11567) also add IGNORE_OOB=1 to bert runs. lowered BS on tinybox to 90 since 96 oom during eval without reset	2025-08-07 16:49:06 -04:00
chenyu	594cbdc66f	skip AM ResNet50 benchmark (#11565) hanging with FUSE_ARANGE?	2025-08-07 14:07:01 -04:00
nimlgen	1afb290027	ci: fix runner in nv (#11527)	2025-08-06 10:38:04 +03:00
chenyu	3f742a5a7c	comma space lab models benchmark (#11461)	2025-07-31 19:06:18 -04:00
nimlgen	5fc5bb5237	ci: clear processes (#11434) * unified hcq_smi for managment * fix * fix * no reset for amd	2025-07-30 22:15:18 +03:00
nimlgen	4b4ba5454c	ci: move driver start higher (#11431)	2025-07-30 10:48:38 +03:00
chenyu	204da24cfc	increase driverbenchmark timeout-minutes to 15 (#11428)	2025-07-29 19:45:05 -04:00
nimlgen	c88e401d0e	ci: fix typos in h machine benchmarks (#11423)	2025-07-29 22:11:47 +03:00
George Hotz	1f1f99c287	hotfix: add DEBUG=3 to driver CI	2025-07-29 11:03:47 -07:00
nimlgen	d38d285489	ci: add h machines (#11416) * ci: add h machines * more * fix names * names not collide * 20 * 10	2025-07-29 19:21:51 +03:00
chenyu	2b48b961be	fix a few broken AMX tests (#11204)	2025-07-12 21:42:38 -04:00
George Hotz	0597735f28	remove TC=3 not porting this (#11045)	2025-06-30 15:12:49 -07:00
chenyu	126fcf4129	clean up AMD_LLVM in tests (#11021)	2025-06-28 22:45:47 -04:00
chenyu	d71bb6a7b2	remove comma 0.9.4 from benchmark (#10867)	2025-06-18 12:43:59 -04:00
chenyu	4f535641f7	add one huggingface_onnx test to mac benchmark ci (#10700) this crashed for me on onnx parser pr but seems fine for the author. see if ci mac is fine	2025-06-08 12:26:12 -04:00
wozeparrot	37e1ef1be3	feat: cleanup old AM processes (#10653)	2025-06-05 15:41:00 -07:00
wozeparrot	5e3c4a8431	fix: comma testsig (#10568)	2025-05-29 19:00:07 -07:00
George Hotz	6b8eb5fec2	split mlperf to its own red benchmark run (#10492) * Add mmapeak implementation for 7900 XTX * Change identation * Use a template instead of multiple assebly files * Fix output formatting * Reduce register file bank conflicts * More accurate measurement for quick instructions * Add support for gfx1201 * RDNA4 wmma requires less VGRPs * RDNA4 does not have s_cmpk instructions * Add v_wmma_i32_16x16x32_iu4 for gfx1201 * Add sparse wmma instructions * split to tinybox red MLPerf Benchmark --------- Co-authored-by: Panagiotis Kourouklidis <panagiotis.kourouklidis@gmail.com>	2025-05-23 17:12:41 -07:00
uuuvn	3ca5680920	Test remote in benchmark (#10304) hlb cifar is fast so added it, can add bert too if you think it's ok 6 real gpus to test multigraph and transfers + accuracy validation should probably be added to tinystats too, i don't know how though Co-authored-by: chenyu <chenyu@fastmail.com>	2025-05-23 12:12:57 -04:00
qazal	90eb3c0e5d	add MobileNetV2 benchmark to comma CI (#10250) * add MobileNetV2 to comma CI * symlink imagenet * also the signature * comment that out * need imagenetmock * same train and test set * quantize on CPU=1 * verbose * need __hexagon_divsf3 * 0x858d6c15 * quant cpu + CC=clang-19	2025-05-19 18:22:50 +03:00
George Hotz	b06291077c	no amdgpu kernel driver (#10408) * no amdgpu kernel driver * don't test hip * lower req	2025-05-18 20:52:39 -07:00
wozeparrot	1ed04f993b	move benchmark stat tracking to influxdb (#10185)	2025-05-15 16:14:56 -07:00
Ignacio Sica	47b3055fe2	set fail-fast behavior (#10336)	2025-05-15 11:24:45 -07:00
George Hotz	7a3d4de59a	hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test	2025-05-14 14:50:37 -07:00
George Hotz	f1130ab3d3	openpilot benchmark test (#10290) * openpilot benchmark test * that	2025-05-13 22:49:28 -07:00
chenyu	ad5cb2717d	FUSE_ARANGE=1 in bert bench (#10263) still fails, something multi related maybe Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-05-13 09:12:19 -04:00
chenyu	0015b3921f	sleep more in CI Remove amdgpu (#10261) see if this is less flaky	2025-05-12 08:13:44 -04:00
nimlgen	7d6ed1b1e9	hotfix: mac ci (#10210) * fixed? * cmnt	2025-05-08 14:13:23 +03:00
nimlgen	ba52fce4b2	usbgpu: benchmark in ci (#10208) * usbgpu: benchmark * usbgpu: benchmark	2025-05-08 12:02:04 +03:00
Ignacio Sica	bf5fb97498	fix `AMD_LLVM` bf16 tc for `gfx1100` (#10102) * fix amd_llvm bf16 tc * cleanup pattern	2025-04-30 20:06:38 -03:00
chenyu	4a04098389	fix llama3 with nf4 quantize (#10107) also int8 outputs is wrong	2025-04-29 15:14:36 -04:00
Ignacio Sica	9d5677c12c	fix `ptx` linearizer bug 2 [pr] (#9967) * check for local buffer * hotfix * add test_tensor_cores_emulation run for ptx	2025-04-29 14:30:07 -03:00
Ignacio Sica	58cf8cd493	add support for "shared_mem" for `LLVM` (#10093) * init llvm shared * add test_tensor_cores_emulation run for llvm	2025-04-29 08:56:36 -04:00
Ignacio Sica	bda116d773	fix `use_tensor_cores` propagation (#10048) * propagate use_tensor_cores * add use_tensor_core to arg in test and search * bugfix * get TC val from ContextVar in search * revert minor space change * add tc emulation test to ci and benchmark * revert * revert whitespace change * remove test for ptx * add comment and remove llvm test run	2025-04-28 19:30:50 -03:00
chenyu	e996584685	olmoe in mac benchmark (#10077)	2025-04-27 21:07:02 -04:00
George Hotz	b6d2effaf5	assign is contiguous (#10066) * assign is contiguous * disable process replay for SDXL	2025-04-27 08:40:33 -04:00
Ignacio Sica	023b1c28a2	`test_tensor_cores_padded` refactor (#9724) * set pad t 3 for amd padded tc test * change pad for amd regardless CI * test tc padded uops and correctness separately * add test_tensor_cores_padded_uops test to ci * remove redundant chack for amd device * cleanup	2025-04-18 17:05:54 -03:00
chenyu	c5db5b83b9	add SHOULD_USE_TC=1 check to simple_matmul (#9802) * add SHOULD_USE_TC=1 check to simple_matmul also zero centered the random input and update atol for tf32 * ATOL=2e-2 for HALF	2025-04-09 02:24:42 -04:00
George Hotz	14928fecff	Revert "fix TF32 tensor core dropped in tc_sm89 (#9798)" This reverts commit `7c9a96824f`.	2025-04-09 12:27:39 +08:00
chenyu	7c9a96824f	fix TF32 tensor core dropped in tc_sm89 (#9798) also add `SHOULD_USE_TC=1` to verify TC is applied in simple_matmul	2025-04-08 23:20:50 -04:00
Ignacio Sica	58785181a8	AMD `bf16xf32` TC (#9717) * dont test bf16 for emulated amd tc * skip bf16 tc test in ci * skip bf16 for AMD in test_tensor_cores_codegen * add simple bf16 gemm test to benchmark	2025-04-07 11:41:04 +08:00
chenyu	1d25844d44	Revert "disable CI red llama 3 4 gpu beam (#9690)" (#9709) This reverts commit `6a5eacba8b`.	2025-04-03 02:34:39 -04:00
chenyu	6a5eacba8b	disable CI red llama 3 4 gpu beam (#9690) device hangs and ci would fail	2025-04-02 03:19:09 -04:00
qazal	4df2b6347d	hotfix: bump tinybox red training CI timeout to 30 minutes (#9426)	2025-03-13 09:31:44 +01:00
nimlgen	cd9d74f7ea	use am in training benchmarks (#9357) * am in training benchmarks * fix * not needed anymore	2025-03-05 19:13:47 +03:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189)	2025-02-20 18:03:09 -05:00
Ignacio Sica	aaed315fee	add AMX support to LLVM (#8957) * init amx support for llvm * revert elf changes * fix attributes for AMX asm calls * add comments * add llvm amx job to benchmarks * cleanup * cleanup * hotfix: improve comments * comment for aux buffers * hotfix: * move amx_tc to ClangRenderer * merge master * refactor * add docs * add corsix docs reference --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-12 16:01:18 +08:00
nimlgen	52a69dd5e9	Revert "use am in training benchmarks (#8965)" (#8981) This reverts commit `107e616857`.	2025-02-09 15:43:45 +03:00
nimlgen	107e616857	use am in training benchmarks (#8965) * am in training benchmarks * fix * not needed anymore	2025-02-08 20:20:47 +03:00


236 commits