mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
qazal	660c034da6	KERNEL op try 3 (#9061 ) * work * tolerate shape, maybe this is ASSIGN(RESHAPE(BUF), KERNEL) * err, it's not ASSIGN(BUF, KERNEL), it's ASSIGN(VIEW(BUF), KERNEL) * burn the boats * assign slightly works * assign works * cleanup + var_vals can exist * fine image + fix metadata * metadata, without making everything 30% slower * diff pruning * faster assign schedule * add_buffer_ops stage * add kernel_spec back * add viz display * more strict kernel_spec	2025-02-17 14:47:54 +01:00
qazal	ec80df5115	add PROGRAM renderer to viz [pr] (#9137 )	2025-02-17 14:46:08 +01:00
qazal	7b09a72682	don't display void dtype in viz nodes [pr] (#9136 ) * don't display void dtype in viz nodes [pr] * extra	2025-02-17 13:49:36 +01:00
George Hotz	4dd10d03b7	move is_increasing to ops [pr] (#9134 )	2025-02-17 19:27:48 +08:00
qazal	22c571d3cb	add kernel axis colors to viz [pr] (#9129 ) * add kernel axis colors to viz [pr] * slightly blending with white makes this nicer * space	2025-02-17 12:21:35 +01:00
George Hotz	1bf66d62cf	symbolic gets its own file [pr] (#9132 )	2025-02-17 18:55:21 +08:00
George Hotz	bd694faf6c	factor out the expander logic [pr] (#9131 )	2025-02-17 18:09:48 +08:00
quortus	5bdf0c7951	Bitcast constant folding 2.0 (#9089 ) * Prevent const folding in test_payne_hanek_reduction * Do not use list as a default parameter * Bitcast constant folding --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-17 18:08:20 +08:00
quortus	2be4529f14	Test broken const folding wraparound behavior (#9080 ) * Test broken const folding wraparound behavior * Add repro for test_payne_hanek_reduction const folding bug --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-17 17:44:56 +08:00
George Hotz	7eea9b639d	hotfix: add replay_pkl debugging env	2025-02-17 17:34:58 +08:00
George Hotz	af9d8d39d2	dsp matchers + bump line count to 11300 (#9130 )	2025-02-17 17:31:54 +08:00
quortus	638d925e4e	Prevent const folding in test_payne_hanek_reduction (#9088 ) * Prevent const folding in test_payne_hanek_reduction * Do not use list as a default parameter	2025-02-17 17:31:10 +08:00
George Hotz	9289425170	add ast to ProgramSpec + pre matcher [pr] (#9128 ) * add ast to ProgramSpec + pre matcher [pr] * cleaner cast + test fix	2025-02-17 16:39:14 +08:00
qazal	fe260ac4d7	viz/server cleanups [pr] (#9127 ) * viz/server cleanups [pr] * space	2025-02-17 09:59:41 +02:00
George Hotz	a38b47e026	hotfix: DSP doesn't use that path	2025-02-17 10:45:29 +08:00
quortus	edf7213f34	Make bitcast to the same dtype noop (#9121 )	2025-02-16 20:28:44 -05:00
Ahmed Harmouche	59fe45f947	Solve get_grouped_dims does not split issue (#9085 ) * Solve dims too large errors on webgpu * Simplify divisor find * Test square root divisor * Fix lint * Refactor into group_dims and split_dims * Refactor * Fix lint * Add back max check in _group_dims * Prefer grouping over split --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-16 19:57:29 -05:00
Ahmed Harmouche	84dc331dd1	Refactor async (#9126 )	2025-02-16 17:47:15 -05:00
qazal	6a9e5598f9	small viz touchups [pr] (#9123 )	2025-02-16 20:07:40 +01:00
qazal	b3127f38e6	faster viz data fetching with streaming [pr] (#9122 ) * refactor to generator * yield * switch to SSE * start client side + end events * start javascript work * need to redo this whole part * more correct * diff * works * diff cleanup * more diff cleanup	2025-02-16 19:31:11 +01:00
uuuvn	8926bac00a	am: profiling working (#9119 ) ops_amd.py registers device finalization via atexit.register after finalize_profile is registered in device.py, so the AM device is closed before the profile is finalized, which causes a hang. (atexit.register is LIFO: https://docs.python.org/3.12/library/atexit.html#atexit.register) This PR moves the registration of device finalization into device.py, before profile finalization is registered.	2025-02-16 18:51:08 +03:00
qazal	97cb9cb1ed	always viz the first graph + non blocking matches fetch [pr] (#9117 ) * always display the first graph in viz [pr] * simpler * progress indicator is the matches list style * remove extra * back * res.json is still slow	2025-02-16 13:39:51 +01:00
chenyu	1fda98d14f	fix import time_linearizer [pr] (#9118 ) only test that used it was skipped in CI due to being slow	2025-02-15 21:33:28 -05:00
chenyu	c1dfe5c00d	compact get_late_rewrite_patterns [pr] (#9116 )	2025-02-15 20:33:09 -05:00
qazal	2e97022e5e	remove extra block in viz [pr] (#9115 )	2025-02-16 02:38:09 +02:00
chenyu	fd95543ff1	use scatter_reduce in scatter [pr] (#9114 )	2025-02-15 18:21:01 -05:00
chenyu	c954419bc8	minor tweak to transcendental pow (#9112 ) also added more pow with const test cases	2025-02-15 18:03:25 -05:00
chenyu	8dfa0024f0	raise in scatter if self and src have different dtype [pr] (#9109 ) raise a RuntimeError that matches torch instead of implicitly casting	2025-02-15 11:21:34 -05:00
chenyu	d129ccda4c	add RAWAST back to DEBUG=3 [pr] (#9107 )	2025-02-15 09:12:51 -05:00
qazal	2e19976d03	assert views in tensor uops [pr] (#9106 )	2025-02-15 13:27:55 +02:00
George Hotz	81f5a7af7d	improve DEBUG=3 [pr] (#9105 )	2025-02-15 18:44:56 +08:00
qazal	41d143d27c	new order to prepare for becomes_map = tensor_map [pr] (#9104 )	2025-02-15 10:37:36 +01:00
George Hotz	4672d9af73	actual tests for the dsp backend [pr] (#9102 ) * actual tests for the dsp backend [pr] * fix name	2025-02-15 15:17:56 +08:00
George Hotz	7e09057afa	fixup clang devectorize (#9099 ) * fixup clang devectorize * __builtin_convertvector is some casts * dsp fixups	2025-02-15 09:29:47 +08:00
Marcello Fuschi	8824f7e9df	Make logcumsumexp numerically stable (#9050 ) * Make logcumsumexp numerically stable * Refactor * Refactor for special case ndim=0 * Refactor * Use the correct device for mask --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-14 19:25:17 -05:00
chenyu	81597ddd96	increase lr for bert (#9098 ) had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview	2025-02-14 19:10:35 -05:00
b1tg	3ad39b247b	refactor LLVMRenderer (#9090 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-02-15 08:00:31 +08:00
b1tg	1f1362fd27	add truncate_bf16 (#9078 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-02-15 07:59:09 +08:00
Ahmed Harmouche	2dc8f1867c	Synchronize webgpu (#9093 )	2025-02-15 00:52:10 +03:00
chenyu	b58e7b1898	zero out the weight in bert init run (#9076 ) `DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer oom. I think the buffer of random init weights caused the oom.	2025-02-14 08:40:41 -05:00
qazal	82ad0d2e65	keep CONST/BUFFER uops in tensor_map [pr] (#9083 )	2025-02-14 14:50:08 +02:00
qazal	65297066c2	move buffer refcount increment to the toposort [pr] (#9081 )	2025-02-14 12:54:22 +01:00
chenyu	73af42aeab	fix pow backward when base is 0 (#9075 )	2025-02-13 21:06:01 -05:00
qazal	2d04a75a40	start tracking bottom_up_rewrite in viz [pr] (#9071 ) * start tracking bottom_up_rewrite in viz [pr] * use the tracking matcher in test_viz	2025-02-14 00:28:10 +01:00
chenyu	5ef48bbe0a	swap order in rsqrt (#9069 ) fixed backward for 0	2025-02-13 16:51:21 -05:00
Ahmed Harmouche	e83905696e	Show install instructions when dawn library is missing (#9059 ) * Show install instructions when dawn library is missing * Handle missing dawn in ops_webgpu * Simplify * Solve f-string backslash error	2025-02-14 00:30:20 +03:00
chenyu	9e91898941	bert eval at the end of training (#9070 ) always eval at the last epoch	2025-02-13 16:29:44 -05:00
chenyu	e02e3b94c3	remove SQRT hack in llvm (#9067 ) replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward	2025-02-13 15:42:34 -05:00
chenyu	947c97e6ff	add test_sqrt to test_speed_v_torch (#9066 ) working on getting rid of llvm sqrt hack	2025-02-13 15:25:54 -05:00
chenyu	49abc09f77	remove the reshapes in test_arange_2_reduce [pr] (#9063 )	2025-02-13 12:33:25 -05:00
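
The hang fixed in 8926bac00a (#9119) comes from atexit running handlers in last-in, first-out order. The sketch below reproduces just that ordering rule in a child interpreter. The handler names close_device and finalize_profile are hypothetical stand-ins for the two cleanup steps the commit describes; this is not tinygrad code.

```python
import subprocess
import sys

def shutdown_order(register_device_first: bool) -> list[str]:
    """Run a child interpreter that registers two atexit handlers and
    return the order in which they actually fire at interpreter exit."""
    # Hypothetical stand-ins for the two cleanup steps; not tinygrad functions.
    device = 'atexit.register(print, "close_device")'
    profile = 'atexit.register(print, "finalize_profile")'
    lines = [device, profile] if register_device_first else [profile, device]
    script = "import atexit\n" + "\n".join(lines) + "\n"
    result = subprocess.run([sys.executable, "-c", script],
                            capture_output=True, text=True, check=True)
    return result.stdout.split()

# atexit is LIFO: the handler registered last runs first.
print(shutdown_order(register_device_first=False))  # ['close_device', 'finalize_profile']
print(shutdown_order(register_device_first=True))   # ['finalize_profile', 'close_device']
```

Registering the device cleanup first makes it run last, so the profile is finalized while the device is still open, which matches the ordering the commit moves to.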

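Commit 8824f7e9df (#9050) makes logcumsumexp numerically stable, but the log line does not say which technique it uses. One common approach, shown here as a pure-Python sketch and not as tinygrad's implementation, is a running log-add-exp: each step factors out the larger term so exp never sees a large positive argument. The naive version is included only for comparison.

```python
import math

def logaddexp(a: float, b: float) -> float:
    """log(exp(a) + exp(b)) without overflow: factor out the larger term."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    # exp(-|a - b|) lies in (0, 1], so it can never overflow.
    return m + math.log1p(math.exp(-abs(a - b)))

def logcumsumexp(xs: list[float]) -> list[float]:
    """Running log-sum-exp: out[i] = log(sum(exp(x) for x in xs[:i + 1]))."""
    out, acc = [], -math.inf  # log(0) = -inf is the identity for logaddexp
    for x in xs:
        acc = logaddexp(acc, x)
        out.append(acc)
    return out

def naive_logcumsumexp(xs: list[float]) -> list[float]:
    """Direct formula, for comparison only."""
    out, total = [], 0.0
    for x in xs:
        total += math.exp(x)  # raises OverflowError once x exceeds about 709
        out.append(math.log(total))
    return out
```

For inputs such as [1000.0, 1000.0] the stable version returns 1000 and 1000 + log(2), while the naive version raises OverflowError on the first element; for moderate inputs the two agree.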

11,106 commits