mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
qazal	bfb2d1f89a	Revert "fp8 gemm speedup (#16236)" (#16245) This reverts commit `d95bf394e1`.	2026-05-19 02:01:44 +09:00
chenyu	5ae4dbd599	make slow tests faster (#16244)	2026-05-18 11:42:02 -04:00
chenyu	981c12182f	remove requires_grad= in tinygrad/ (#16241)	2026-05-17 16:55:37 -04:00
chenyu	fcdd1af880	remove Tensor.detach override [pr] (#16239)	2026-05-16 23:58:12 -04:00
chenyu	dcee90aa3f	remove requires_grad use in extra/examples (#16238) except the ones fed into optimizer	2026-05-16 18:40:26 -04:00
chenyu	8631b6f17d	remove use of requires_grad in test/ (#16237)	2026-05-16 17:21:07 -04:00
qazal	d95bf394e1	fp8 gemm speedup (#16236) * add asm_gemm option * milestone * work * edit * only the fast kernel * diff	2026-05-17 04:58:28 +09:00
chenyu	0ddc50d050	do not gate backward on requires_grad (#16230) DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now	2026-05-16 12:29:49 -04:00
nimlgen	bef5f717bc	fix nolocals and beam (#16232)	2026-05-16 18:09:19 +03:00
qazal	ebcb7b7cc0	fp8 gemm tests with scale args (#16231) * update atol * update fp8 path * more work * update profile.sh	2026-05-16 20:47:58 +09:00
nimlgen	e575f778f9	move debug prints (#16218) * move debug prints * x	2026-05-16 13:57:34 +03:00
wozeparrot	2d48d7ab09	remove more invalid (#16227)	2026-05-16 02:52:27 -07:00
wozeparrot	159694347e	llama: fix running flat_llama (#16224)	2026-05-15 20:16:48 -07:00
Christopher Milan	79c0ae5b89	metal: arch is GPU family (#16223)	2026-05-15 21:22:48 -04:00
Christopher Milan	2c61f65211	cl: device extensions in arch (#16220)	2026-05-15 18:59:20 -04:00
George Hotz	2549b14ec2	fix caformer onnx run (#16222)	2026-05-15 15:08:36 -07:00
George Hotz	2570bded8b	update spec for LOAD (#16221) * add load to the spec * can	2026-05-15 14:46:00 -07:00
chenyu	d62c1d83c0	remove Tensor.eye override (#16219) * remove Tensor.eye override was only needed for requires_grad arg * README	2026-05-15 15:40:34 -04:00
chenyu	07a172dbbb	remove noop requires_grad_ calls (#16213)	2026-05-15 13:31:10 -04:00
chenyu	c6cf9e8f0c	remove test_svd_nonfull_5_5 (#16217) flaky, kinda overlap with test_svd_general	2026-05-15 13:10:02 -04:00
qazal	d54fa86b71	viz/cli: select all calls in graph by default (#16214)	2026-05-15 21:01:44 +09:00
nimlgen	28b98e529d	nv: move structs to vram (#16184) * nv: vram * x * 4090 * x * move and sysmem on macos * x * remove hp	2026-05-15 13:41:42 +03:00
chenyu	409bb0c9ad	requires_grad cannot be None (#16212) final goal is to remove requires_grad, first change the default to True, and don't allow None	2026-05-15 02:01:04 -04:00
Christopher Milan	c7870f11ff	mesa: suggest curl install tip (#16211)	2026-05-15 00:29:06 -04:00
chenyu	a612b88abb	better assert when setitem a refed tensor (#16210) also decouple from requires_grad	2026-05-14 23:40:29 -04:00
chenyu	a75c14f010	some setitem tests (#16209)	2026-05-14 22:36:25 -04:00
Christopher Milan	891a1ae7c2	onnx: remove dtype_fallback (#15717)	2026-05-14 22:06:57 -04:00
wozeparrot	b4d267dfd4	llama: only save when small (#16208)	2026-05-14 17:46:29 -07:00
chenyu	ffa1aac7b1	gradient for STORE/AFTER ala clone (#16205)	2026-05-14 20:17:27 -04:00
chenyu	09096ea565	test_gradient_through_clone (#16203) backward through clone crashes now	2026-05-14 19:26:47 -04:00
George Hotz	d4dcd8487b	aggressive shape check to prepare for broadcasting (#16202) * add implicit broadcasting to shape * NOOP/ALLREDUCE fixes	2026-05-14 16:15:44 -07:00
George Hotz	83ec66da34	fix a fastdiv edge case (#16199)	2026-05-14 13:12:18 -07:00
nimlgen	62ea73719d	hcq2: share more with graph (#16196) * share more with graph * comment	2026-05-14 22:28:11 +03:00
George Hotz	3b8cc31759	disable fast idiv by default, it's broken (#16197) * disable fast idiv by default, it's broken * fix fast idiv tests	2026-05-14 11:48:27 -07:00
Christopher Milan	8f811649ff	better compiler_cpu invalid arch errors (#16194)	2026-05-14 14:36:14 -04:00
qazal	f03a7fd6d1	viz/cli: readable uop json (#16195) * viz/cli: readable uop json repr * work * better	2026-05-14 21:33:10 +09:00
C T	1b779a9058	add gelu approximate="none" (match pytorch) (#16162) * add gelu approximate="none" (match pytorch) * lint * pass through onnx Gelu approximate * type annotate * explicit math.sqrt * keep tinygrad's gelu approximate="tanh" default	2026-05-13 18:53:24 -07:00
chenyu	dd9187d9ee	minor hash cleanups (#16190) same kernels	2026-05-13 20:59:24 -04:00
wozeparrot	88ac2ac1fd	llama: cleanups (#16189)	2026-05-13 17:08:06 -07:00
Christopher Milan	9a365d9978	ci: fix null image tests (#16188)	2026-05-13 18:00:05 -04:00
nimlgen	ad1fb7c981	hcq2: graph (#16186) * keep this for now * early graph	2026-05-13 22:49:43 +03:00
chenyu	3f9f6a51b2	minor image_conv2d cleanup (#16187) remove some no-op slices	2026-05-13 15:47:40 -04:00
b1tg	59c34b9fe0	llm: precise device (#16159) * llm: precise device * llm: pass device to precompute_freqs_cis	2026-05-12 21:16:42 -07:00
b1tg	3c806ff406	clean up gguf (#16160)	2026-05-12 21:16:10 -07:00
wozeparrot	e97f2c1114	llama: only gemm + fa custom kernel (#16180) * llama: tie store to grad directly * llama: set mp flags * llama: non fused grad fp8 quantize path	2026-05-12 21:03:49 -07:00
chenyu	38d407fd58	simplify svd more (#16181) all the slowness is scheduling	2026-05-12 23:48:22 -04:00
Christopher Milan	f1fdd2ccec	ci: add IMAGE=1 compile-only tests (#16182) * ci: add IMAGE=1 compile-only tests * fix	2026-05-12 23:40:32 -04:00
George Hotz	faf7fb7513	update nir renderer for new image style (#16179) * update nir renderer for new image style * don't cast image indexes	2026-05-12 20:25:01 -07:00
Christopher Milan	7d0c5ab689	ci: ocelot needs nvcc on linux (#16178) * ci: ocelot needs nvcc on linux * cudart	2026-05-12 23:13:48 -04:00
chenyu	32138c2418	svd to mixin (#16175)	2026-05-12 22:29:01 -04:00

13,471 commits