mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
Sachith Shetty	74567c1958	fix: pass input device to ONNX helper internal tensors (#16242) * fix: pass input device to onnx methods internal tensors * test: onnx helper internal tensors use input device	2026-05-19 11:16:33 -07:00
Christopher Milan	a178301dbe	PYTHONREMU: fix CDNA VOP3 conditional writes (#16258)	2026-05-19 13:31:31 -04:00
nimlgen	b3dcf8f452	hcq2: split into schedule/realize (#16216) * hcq2: split into schedule/realize * missing * x * f * clean * cleaner * x * x * x * x * x	2026-05-19 16:40:17 +03:00
qazal	e4350e7de9	set hipcc mac docker to 7.1 (#16261) * set hipcc mac docker to 7.1 * pull from amd	2026-05-19 21:30:39 +09:00
George Hotz	a120709671	tighten shape spec for broadcasting (#16206) * tighten shape spec for broadcasting * use IndexError, not ValueError * needs size	2026-05-18 22:12:04 -07:00
George Hotz	3f2d401464	all tests pass with NOOPT=1 (#16257) * all tests pass with NOOPT=1 * fix a few more * noopt 100% pass * noopt 100% pass	2026-05-18 20:39:51 -07:00
chenyu	e694d7f222	more deviceless const prerequisites [pr] (#16256) * more deviceless const prerequisites [pr] * remove that * arange.contiguous -> arange.clone in tests arange will become deviceless const soon, update tests where it needs to be a buffer	2026-05-18 23:14:12 -04:00
chenyu	c1076ed56c	Tensor.device and UOp.device can be None (#16255)	2026-05-18 22:08:10 -04:00
wozeparrot	a3d59faef6	llama: don't save weight (#16252)	2026-05-18 17:05:45 -07:00
qazal	18b102f355	llama: also use 7.1 comgr, update startup_walltime.sh (#16253)	2026-05-19 08:59:02 +09:00
chenyu	d532b4f533	multi alu with deviceless const (#16251)	2026-05-18 19:31:53 -04:00
qazal	98b8a2b407	llama: use hipcc 7.1 version (#16250)	2026-05-19 08:09:57 +09:00
Christopher Milan	7515824a6d	ci: actually use clang-20, enable bfloat16 (#16249)	2026-05-18 19:06:43 -04:00
chenyu	754344087a	assign for deviceless const source (#16248)	2026-05-18 17:39:53 -04:00
chenyu	73e6b4963b	to and shard is noop for deviceless uop (#16247)	2026-05-18 16:11:10 -04:00
Christopher Milan	50481ec9b4	cl: check for cl_khr_fp64 (#16246)	2026-05-18 14:42:43 -04:00
chenyu	db639ebe3e	deviceless const from UOp (#16243)	2026-05-18 14:14:12 -04:00
qazal	bfb2d1f89a	Revert "fp8 gemm speedup (#16236)" (#16245) This reverts commit `d95bf394e1`.	2026-05-19 02:01:44 +09:00
chenyu	5ae4dbd599	make slow tests faster (#16244)	2026-05-18 11:42:02 -04:00
chenyu	981c12182f	remove requires_grad= in tinygrad/ (#16241)	2026-05-17 16:55:37 -04:00
chenyu	fcdd1af880	remove Tensor.detach override [pr] (#16239)	2026-05-16 23:58:12 -04:00
chenyu	dcee90aa3f	remove requires_grad use in extra/examples (#16238) except the ones fed into optimizer	2026-05-16 18:40:26 -04:00
chenyu	8631b6f17d	remove use of requires_grad in test/ (#16237)	2026-05-16 17:21:07 -04:00
qazal	d95bf394e1	fp8 gemm speedup (#16236) * add asm_gemm option * milestone * work * edit * only the fast kernel * diff	2026-05-17 04:58:28 +09:00
chenyu	0ddc50d050	do not gate backward on requires_grad (#16230) DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now	2026-05-16 12:29:49 -04:00
nimlgen	bef5f717bc	fix nolocals and beam (#16232)	2026-05-16 18:09:19 +03:00
qazal	ebcb7b7cc0	fp8 gemm tests with scale args (#16231) * update atol * update fp8 path * more work * update profile.sh	2026-05-16 20:47:58 +09:00
nimlgen	e575f778f9	move debug prints (#16218) * move debug prints * x	2026-05-16 13:57:34 +03:00
wozeparrot	2d48d7ab09	remove more invalid (#16227)	2026-05-16 02:52:27 -07:00
wozeparrot	159694347e	llama: fix running flat_llama (#16224)	2026-05-15 20:16:48 -07:00
Christopher Milan	79c0ae5b89	metal: arch is GPU family (#16223)	2026-05-15 21:22:48 -04:00
Christopher Milan	2c61f65211	cl: device extensions in arch (#16220)	2026-05-15 18:59:20 -04:00
George Hotz	2549b14ec2	fix caformer onnx run (#16222)	2026-05-15 15:08:36 -07:00
George Hotz	2570bded8b	update spec for LOAD (#16221) * add load to the spec * can	2026-05-15 14:46:00 -07:00
chenyu	d62c1d83c0	remove Tensor.eye override (#16219) * remove Tensor.eye override was only needed for requires_grad arg * README	2026-05-15 15:40:34 -04:00
chenyu	07a172dbbb	remove noop requires_grad_ calls (#16213)	2026-05-15 13:31:10 -04:00
chenyu	c6cf9e8f0c	remove test_svd_nonfull_5_5 (#16217) flaky, kinda overlap with test_svd_general	2026-05-15 13:10:02 -04:00
qazal	d54fa86b71	viz/cli: select all calls in graph by default (#16214)	2026-05-15 21:01:44 +09:00
nimlgen	28b98e529d	nv: move structs to vram (#16184) * nv: vram * x * 4090 * x * move and sysmem on macos * x * remove hp	2026-05-15 13:41:42 +03:00
chenyu	409bb0c9ad	requires_grad cannot be None (#16212) final goal is to remove requires_grad, first change the default to True, and don't allow None	2026-05-15 02:01:04 -04:00
Christopher Milan	c7870f11ff	mesa: suggest curl install tip (#16211)	2026-05-15 00:29:06 -04:00
chenyu	a612b88abb	better assert when setitem a refed tensor (#16210) also decouple from requires_grad	2026-05-14 23:40:29 -04:00
chenyu	a75c14f010	some setitem tests (#16209)	2026-05-14 22:36:25 -04:00
Christopher Milan	891a1ae7c2	onnx: remove dtype_fallback (#15717)	2026-05-14 22:06:57 -04:00
wozeparrot	b4d267dfd4	llama: only save when small (#16208)	2026-05-14 17:46:29 -07:00
chenyu	ffa1aac7b1	gradient for STORE/AFTER ala clone (#16205)	2026-05-14 20:17:27 -04:00
chenyu	09096ea565	test_gradient_through_clone (#16203) backward through clone crashes now	2026-05-14 19:26:47 -04:00
George Hotz	d4dcd8487b	aggressive shape check to prepare for broadcasting (#16202) * add implicit broadcasting to shape * NOOP/ALLREDUCE fixes	2026-05-14 16:15:44 -07:00
George Hotz	83ec66da34	fix a fastdiv edge case (#16199)	2026-05-14 13:12:18 -07:00
nimlgen	62ea73719d	hcq2: share more with graph (#16196) * share more with graph * comment	2026-05-14 22:28:11 +03:00

13,338 commits