mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
George Hotz	6f792e8045	vmemu	2025-03-24 15:02:40 +08:00
George Hotz	b1f8018bf4	unaligned load	2025-03-24 14:54:11 +08:00
George Hotz	2eb9241329	better conv	2025-03-24 13:07:14 +08:00
George Hotz	554a490751	Merge branch 'master' into dsp_search	2025-03-24 12:29:22 +08:00
George Hotz	74d98eafb8	add onnx frontend stub [pr] (#9558)	2025-03-24 12:24:34 +08:00
George Hotz	de7d6cec3a	hotfix: DEBUG 5 prints the ast	2025-03-24 11:43:11 +08:00
George Hotz	651c678edf	work	2025-03-24 09:49:53 +08:00
chenyu	ba41076e94	update embedding test to not use dtypes.long [pr] (#9556)	2025-03-23 21:33:38 -04:00
chenyu	c965f4c20b	update bert config (#9555) BEAM 4->5 for green, 2% faster; use AMD driver instead of AM for red, 5% faster	2025-03-23 16:14:41 -04:00
chenyu	d734e24c01	minor WEBGPU_PATH cleanup [pr] (#9552) also mypy recognizes `sys.platform == 'win32'` but does not recognizes it if wrapped inside a helper...	2025-03-23 09:10:02 -04:00
Ahmed Harmouche	7ce7fe0574	Refactor webgpu_dawn lib finding (#9547) * Refactor webgpu_dawn lib finding * Fix ruff	2025-03-23 08:23:29 -04:00
uuuvn	c631c72f22	HCQ: Increment timeline signal before submitting (#9550) `AMDComputeQueue.__del__` frees `hw_page` which is safe because `AMDAllocator._free` does `self.dev.synchronize()` which is supposed to wait for execution of IB to finish, however that doesn't happen if AMDComputeQueue is dropped right after submit before timeline signal is incremented, which it is in most places leading to a race if .bind() is also used (required for multi-xcc because bug in mec fw treats all PACKET3_PRED_EXECs outside IBs as if they had EXEC_COUNT of zero).	2025-03-23 18:30:38 +07:00
nimlgen	d5667419af	am: move out pte creation logic (#9548) * am: move out pte creation logic * emu * ops	2025-03-23 18:29:10 +07:00
George Hotz	3274bd2d81	output	2025-03-23 15:13:00 +08:00
geohotstan	309afa20b7	add Tensor.max_unpool2d (#9518) * why does max_unpool2d feel slower than out.gradient ... * slightly cleaner * what happened to ruff * need to think about this some more * slightly faster now? * clean up, 1 more failing edge case * ok good * working TINY_BACKEND * nit doc wording * retry CI	2025-03-22 12:11:33 -04:00
George Hotz	30f4d64148	rules	2025-03-22 19:17:16 +08:00
George Hotz	2634975d5a	5 and 8	2025-03-22 19:14:04 +08:00
George Hotz	fd73ec2b1b	knum	2025-03-22 18:59:54 +08:00
George Hotz	e1d2bec4a4	opt	2025-03-22 18:52:56 +08:00
George Hotz	1b4e9f5e91	more opt rules	2025-03-22 18:07:31 +08:00
George Hotz	25c023bcbe	more	2025-03-22 17:49:34 +08:00
George Hotz	07abf9e6bc	multi_add_int32	2025-03-22 17:33:56 +08:00
George Hotz	26b02a037c	fix 33	2025-03-22 17:17:47 +08:00
George Hotz	5089a601c6	name it	2025-03-22 14:44:01 +08:00
George Hotz	6b49a63c48	linearizer workaround	2025-03-22 14:18:02 +08:00
quortus	bdd44d4255	Fix DSP transcendentals (#9542)	2025-03-22 11:08:18 +08:00
George Hotz	dca95428a5	touch	2025-03-22 11:05:36 +08:00
Ignacio Sica	eddafb84e5	Bugfix for `TC=3` (#9464) * wrong but uses less shared * for size 8 tc1 with devectorize in 0 loads into local before wmma and works * improvements over tc1 devectorize * fix tc=3 * works for handcoded tc opts * clean bugfix tc=3 * fix * revert changes	2025-03-21 16:43:42 -07:00
chenyu	6da78164f9	assert Kernel ast.op to be Ops.SINK [pr] (#9539) rest of the code assumes self.ast is defined anyway	2025-03-21 18:09:44 -04:00
chenyu	c33679c47b	increase size in test_multinomial_counterexample (#9540) should be less flaky	2025-03-21 17:46:52 -04:00
Francis Lata	1a1087e3a0	cleanups on losses and dataset tests (#9538)	2025-03-21 17:03:18 -04:00
Francis Lata	8cbe4009fc	RetinaNet losses (#9536) * add sigmoid_focal_loss and l1_loss * update ref implementation comment	2025-03-21 15:52:54 -04:00
Francis Lata	e6389184c5	update comment for retinanet dataloader implementations (#9534) Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-21 15:07:45 -04:00
chenyu	ee3d313b34	Revert "update ruff to 0.11.2 (#9531)" (#9535) This reverts commit `d8d65e2747`.	2025-03-21 14:52:25 -04:00
chenyu	b46b8ee15e	add a flag to log when beam surpassed max limit [pr] (#9533)	2025-03-21 13:37:02 -04:00
Francis Lata	eb95825eea	RetinaNet dataloader (#9442) * retinanet dataloader * remove batch_size from generate_anchors * refactor kits19 dataset tests * add tests for dataloader * fix testing setup and cleanups * remove unused import	2025-03-21 13:36:41 -04:00
b1tg	58206fa8a9	add amd llvm compiler (#9519) Co-authored-by: b1tg <b1tg@users.noreply.github.com> Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-21 23:13:27 +08:00
chenyu	d8d65e2747	update ruff to 0.11.2 (#9531) 0.11.2 fixed the false alert from 0.11.1. also pinned the version in setup for now to prevent broken CI from ruff upgrade	2025-03-21 10:32:59 -04:00
George Hotz	8a477ba4e1	knum 3	2025-03-21 20:36:18 +08:00
George Hotz	264dd91b8a	70 GFLOPS	2025-03-21 20:31:14 +08:00
George Hotz	bdf716b915	mul work	2025-03-21 20:05:29 +08:00
George Hotz	cf41c803d0	fast 13	2025-03-21 18:10:59 +08:00
George Hotz	3cf9224df5	a scale and b scale	2025-03-21 18:07:53 +08:00
George Hotz	af94addb3a	ish	2025-03-21 17:46:45 +08:00
qazal	ee3ed73ed1	add reorder_view matcher to scheduler [pr] (#9528)	2025-03-21 17:46:20 +08:00
George Hotz	dc1469a188	double reduce	2025-03-21 17:33:48 +08:00
George Hotz	0416b0998d	revert those	2025-03-21 17:15:38 +08:00
George Hotz	c715c25420	Merge branch 'master' into dsp_search	2025-03-21 17:13:10 +08:00
George Hotz	8e555c586c	switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527) * switch quantization to unsigned/unsigned + add Ops.REDUCE * tests * nhwc + replay pkl	2025-03-21 17:02:37 +08:00
George Hotz	f66b03f0a6	dsp ish	2025-03-21 16:28:08 +08:00

8,282 commits