mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
George Hotz	2eb9241329	better conv	2025-03-24 13:07:14 +08:00
George Hotz	554a490751	Merge branch 'master' into dsp_search	2025-03-24 12:29:22 +08:00
George Hotz	74d98eafb8	add onnx frontend stub [pr] (#9558)	2025-03-24 12:24:34 +08:00
George Hotz	651c678edf	work	2025-03-24 09:49:53 +08:00
George Hotz	3274bd2d81	output	2025-03-23 15:13:00 +08:00
geohotstan	309afa20b7	add Tensor.max_unpool2d (#9518) * why does max_unpool2d feel slower than out.gradient ... * slightly cleaner * what happened to ruff * need to think about this some more * slightly faster now? * clean up, 1 more failing edge case * ok good * working TINY_BACKEND * nit doc wording * retry CI	2025-03-22 12:11:33 -04:00
George Hotz	30f4d64148	rules	2025-03-22 19:17:16 +08:00
George Hotz	2634975d5a	5 and 8	2025-03-22 19:14:04 +08:00
George Hotz	fd73ec2b1b	knum	2025-03-22 18:59:54 +08:00
George Hotz	e1d2bec4a4	opt	2025-03-22 18:52:56 +08:00
George Hotz	1b4e9f5e91	more opt rules	2025-03-22 18:07:31 +08:00
George Hotz	25c023bcbe	more	2025-03-22 17:49:34 +08:00
George Hotz	26b02a037c	fix 33	2025-03-22 17:17:47 +08:00
George Hotz	dca95428a5	touch	2025-03-22 11:05:36 +08:00
Francis Lata	eb95825eea	RetinaNet dataloader (#9442) * retinanet dataloader * remove batch_size from generate_anchors * refactor kits19 dataset tests * add tests for dataloader * fix testing setup and cleanups * remove unused import	2025-03-21 13:36:41 -04:00
George Hotz	af94addb3a	ish	2025-03-21 17:46:45 +08:00
George Hotz	0416b0998d	revert those	2025-03-21 17:15:38 +08:00
George Hotz	8e555c586c	switch quantization to unsigned/unsigned + add Ops.REDUCE (#9527) * switch quantization to unsigned/unsigned + add Ops.REDUCE * tests * nhwc + replay pkl	2025-03-21 17:02:37 +08:00
George Hotz	f66b03f0a6	dsp ish	2025-03-21 16:28:08 +08:00
George Hotz	2729a46ca6	don't do that	2025-03-21 16:04:21 +08:00
George Hotz	dbb50e4a00	knum 4	2025-03-21 15:48:50 +08:00
George Hotz	71c7c455a6	quantize	2025-03-21 14:55:29 +08:00
George Hotz	ff3438be4e	fast	2025-03-21 13:04:18 +08:00
George Hotz	c3c85c64ee	simpler	2025-03-21 09:24:33 +08:00
George Hotz	f6ed8f4a27	8 folds	2025-03-20 21:20:46 +08:00
George Hotz	87718170d2	more generic	2025-03-20 21:14:33 +08:00
George Hotz	b67af4049c	knum 20	2025-03-20 20:59:06 +08:00
George Hotz	16e425a4c0	work	2025-03-20 20:24:21 +08:00
George Hotz	e7402e6643	KNUM=13 will be fast like roadrunner	2025-03-20 18:45:53 +08:00
George Hotz	e5ccd9e846	work	2025-03-20 15:20:03 +08:00
George Hotz	223feb2118	Merge branch 'master' into dsp_search	2025-03-20 10:52:30 +08:00
George Hotz	68053d0510	dsp stuff / sniff ioctls from snpe (#9490) * sniff ioctls from snpe * dump input buffers * snpe logs from dsp * NHWC support * knum 3 * this run? * revert those --------- Co-authored-by: Comma Device <device@comma.ai>	2025-03-20 10:38:23 +08:00
geohotstan	8c0d0a122c	Add return_indices to max_pool (#9506) * wow argmax is so good * 1 less line * clean up and better variable names * is this torch thing right...? * add more tests * slap a TODO on it * clean ups * prettier looking code and fix ceil mode test * add return types and some docs * ok that was a bad example since indices == value, just no example	2025-03-19 15:25:37 -04:00
Francis Lam	1e5d9ad8f7	extra/gemm/max_matmul: start of custom kernels for GEMM (#6926) * extra/gemm/max_matmul: start of custom kernels for GEMM * add an unoptimized FP16/FP16 MMA example * add slow 3-stage fp16 acc example * add correct 3-stage pipeline with unswizzled/flat smem input (slow) * add acc fp16 example with 3 stages and swizzle (no bank conflicts) * add max version of NV fp16_fp16_fp16 * fix up comments and removed unused code in max variations * add start of no_xor example * fix to account for UOps to Ops	2025-03-19 15:04:57 +08:00
b1tg	a95b489a55	nanoGPT train works with tiny torch backend (#9283) * train_shakespeare_char.py works * move aten.where.self_out to tiny_backend_out * fix memory leak * corealize in the backward_hook * Update backend.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-19 11:51:02 +08:00
George Hotz	117b7a16ef	VALIDATE_WITH_CPU [pr] (#9488) * VALIDATE_WITH_CPU [pr] * fix test	2025-03-18 15:15:04 +08:00
nimlgen	a82c9332d3	am: rename soc21 to soc (#9482)	2025-03-18 08:54:26 +08:00
Anish Umale	5e58f4b65b	Tiny backend test_ops fix part 3 (#9483) * extract straightforward things from https://github.com/tinygrad/tinygrad/pull/9302 * pass dtype and device for ones_like	2025-03-17 18:01:51 -04:00
TJ	9fcef4d009	add masked_select to tensor.py (#9468) * add masked_select to tensor.py * fix tests --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-03-17 16:05:36 -04:00
geohotstan	53d6f1e1bb	Add bitonic cat sort (#9422) * poc * repeated values fail, sigh * is this being timed out? * fix up down names * bitonic v2, does this run? * bitonic v3, faster * bitonic v3.1, faster * bitonic v3.1.1, same speed unlucky * support dim and indices * bitonic v3.2, simpler code, TODO repeated indices * bruv gimme green for once cmon * cat (stack) implementation, slow but maybe one day when cat is fast meow * revert to v3.2 * bitonic v4, who let the cats out edition * clean up variable names * figured out repeated indices :D * ruff check --fix * use sort for topk * add Tensor.sort everywhere * fix docs and add some types * slightly better variable names * am I doing torch inplace correctly? * delegate sort to values_stable * add a contig, faster first sort * maybe don't test_inplace --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-03-17 12:01:23 -04:00
George Hotz	8eb9093fb8	lil	2025-03-17 19:57:15 +08:00
George Hotz	45f7c08111	work	2025-03-17 19:22:12 +08:00
George Hotz	e57258b17b	prettier rendering	2025-03-17 18:46:25 +08:00
George Hotz	14c9f14125	dsp beam search	2025-03-17 16:42:32 +08:00
George Hotz	824c5f41ac	dsp work try 3 (#9475) * dsp work try 3 * padding	2025-03-17 16:42:12 +08:00
George Hotz	cc0041cb8c	padding	2025-03-17 16:30:29 +08:00
George Hotz	52ae9af4dd	Fast DSP for MobileNetV2 (try 2) (#9467) * Fast DSP for MobileNetV2 (try 2) * enable fast path on uchar * fix tests	2025-03-17 15:10:36 +08:00
George Hotz	09e7708b49	minimum change for rdna4 [pr] (#9455)	2025-03-16 13:39:24 +08:00
George Hotz	cb7a7f69c7	quantization preprocessor from DSP, should be universal (#9437) * quantization preprocessor from DSP, should be universal * touchups * fix tests	2025-03-15 07:49:37 +08:00
chenyu	0e591baf43	redo simple_matmul change (#9450) numpy does not support bfloat16	2025-03-14 17:53:52 -04:00


1,049 commits