mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
ttomsa	ae0c3cfff6	change clang -march flag to -mcpu on arm (#10970 ) Co-authored-by: wozeparrot <wozeparrot@gmail.com>	2025-08-11 13:38:48 -04:00
nimlgen	da0b955be4	hcq: cpu can be graphed (#11474 ) * hcq: cpu can be graphed * ops * new jit decisions * fix test * fix remote * cleaner * fix	2025-08-02 21:01:19 +03:00
nimlgen	9f2182f92f	cpu: start threading (#11324 ) * cpu: threading * syncs * llvm * fix * opt * fx * fix * missed sync * one line less * cleaner * fix	2025-08-01 15:35:07 +03:00
nimlgen	a5371f514b	cpu: copies in profile (#11392 ) * cpu: copies in profile * fix * rename to tiny?	2025-07-27 20:56:27 +03:00
nimlgen	0f374e10d2	cpu: use mmap for allocations (#11349 ) * cpu: use mmap for allocations * ops * fix mypy	2025-07-23 20:30:18 +03:00
nimlgen	ca09c180dc	cpu: remove del spam (#11343 ) * cpu: remove del spam * fix	2025-07-23 12:02:37 +03:00
nimlgen	cc3c1e4c14	hcq: move cpu to hcq (#11262 ) * hcq: move cpu to hcq * import time * upd * fix * windows support * hm * cleaner * fix timer * fix timing * std is ns * skip profiler * mypy * cleaner * cleanups * after merge * default is back	2025-07-21 15:10:38 +03:00
George Hotz	0f89660ce4	Revert "change clang -march flag to -mcpu on arm (#10841 )" (#10942 ) This reverts commit `897e42fd1b`.	2025-06-23 16:48:28 -07:00
ttomsa	897e42fd1b	change clang -march flag to -mcpu on arm (#10841 ) * change clang -march flag to -mcpu with fp16 disassembly test * fix * add capstone to macos dependencies * just check no cast in test * rm import * woops * lets check * move check * llvm init before cpu chcek * try this * bump autogen llvm version * also update libclang? * revert * add comment * skip llvm test and add comment * linter	2025-06-23 16:28:48 -07:00
George Hotz	0629e45332	remove cpu graph (#10836 ) * remove cpu graph, it's different from the others * remote was blacklisting CPUGraph * remove cpugraph from dsp	2025-06-16 11:40:58 -07:00
George Hotz	413e223d6e	Revert "remove cpu graph, it's different from the others (#10743 )" (#10745 ) This reverts commit `3d64a98432`.	2025-06-09 22:40:48 -07:00
George Hotz	3d64a98432	remove cpu graph, it's different from the others (#10743 ) * remove cpu graph, it's different from the others * remote was blacklisting CPUGraph	2025-06-09 22:17:10 -07:00
quortus	9e49721c47	CPUGraph support for clang (#10014 ) Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-04-24 07:52:35 -04:00
Alexey Zaytsev	3bce5ad2b4	clang should not emit the .comment section (#9859 ) This section gets included in the finanl image, and we get a lot of garbage with DEBUG=7	2025-04-12 10:59:11 +08:00
chenyu	2e7c2780a9	CLANG -> CPU (#9189 )	2025-02-20 18:03:09 -05:00
George Hotz	b1c0d8c99d	remove cpu and torch backends (#3399 ) * remove cpu and torch backends * don't copy to cpu * use clang instead of cpu * multitensor gathers on the first device * clang is cpu + use default * fixup * bugfix	2024-02-15 16:55:39 +01:00
chenyu	ca7973f61c	clean up einsum_mulacc (#3312 ) * clean up einsum_mulacc * push get_strides * stride * get_strides for ndim	2024-02-04 06:21:19 -05:00
Obada Khalili	b4ea0e18e3	Fix dot product on buffers with zero strides (#3303 ) * skip matacc opt if the all src buffers of mul op are const buffers * add noqa directive for long test * unskip MALACC opt * ensure that a_axes at least includes summation axes in order to perform np.einsum correctly * add regression test for mulacc op * compute a_slices using a_axes * refactor helper of function to retrieve axes and slices for nonzero strides as well as summation axes * include a regression test that uses and to test the behaviour indirectly	2024-02-04 05:15:06 -05:00
geohotstan	842053873d	fix neg logical_not inconsistencies (#3222 ) * try * test: add logical_not tests * gah im retarded, but this doesn't match types for const() * fix: can't we jsut do this? * big change: I don't actually know what I'm doing * WOOO IM JUST CHANGING EVERYTHING WOW probably gon revert later * BYE BYE noqa: E501 * fix: less lines and add test * fix: rm 2 redundant tests * fix: eq with False so we don't unintentionally implicit upcast, but it's bool anyways so w/e	2024-01-24 11:48:40 -05:00
George Hotz	23b084e70a	add device name to device, all are constructed (#3221 )	2024-01-23 20:34:56 -08:00
George Hotz	cc2969f690	simpler cstyle (#2966 ) * simpler cstyle * save lines	2024-01-01 16:20:10 -08:00
chenyu	b469fe3723	add CMPEQ (#2931 ) * CMPEQ * work * fix onnx * fix round * fix webgpu * prettier * no PADTO in actions	2023-12-25 00:15:55 -05:00
chenyu	677ae7673d	use np.less and torch.lt for CMPLT (#2899 ) also removed one unused output_type	2023-12-21 14:37:24 -05:00
chenyu	1500aca43d	remove output_type in ops_cpu and ops_torch (#2892 ) now the input types are matched and checked in lazy, we can remove these output_type. also remove the usage of least_upper_dtype in ops.py since we can just use the input type	2023-12-21 02:11:27 -05:00
chenyu	2d2c4980fe	assert for elementwise dtypes in lazy (#2888 ) * assert for elementwise dtypes in lazy * no image hack * check dtype of scalar for IMAGE=2	2023-12-21 01:42:32 -05:00
chenyu	959d9cfed4	clean up ops_torch and ops_cpu (#2819 )	2023-12-17 19:35:19 -05:00
chenyu	91adb119b8	remove match_type in ops_torch and ops_cpu (#2817 ) * remove match_type in ops_torch and ops_cpu input dtypes are aligned and casted in mlops * dict union only after python3.9 * fix that * fix Sigmoid forward cast	2023-12-17 15:32:30 -05:00
George Hotz	6d6eb9302d	ruff checks the max line length is 150 (#2734 ) * ruff checks the max line length is 150 * fix tensor.py * a lot more * done	2023-12-12 17:34:47 -08:00
wozeparrot	6d58c19736	binaryops xor (#2627 ) * feat: initial xor * feat: numpy xor * feat: llvm xor * feat: quick test for xor * feat: slightly working xor in torch * feat: xor in tensor * feat: slightly better test	2023-12-05 13:21:42 -08:00
George Hotz	d6b404ac11	No dtype alloc (#2570 ) * fix all allocs * improve docs * ugh fix fake alloc	2023-12-02 13:29:40 -08:00
George Hotz	f5de21e753	fast path for copy (#2548 ) * fast copy * ruff first * flat_mv on malloc * order + webgpu test	2023-12-01 11:34:47 -08:00
chenyu	7fec966b5e	bye bye NOOP (#2534 ) * bye bye NOOP * SIN * NEG	2023-11-30 23:10:35 -08:00
George Hotz	12fa846122	zero copy (#2531 ) * zero copy * zero copy test * loads coder in milliseconds * zero copy for cpu and torch * src_from_buffer is None * SLOW_METAL_COPY there	2023-11-30 18:38:41 -08:00
George Hotz	2c363b5f0b	new style device (#2530 ) * cpu tests pass * torch works * works * metal works * fix ops_disk * metal jit works * fix openpilot * llvm and clang work * fix webgpu * docs are rly broken * LRU works on metal * delete comment * revert name to ._buf. LRU only on Compiled * changes * allocator * allocator, getting closer * lru alloc * LRUAllocator * all pass * metal * cuda * test examples * linearizer * test fixes * fix custom + clean realize * fix hip * skip tests * fix tests * fix size=0 * fix MOCKHIP * fix thneed * copy better * simple * old style metal copy * fix thneed * np reshape * give cuda a device	2023-11-30 17:07:16 -08:00
George Hotz	6707f2588e	use copyin (#2500 ) * it's always copyin * all RawBuffer are RawBufferCopyIn * cleanups * this fixes it * requirements='C' * more correct	2023-11-29 09:34:00 -08:00
George Hotz	5629fc368c	Use Buffer.STORE at the end of ASTs (#2494 ) * work * store broken * interpreteds work * this passes * symbolic cpu * fix tests * fix opt tests * images fail * fix InterpretedFlopCounter * stupid hack for images	2023-11-28 20:11:37 -08:00
George Hotz	ab5d14d4ba	MEM -> LOAD (#2492 ) * MEM -> LOAD * keep legacy working	2023-11-28 16:46:37 -08:00
George Hotz	756b01f46f	why were these ever called buffer (#2483 )	2023-11-27 21:02:07 -08:00
George Hotz	9e07824542	move device to device.py (#2466 ) * move device to device.py * pylint test --disable R,C,W,E --enable E0611 * fix tests	2023-11-27 11:34:37 -08:00
George Hotz	8e9cdef61f	clean up the buffers (#2447 ) * clean up the buffers * remove allocate_output * functools.lru_cache is methodcache * add TestShapeTrackerSize * cache_clear * no 0 sz buffer, add _ on functions that shouldn't be imported * fix size * if -> while	2023-11-26 11:02:29 -08:00
George Hotz	8f89e21fca	torch and numpy don't share ops anymore (#2412 ) * torch and numpy don't share ops anymore * that should be filtered out elsewhere * still const * graph + enet example cleanup * hmm, we do still need it because of symbolic	2023-11-23 16:58:10 -08:00
George Hotz	0505c5ea50	remove force_wait, refactor to graph (#2405 ) * remove force_wait * refactor * get rid of stupid ASTRunner * fix del in diskbuffer * BufferOps.FROM_UNDERLYING * put offset in the rawbuffer * fix bugs * use exec	2023-11-23 12:46:07 -08:00
George Hotz	9b58d4cb37	cleanup unused movement ops (#2353 ) * cleanup_mops * no expand * nothing * revert that * add comment * add correctness check to disk tensor	2023-11-18 09:19:02 -08:00
chenyu	d2c0035c73	add back as_strided, move rebuilt mops to extra (#2344 ) * add back as_strided, move rebuilt mops to extra * negative stride for ops_cpu * Revert "negative stride for ops_cpu" This reverts commit `a13b6815ac`. * skip that * style	2023-11-17 14:34:30 -05:00
forcefieldsovereign	b64738e1d6	Remove AS_STRIDED from shapetracker (#2216 ) * very close * remove comment * negative strides working * almost everything passes * calculate offset with list comprehension * some cleanup * got disk load working * review suggestions * fix after merge * overlap working * did it * clean * fixed disk load * lint * mypy * removed as_strided * trying without simplify * added back simplify * make sure expanding to smaller shape * cleanup * removed comment * removed env file * trying whisper test again * onnx test sqlite issue * working on test * finished test * eliminate unnecessary shrink-then-pad * don't shrink buffer * added strides check * added to ci under linters * switch issue * allow symbolic stride * removed .env * isinstance * adjust strides for double expand * cleanup * needed to add type hint for mypy * set pythonpath	2023-11-15 15:50:17 -05:00
George Hotz	70a65c201e	JIT support in Interpreted (#2314 ) * factor that out * jit is supported everywhere * fix some tests * there's no jit supported device, the jit is everywhere * fix test uops	2023-11-15 11:13:38 -08:00
George Hotz	4da2ddea6e	Interpreted cleanups (#2312 ) * move the compiler out of ops * don't return realized * var_vals filter, fix custom * typing	2023-11-15 09:02:23 -08:00
George Hotz	4f7b1ac0d2	cleanups before interpreted jit (#2306 ) * jit mnist * InterpretedFlopCounter doesn't rely on Interpreted * allocator for cpu and torch * types for exec_ast * fix type issues * fix onnx, remove print * always self.from_underlying	2023-11-14 21:44:25 -08:00
geohotstan	b853e9bb8c	Onnx 1.15.0 gogogo (#2217 ) * lol * lol * add GELULULULUL * onnx 1.50 * fuk torch bool neg * exclude regex tests * exclude dequantizelinear for now * is sunny in philly * damn it affinegrid * fixed auto_pad VALID * skip 0 shape tests * add temporary cast in Reduces * tests should pass now * added comments and cleanup * try moving dequantizelinear to onnx.py * fixed dequantizedlinear? * cleanup * try? * float16 segfaults LLVM CI..??? * cleanup comments * pin to 1.50.0 * remove use of -np.inf cuz numpy is kill * 1.50? lol I'm actually retarded * thx for review, muhbad * moved Gelu higher up	2023-11-10 15:36:48 -08:00
George Hotz	9ea0448103	compile interpreted to python code (#2208 ) * sort of works * interpreted * fix flopcounter * interpreted * simpler * type * functools compile ast * lose a line * delete extra file * no self.method_cache	2023-11-03 09:16:12 -07:00

85 commits