mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
George Hotz	c3f99a727e	objc fast msg (#8922 ) * benchmark kernel launch * don't realize unneeded * faster * faster metal * fix mypy * new objc message style [pr] * without sync * no div 0 * lru cache that * no sync in the profile * fix * update all to new style * remove comment * graph one kernel * fix graph one kernel * remove that sync	2025-02-06 17:49:06 +08:00
qazal	a2e7e49fe1	prepickle scheduler process replay [pr] (#8924 )	2025-02-06 10:16:36 +01:00
qazal	89d7480b0c	hotfix: don't sink views [pr] (#8923 )	2025-02-06 09:15:12 +01:00
George Hotz	0cbb7d7f1e	hotfix: metal has known sync issue	2025-02-06 14:29:41 +08:00
George Hotz	a8e54df363	benchmark single kernel launch (#8921 ) * benchmark kernel launch * don't realize unneeded * faster * faster metal * fix mypy * without sync * no div 0 * lru cache that * no sync in the profile	2025-02-06 13:35:34 +08:00
George Hotz	3e082d4a9d	add float4 support to LLVM (#8920 ) * add float4 support to LLVM * is_bool	2025-02-06 12:15:50 +08:00
George Hotz	b05c536f74	cleanup some llvm stuff [pr] (#8919 ) * cleanup some llvm stuff [pr] * debug * default to newer llvm * repr	2025-02-06 11:45:03 +08:00
Josh Moore	44e0eab8fd	Fix AttributeError occurring after ValueError in _apply_uop (#8905 ) * Fix AttributeError occurring after ValueError in _apply_uop * Update tensor.py --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-02-06 10:56:29 +08:00
chenyu	30695da256	remove Tensor._to_const_val (#8917 ) * remove Tensor._to_const_val added a TODO for advance indexing on const, which was the last place that checks const in Tensor * that is not folding now * one more	2025-02-05 21:44:39 -05:00
George Hotz	d09b5f801c	don't use Tensor new, add to all_tensors after constructions [pr] (#8918 )	2025-02-06 10:21:32 +08:00
FICTURE7	759b3f86bf	Pass host CPU features to LLVM target (#8909 ) * Pass host CPU features to LLVM target This gets `test_gemm_fp16` to pass on Windows. It would fail because the generated machine code would call compiler-rt functions to to perform truncating. This gets the test to pass on some hardware, because LLVM gets access to more instructions. Essentially this is similar to `-march=native`. Unless this was intentionally left as is to be re-implemented fully in LLVM IR or something. * Fix linter complaints	2025-02-06 10:19:30 +08:00
uuuvn	09ec33a578	Better errors when relocating against undefined symbol (#8902 )	2025-02-06 10:13:44 +08:00
chenyu	488200f16c	move more pow const to rewrite (#8916 ) * move more pow const to rewrite one less use of _to_const_val * fix	2025-02-05 20:30:12 -05:00
chenyu	76671381aa	move positive const ** t to a rewrite rule (#8914 ) * move positive const ** t to a rewrite rule * one more test	2025-02-05 19:30:12 -05:00
Ignacio Sica	cad44f5f42	add Half-Precision Accumulation Support for Tensor Cores in NV, CUDA, and PTX (#8680 ) * ptx and nv rendering refactor to work with half acc * ptx fix! * use same reg for acc and out * fix comment * another fix * minor change in commet * fix --------- Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>	2025-02-05 16:56:37 -05:00
nimlgen	17f9b1cef6	am: load fw based on versions (#8913 ) * am: load fw based on versions * ops * ops2	2025-02-06 00:02:09 +03:00
chenyu	189bfa164e	enable backward test for pow(neg const x) (#8912 ) backward works now. 0x still does not work because it's a special case fixed in transcendental	2025-02-05 15:35:21 -05:00
chenyu	9307572fe3	Ops.POW and transcendental (#8911 )	2025-02-05 15:15:59 -05:00
nimlgen	bff7c70eef	hcq: better var check (#8908 )	2025-02-05 22:38:59 +03:00
Ignacio Sica	aec3b8d515	add regression test: `test_get_kernel_actions_preserves_actions_state` (#8907 ) * test_get_kernel_actions_preserves_actions_state * simplify * simplify * refactor assert message	2025-02-05 14:13:01 -05:00
qazal	e71497aabc	move assign ShapeTracker check to pattern matcher [pr] (#8906 ) * move assign ShapeTracker check to pattern matcher [pr] * rename the st uop to view	2025-02-05 19:47:20 +01:00
Ignacio Sica	0f6109ec00	hotfix bug in `get_kernel_actions` after `TC_SEARCH_OVER_SHAPE` was introduced (#8904 ) * hotfix search bug * copy actions	2025-02-05 13:10:05 -05:00
Ignacio Sica	15f94ac964	TC_SEARCH_OVER_SHAPE to search multiple TC shapes (#8793 ) * squash search over search * refactor assert * init benchmark * cleaner get_kernel_actions * cleaner get_kernel_actions * add comment	2025-02-05 11:03:46 -05:00
qazal	e7edadda54	construct the sched_sink with graph_rewrite [pr] (#8903 ) * construct the sched_sink with graph_rewrite * diff * move break_sched	2025-02-05 15:16:48 +01:00
qazal	ef7ad3f077	simpler subbuffer construction + copyin is always base (#8900 ) * realize copy * cleanup buffer_view * smaller	2025-02-05 09:10:20 +01:00
qazal	6f0cc2e9c5	rename to KernelContext and move the linearize_sched comment [pr] (#8899 ) * rename to KernelContext and move that comment [pr] * 500	2025-02-05 07:49:58 +01:00
geohotstan	6fb0e5751b	hotfix test_onnx_imagenet (#8897 ) * start * log severity * only change this * change abstraction so it's more usable for huggingface * WHOOPS * actually this is more correct	2025-02-05 14:39:55 +08:00
George Hotz	c1c5227acb	preserve size in dtype ptr [pr] (#8898 )	2025-02-05 14:38:57 +08:00
George Hotz	5844883e59	bump master version v0.10.1	2025-02-05 09:08:28 +08:00
uuuvn	a51c688f39	Cleanup llvm cleanup (and some clang things too) (#8871 ) * Cleanup llvm cleanup (and some clang things too) * Tests * Tests 2 * forgot mockgpu * more print some sources	2025-02-05 07:49:05 +08:00
eliotgolding	bb5ded85cc	Don't rewrite idiv to rshift when numerator is negative (#8885 ) * more conditions for shift rewrite mul/idiv * make ptx test uint so the new condition is true * delete idiv test * rewrite to 0 is wrong for idiv, as denominator is cast to 0 before division * mul/div by 2**(large count) is unsupported anyway	2025-02-05 07:47:33 +08:00
pedro	666b6149bc	Use full soname for libgcc_s in CPUProgram (#8642 ) (#8896 ) Number after .so is abi version, it is always 1 for libgcc_s. Most linux systems set default library versions via symlinks that are simply followed to get actual elf, however conda does it via linker scripts which ctypes doesn't follow (below contents of libgcc_s.so): ``` /* GNU ld script Use the shared library, but some functions are only in the static library. */ GROUP ( libgcc_s.so.1 -lgcc ) ``` ctypes.util.find_library thinks that this is the actual elf and ctypes.CDLL just loads this text file as a shared library. The result is: ``` File "/home/me/src/tinygrad/tinygrad/device.py", line 223, in CPUProgram helper_handle = ctypes.CDLL(ctypes.util.find_library('System' if OSX else 'gcc_s')) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/me/miniforge3/envs/tinygrad/lib/python3.12/ctypes/__init__.py", line 379, in __init__ self._handle = _dlopen(self._name, mode) ^^^^^^^^^^^^^^^^^^^^^^^^^ OSError: /home/me/miniforge3/envs/tinygrad/lib/libgcc_s.so: invalid ELF header ``` Co-authored-by: uuuvn <83587632+uuuvn@users.noreply.github.com>	2025-02-05 07:45:48 +08:00
chenyu	48349efdc1	copy is already contiguous (#8886 )	2025-02-04 17:53:33 -05:00
nimlgen	4c28235bd1	am: remove hardcodes (#8895 ) * am: remove hardcodes for 7900 * h	2025-02-05 00:52:53 +03:00
geohotstan	057c70b05f	add onnx_helpers to extra and add ort validate to benchmark_onnx (#8890 ) * start * log severity * only change this * change abstraction so it's more usable for huggingface --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-04 16:36:01 -05:00
chenyu	89eebd4bfb	pow cleanups (#8894 ) more readable	2025-02-04 15:52:57 -05:00
qazal	7a9e3247c2	simple start to the Kernel UOp [pr] (#8893 ) * simple start to a kernel [pr] * add the sched_sink and spec * rename kernels to sinks * pylint complains	2025-02-04 21:48:15 +01:00
qazal	b4e8878e01	remove tensor_uops tracking from ScheduleContext [pr] (#8892 ) * remove tensor_uops tracking from ScheduleContext [pr] * cleaner	2025-02-04 20:34:15 +01:00
qazal	6a0da51ed0	truncate process replay logs [pr] (#8891 ) * truncate process replay logs [pr] * work * max_lines * bump to 1K	2025-02-04 20:26:48 +01:00
qazal	c7c279a6bd	unbind ShapeTrackers without maintaining a cache [pr] (#8889 ) * replace with a try [pr] * check vars * ahaa	2025-02-04 19:43:41 +01:00
chenyu	61de654efa	minor shard cleanup [pr] (#8888 )	2025-02-04 13:22:31 -05:00
qazal	6ec7f1b00f	replace UPat(name="x") with UPat.var("x") [pr] (#8887 ) * replace UPat(name="x") with UPat.var("x") [pr] * a few more	2025-02-04 19:12:40 +01:00
qazal	c26b06eaeb	delete fold_img_cast [pr] (#8875 )	2025-02-04 18:43:45 +01:00
qazal	acf0baefee	process replay from tensor uops to kernel ast (#8883 ) * process replay from tensor uops to kernel ast * this dedups * switch back to string key	2025-02-04 18:09:20 +01:00
Ignacio Sica	dcf104ee68	ptx wmma render refactor (#8873 ) Co-authored-by: chenyu <chenyu@fastmail.com>	2025-02-04 11:01:23 -05:00
qazal	b92f36179d	don't use set in schedule + add GroupOp.All [pr] (#8882 ) * don't use set in schedule + add GroupOp.All [pr] * update that	2025-02-04 08:19:27 +01:00
George Hotz	56fa5c1191	dsp simulator (#8869 ) * dsp simulator * progress * fix * close on test tiny * working * less waste * line savings * Device DSP compiler * mock DSP at the bottom * DSP tests * docker caching * test update * need load * skip that test for CI DSP * last touch * ugh	2025-02-04 09:45:04 +08:00
chenyu	836cf42c2e	fix rand_like for multi (#8880 )	2025-02-03 19:00:14 -05:00
chenyu	746d899dbd	move multi axis to property (#8879 ) also updated tests so that axis is known prior to realize	2025-02-03 16:02:09 -05:00
nimlgen	fa90079370	amd: reallocate scratch (#8872 ) * amd: reallocate scratch * use it * oops * allocate default * mypy * ops * address realloc from none better * types correct * this better * ops * rm	2025-02-03 23:21:37 +03:00

11,106 commits