mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
Yixiang Gao	4f89f8b73a	make sure the old hyp breaks the test	2024-01-03 07:13:54 -08:00
Yixiang Gao	84eb6dd32a	skip GPU cause opencl on intel can't compile half	2024-01-03 07:07:21 -08:00
Yixiang Gao	73879b50ad	only need to check the min_lr for the nan bug	2024-01-03 07:00:50 -08:00
Yixiang Gao	99f8740c60	running half in CI CPU is slow	2024-01-02 18:44:35 -08:00
Yixiang Gao	781690fd99	how long it takes on CI CPU without the lr scheduler	2024-01-02 18:33:48 -08:00
Yixiang Gao	dd00bcb9c0	fix whitespace	2024-01-02 18:16:33 -08:00
Yixiang Gao	841487cad9	add half test with using hyp from benchmarks	2024-01-02 18:14:30 -08:00
George Hotz	f494b9d463	simple multitensor API (#2903) * simple multitensor API * test multitensor * mt work * new api * copies * all but data parallel * allreduce there * works, but axis sharded * fix all mt tests * features/multi * work * backprop * fix tests * tests passing * mt progress * cleanups * less lines * tensor cleanup * save more lines * mypy passes * fix tests * skip for cuda too * bump download cache	2024-01-02 17:49:44 -08:00
George Hotz	5522ba234b	simplify image functions (#2987) * simplify image functions * line in tensor	2024-01-02 17:35:08 -08:00
chenyu	6e9406c986	one list comprehension in search action (#2988) instead of list of list then flatten	2024-01-02 20:29:26 -05:00
chenyu	08a34faea8	pass tuple for strs to startswith (#2986)	2024-01-02 19:51:15 -05:00
George Hotz	dbe4a1a914	switch CI to tiny8 (#2984) * switch CI to tiny8 * no copyin for disk * Revert "no copyin for disk" This reverts commit `eb46b7e93d`. * rocm 6 broke llama * rename it	2024-01-02 16:40:25 -08:00
Yixiang Gao	b753d280f7	move hyp out of the train so it can be imported	2024-01-02 15:56:17 -08:00
chenyu	0dd3ca59cd	simpler ModNode.__mod__ and ModNode.__floordiv__ (#2983) `gcd(self.b, b) == b` is equivalent to `self.b % b == 0`. use the same condition and format in __floordiv__ too.	2024-01-02 18:52:42 -05:00
chenyu	c07907e644	grad -> grad_output in mlops for consistency (#2982)	2024-01-02 18:03:55 -05:00
Yixiang Gao	54cdba57e7	mend	2024-01-02 14:21:06 -08:00
Yixiang Gao	26303d181b	re-enable half cifar benchmarks	2024-01-02 14:16:35 -08:00
Yixiang Gao	2e4d9ad936	adjsut div factor to avoid underflow	2024-01-02 13:47:13 -08:00
chenyu	ad0d710ec4	merge apply_opt OptOps.LOCAL and OptOps.LASTLOCAL into one block (#2980) and other minor apply_opt cleanups	2024-01-02 16:40:10 -05:00
George Hotz	8de160d08e	hotfix: remove dead code, save lines	2024-01-02 12:52:20 -08:00
chenyu	878e869663	simpler SumNode.__mod__ (#2979) * simpler SumNode.__mod__ delegate simplification to individual node * ModNode.__mod__ simplification case * Revert "ModNode.__mod__ simplification case" This reverts commit `73a42205a8`.	2024-01-02 15:09:15 -05:00
chenyu	91ddda244f	minor cleanups in dtype.py (#2978) * minor cleanups in dtype.py * all not	2024-01-02 13:42:37 -05:00
chenyu	ff5399f053	move one last dtype test from test_helpers to test_dtype (#2975)	2024-01-02 12:37:56 -05:00
qazal	deb3722aac	refactor workitems (#2973)	2024-01-02 09:16:52 -08:00
qazal	01cdd6596f	share hip and cuda (#2972)	2024-01-02 06:34:24 -08:00
Kevin Herro	bd6a0c90a0	add Tensor.split (#2750) * add Tensor.split (#2677) * fix mypy errors * add list support for Tensor.split * fix ruff comments * match tensor.split api * simplify split and test_split --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2024-01-01 22:09:04 -08:00
George Hotz	e7a432b479	search refactor (#2969) * minor search cleanup * now that saves lines * fix	2024-01-01 17:39:26 -08:00
chenyu	b1d9e54ea3	regenerate kernel ast dataset (#2968) added back the log ast function and removed hacks that work around the old dataset	2024-01-01 20:26:17 -05:00
George Hotz	cc2969f690	simpler cstyle (#2966) * simpler cstyle * save lines	2024-01-01 16:20:10 -08:00
George Hotz	17f0c3006b	hotfix: do stable diffusion first on mac	2024-01-01 15:38:25 -08:00
chenyu	58d3d5030b	vars_from_ast -> LazyOp.vars (#2965)	2024-01-01 18:12:38 -05:00
George Hotz	980f421442	hotfix: remove cast from beautiful_cartpole	2024-01-01 15:02:03 -08:00
George Hotz	a280cfe169	move dtypes to dtype.py (#2964) * move dtypes to dtype.py * fix urllib	2024-01-01 14:58:48 -08:00
chenyu	fadaa2ec28	remove type check for LazyOp.src now it's always LazyOp (#2963) * remove type check for LazyOp.src now it's always LazyOp also matched MULACC criteria between interpreted and compiled (that probably need to be refactored somewhere else) * disable that test	2024-01-01 17:27:29 -05:00
George Hotz	c81ce9643d	move globalcounters to ops (#2960) * move globalcounters to ops * missed a few * sick of that failing	2024-01-01 14:21:02 -08:00
chenyu	8291986959	Variable.sum -> Node.sum, Variable.ands -> Node.ands (#2961)	2024-01-01 16:21:28 -05:00
chenyu	3d720b5761	move expand_idx, iter_idxs and expand_node from symbolic to linearizer (#2959)	2024-01-01 14:41:21 -05:00
George Hotz	e0ecab3797	touchups from multibuffer branch (#2958)	2024-01-01 11:33:41 -08:00
George Hotz	45247385eb	hotfix: make the line counter correct	2024-01-01 11:01:22 -08:00
George Hotz	56f44bd10e	move the compiler cache to be global (#2957) * move the compiler cache to be global * remove non robust test * remove dead code	2024-01-01 10:59:56 -08:00
George Hotz	063f465604	simpler webgpu (#2956) * simpler webgpu * skip that test	2024-01-01 10:28:59 -08:00
Shawn Hagler	fea20d71b3	add `/opt/cuda/include` directory (#2920)	2023-12-30 08:16:42 -08:00
chenyu	0d6e264c48	cleanup Tensor.triu and Tensor.tril (#2953) `.where` does the dtype and shape conversions for 0, no need to use zeros_like	2023-12-29 22:27:18 -05:00
chenyu	e53b96fdbb	fix TC=2 tensor core op test (#2951) * print DEBUG for TC=2 in CI * enable TC=2 * no need to check src type * LOAD has side effect * don't push any local buffer * update comment * and BARRIER	2023-12-29 21:39:49 -05:00
chenyu	ad4472e6e8	cleanup llama apply_rotary_emb and other helpers (#2950) * cleanup llama apply_rotary_emb and other helpers used ellipsis and other higher level tensor function. disabled the half @ half -> half tensor core as it fails uop dtype checks * keep hip 8x8->8 wmma	2023-12-29 11:39:15 -05:00
chenyu	61e255d197	use max for gpt2 and llama (#2949) not using argmax yet because there's a multinomial outside of function.	2023-12-28 23:26:00 -05:00
chenyu	c7b106bf9c	hotfix float4 only supports float and half (#2948) #2942 broke coder	2023-12-28 20:23:52 -05:00
chenyu	2f67f1e580	remove obsolete TODO in beautiful_mnist (#2946) the compiler error was due to `error: call to 'max' is ambiguous` when we have max(int, float) in kernel. it was first fixed in `4380ccb1` the non fp32 math PR, and further solidified with dtype refactor	2023-12-28 17:09:23 -05:00
chenyu	50f2e31d26	cleanup float4 grouping in global_load and global_store (#2942) * cleanup float4 grouping in global_load and global_store * fix test decorator	2023-12-27 14:10:04 -05:00
chenyu	54629b56d2	minor cleanup in kernel and linearizer (#2937) * minor cleanup in kernel and linearizer less long line, spaces and colocate variables * no deadline in hypothesis test	2023-12-26 12:05:32 -05:00

11,106 commits