Commit graph

11,106 commits

Author SHA1 Message Date
Yixiang Gao
4f89f8b73a make sure the old hyp breaks the test 2024-01-03 07:13:54 -08:00
Yixiang Gao
84eb6dd32a skip GPU cause opencl on intel can't compile half 2024-01-03 07:07:21 -08:00
Yixiang Gao
73879b50ad only need to check the min_lr for the nan bug 2024-01-03 07:00:50 -08:00
Yixiang Gao
99f8740c60 running half in CI CPU is slow 2024-01-02 18:44:35 -08:00
Yixiang Gao
781690fd99 how long it takes on CI CPU without the lr scheduler 2024-01-02 18:33:48 -08:00
Yixiang Gao
dd00bcb9c0 fix whitespace 2024-01-02 18:16:33 -08:00
Yixiang Gao
841487cad9 add half test with using hyp from benchmarks 2024-01-02 18:14:30 -08:00
George Hotz
f494b9d463
simple multitensor API (#2903)
* simple multitensor API

* test multitensor

* mt work

* new api

* copies

* all but data parallel

* allreduce there

* works, but axis sharded

* fix all mt tests

* features/multi

* work

* backprop

* fix tests

* tests passing

* mt progress

* cleanups

* less lines

* tensor cleanup

* save more lines

* mypy passes

* fix tests

* skip for cuda too

* bump download cache
2024-01-02 17:49:44 -08:00
George Hotz
5522ba234b
simplify image functions (#2987)
* simplify image functions

* line in tensor
2024-01-02 17:35:08 -08:00
chenyu
6e9406c986
one list comprehension in search action (#2988)
instead of list of list then flatten
2024-01-02 20:29:26 -05:00
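The change described in #2988 replaces "build a list of lists, then flatten" with a single comprehension that has two `for` clauses. Here is a minimal sketch of the idiom. The `axes` and `amounts` values are hypothetical placeholders, since the actual search action code is not shown in this log.

```python
# Hypothetical candidate inputs; the real search-action objects are not shown in the log.
axes = [0, 1, 2]
amounts = [2, 4]

# Before: build a list of lists, then flatten it in a second pass.
nested = [[(axis, amt) for amt in amounts] for axis in axes]
flat_before = [item for sub in nested for item in sub]

# After: one comprehension with two `for` clauses yields the flat list directly.
# Clause order matches nested loops: `axis` is the outer loop, `amt` the inner one.
flat_after = [(axis, amt) for axis in axes for amt in amounts]

print(flat_after[:3])  # [(0, 2), (0, 4), (1, 2)]
```

Both forms produce the same list. The single comprehension skips the intermediate nested list.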
chenyu
08a34faea8
pass tuple for strs to startswith (#2986) 2024-01-02 19:51:15 -05:00
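Python's `str.startswith` accepts a tuple of prefixes and returns `True` if any of them matches. That lets one call replace a chain of `or`-ed calls. The device strings below are hypothetical, because the call sites touched by #2986 are not shown here.

```python
# Hypothetical device string; the actual call sites in #2986 are not shown in the log.
device = "HIP:0"

# Chained form: one startswith call per prefix.
chained = device.startswith("CUDA") or device.startswith("HIP")

# Tuple form: a single call that is True if any prefix in the tuple matches.
with_tuple = device.startswith(("CUDA", "HIP"))

print(chained, with_tuple)  # True True
```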
George Hotz
dbe4a1a914
switch CI to tiny8 (#2984)
* switch CI to tiny8

* no copyin for disk

* Revert "no copyin for disk"

This reverts commit eb46b7e93d.

* rocm 6 broke llama

* rename it
2024-01-02 16:40:25 -08:00
Yixiang Gao
b753d280f7 move hyp out of the train so it can be imported 2024-01-02 15:56:17 -08:00
chenyu
0dd3ca59cd
simpler ModNode.__mod__ and ModNode.__floordiv__ (#2983)
`gcd(self.b, b) == b` is equivalent to `self.b % b == 0`.
use the same condition and format in __floordiv__ too.
2024-01-02 18:52:42 -05:00
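The equivalence claimed in #2983 holds for a positive divisor `b`. `gcd(a, b)` always divides `b`, so it equals `b` exactly when `b` also divides `a`. The check below assumes positive moduli and uses plain integers in place of tinygrad's symbolic nodes. It compares the two conditions exhaustively over a small range.

```python
import math

def via_gcd(a: int, b: int) -> bool:
    # Condition used before the change: gcd(self.b, b) == b
    return math.gcd(a, b) == b

def via_mod(a: int, b: int) -> bool:
    # Condition used after the change: self.b % b == 0
    return a % b == 0

# Compare both conditions for every a in [-50, 50] and every positive b in [1, 50].
mismatches = [(a, b) for a in range(-50, 51) for b in range(1, 51)
              if via_gcd(a, b) != via_mod(a, b)]
print(mismatches)  # []
```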
chenyu
c07907e644
grad -> grad_output in mlops for consistency (#2982) 2024-01-02 18:03:55 -05:00
Yixiang Gao
54cdba57e7 mend 2024-01-02 14:21:06 -08:00
Yixiang Gao
26303d181b re-enable half cifar benchmarks 2024-01-02 14:16:35 -08:00
Yixiang Gao
2e4d9ad936 adjust div factor to avoid underflow 2024-01-02 13:47:13 -08:00
chenyu
ad0d710ec4
merge apply_opt OptOps.LOCAL and OptOps.LASTLOCAL into one block (#2980)
and other minor apply_opt cleanups
2024-01-02 16:40:10 -05:00
George Hotz
8de160d08e hotfix: remove dead code, save lines 2024-01-02 12:52:20 -08:00
chenyu
878e869663
simpler SumNode.__mod__ (#2979)
* simpler SumNode.__mod__

delegate simplification to individual node

* ModNode.__mod__ simplification case

* Revert "ModNode.__mod__ simplification case"

This reverts commit 73a42205a8.
2024-01-02 15:09:15 -05:00
chenyu
91ddda244f
minor cleanups in dtype.py (#2978)
* minor cleanups in dtype.py

* all not
2024-01-02 13:42:37 -05:00
chenyu
ff5399f053
move one last dtype test from test_helpers to test_dtype (#2975) 2024-01-02 12:37:56 -05:00
qazal
deb3722aac
refactor workitems (#2973) 2024-01-02 09:16:52 -08:00
qazal
01cdd6596f
share hip and cuda (#2972) 2024-01-02 06:34:24 -08:00
Kevin Herro
bd6a0c90a0
add Tensor.split (#2750)
* add Tensor.split (#2677)

* fix mypy errors

* add list support for Tensor.split

* fix ruff comments

* match tensor.split api

* simplify split and test_split

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-01 22:09:04 -08:00
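The log says only that `Tensor.split` was added with list support and made to "match tensor.split api". The sketch below assumes the familiar PyTorch-style semantics. An `int` means "chunks of this size, with the last one possibly smaller". A list gives exact chunk sizes that must sum to the dimension's length. It works on plain Python lists rather than tinygrad Tensors, and `split_sizes` and `split_list` are hypothetical helper names.

```python
from typing import List, Sequence, Union

def split_sizes(length: int, sizes: Union[int, Sequence[int]]) -> List[int]:
    """Chunk sizes along one dimension (assumed torch.split-style semantics)."""
    if isinstance(sizes, int):
        if sizes <= 0:
            raise ValueError("chunk size must be positive")
        # Equal chunks of `sizes`; the final chunk takes whatever remains.
        return [min(sizes, length - start) for start in range(0, length, sizes)]
    if sum(sizes) != length:
        raise ValueError(f"sizes {list(sizes)} do not sum to {length}")
    return list(sizes)

def split_list(xs: list, sizes: Union[int, Sequence[int]]) -> List[list]:
    out, start = [], 0
    for n in split_sizes(len(xs), sizes):
        out.append(xs[start:start + n])  # stands in for slicing a tensor along one dim
        start += n
    return out

print(split_list(list(range(5)), 2))       # [[0, 1], [2, 3], [4]]
print(split_list(list(range(5)), [1, 4]))  # [[0], [1, 2, 3, 4]]
```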
George Hotz
e7a432b479
search refactor (#2969)
* minor search cleanup

* now that saves lines

* fix
2024-01-01 17:39:26 -08:00
chenyu
b1d9e54ea3
regenerate kernel ast dataset (#2968)
added back the log ast function and removed hacks that work around the old dataset
2024-01-01 20:26:17 -05:00
George Hotz
cc2969f690
simpler cstyle (#2966)
* simpler cstyle

* save lines
2024-01-01 16:20:10 -08:00
George Hotz
17f0c3006b hotfix: do stable diffusion first on mac 2024-01-01 15:38:25 -08:00
chenyu
58d3d5030b
vars_from_ast -> LazyOp.vars (#2965) 2024-01-01 18:12:38 -05:00
George Hotz
980f421442 hotfix: remove cast from beautiful_cartpole 2024-01-01 15:02:03 -08:00
George Hotz
a280cfe169
move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
chenyu
fadaa2ec28
remove type check for LazyOp.src now it's always LazyOp (#2963)
* remove type check for LazyOp.src now it's always LazyOp

also matched MULACC criteria between interpreted and compiled (that probably needs to be refactored somewhere else)

* disable that test
2024-01-01 17:27:29 -05:00
George Hotz
c81ce9643d
move globalcounters to ops (#2960)
* move globalcounters to ops

* missed a few

* sick of that failing
2024-01-01 14:21:02 -08:00
chenyu
8291986959
Variable.sum -> Node.sum, Variable.ands -> Node.ands (#2961) 2024-01-01 16:21:28 -05:00
chenyu
3d720b5761
move expand_idx, iter_idxs and expand_node from symbolic to linearizer (#2959) 2024-01-01 14:41:21 -05:00
George Hotz
e0ecab3797
touchups from multibuffer branch (#2958) 2024-01-01 11:33:41 -08:00
George Hotz
45247385eb hotfix: make the line counter correct 2024-01-01 11:01:22 -08:00
George Hotz
56f44bd10e
move the compiler cache to be global (#2957)
* move the compiler cache to be global

* remove non robust test

* remove dead code
2024-01-01 10:59:56 -08:00
George Hotz
063f465604
simpler webgpu (#2956)
* simpler webgpu

* skip that test
2024-01-01 10:28:59 -08:00
Shawn Hagler
fea20d71b3
add /opt/cuda/include directory (#2920) 2023-12-30 08:16:42 -08:00
chenyu
0d6e264c48
cleanup Tensor.triu and Tensor.tril (#2953)
`.where` does the dtype and shape conversions for 0, no need to use zeros_like
2023-12-29 22:27:18 -05:00
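The cleanup in #2953 relies on `.where` accepting a plain scalar `0` for the "else" branch, which removes the need for `zeros_like`. The pure-Python sketch below mirrors only the masking logic on nested lists. It keeps an entry when `col - row >= k` for `triu` (or `<= k` for `tril`), the usual diagonal-offset convention, and otherwise selects the scalar 0. The broadcasting and dtype handling that tinygrad's `.where` performs are not modeled, and the helper names are hypothetical.

```python
from typing import List

def where(cond: bool, x: float, y: float) -> float:
    # Scalar stand-in for an elementwise select; `y` is a plain 0, not a zeros_like tensor.
    return x if cond else y

def triu(m: List[List[float]], k: int = 0) -> List[List[float]]:
    # Keep entries on or above the k-th diagonal (col - row >= k), zero the rest.
    return [[where(j - i >= k, v, 0) for j, v in enumerate(row)] for i, row in enumerate(m)]

def tril(m: List[List[float]], k: int = 0) -> List[List[float]]:
    # Keep entries on or below the k-th diagonal (col - row <= k), zero the rest.
    return [[where(j - i <= k, v, 0) for j, v in enumerate(row)] for i, row in enumerate(m)]

m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(triu(m))  # [[1, 2, 3], [0, 5, 6], [0, 0, 9]]
print(tril(m))  # [[1, 0, 0], [4, 5, 0], [7, 8, 9]]
```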
chenyu
e53b96fdbb
fix TC=2 tensor core op test (#2951)
* print DEBUG for TC=2 in CI

* enable TC=2

* no need to check src type

* LOAD has side effect

* don't push any local buffer

* update comment

* and BARRIER
2023-12-29 21:39:49 -05:00
chenyu
ad4472e6e8
cleanup llama apply_rotary_emb and other helpers (#2950)
* cleanup llama apply_rotary_emb and other helpers

used ellipsis and other higher level tensor function.
disabled the half @ half -> half tensor core as it fails uop dtype checks

* keep hip 8x8->8 wmma
2023-12-29 11:39:15 -05:00
chenyu
61e255d197
use max for gpt2 and llama (#2949)
not using argmax yet because there's a multinomial outside of function.
2023-12-28 23:26:00 -05:00
chenyu
c7b106bf9c
hotfix float4 only supports float and half (#2948)
#2942 broke coder
2023-12-28 20:23:52 -05:00
chenyu
2f67f1e580
remove obsolete TODO in beautiful_mnist (#2946)
the compiler error was due to `error: call to 'max' is ambiguous` when we have max(int, float) in kernel.
it was first fixed in 4380ccb1, the non-fp32 math PR, and further solidified with the dtype refactor
2023-12-28 17:09:23 -05:00
chenyu
50f2e31d26
cleanup float4 grouping in global_load and global_store (#2942)
* cleanup float4 grouping in global_load and global_store

* fix test decorator
2023-12-27 14:10:04 -05:00
chenyu
54629b56d2
minor cleanup in kernel and linearizer (#2937)
* minor cleanup in kernel and linearizer

less long line, spaces and colocate variables

* no deadline in hypothesis test
2023-12-26 12:05:32 -05:00