Commit graph

55 commits

Author SHA1 Message Date
George Hotz
a280cfe169
move dtypes to dtype.py (#2964)
* move dtypes to dtype.py

* fix urllib
2024-01-01 14:58:48 -08:00
chenyu
8291986959
Variable.sum -> Node.sum, Variable.ands -> Node.ands (#2961) 2024-01-01 16:21:28 -05:00
chenyu
3d720b5761
move expand_idx, iter_idxs and expand_node from symbolic to linearizer (#2959) 2024-01-01 14:41:21 -05:00
chenyu
50f2e31d26
cleanup float4 grouping in global_load and global_store (#2942)
* cleanup float4 grouping in global_load and global_store

* fix test decorator
2023-12-27 14:10:04 -05:00
chenyu
820f2e054e
fix PADTO optimization (#2935)
the correct condition is that PADTO cannot be applied to reduce axis, not Reduce.MAX in ops.
even for Reduce.SUM it's possible that the reduce axis had a div before, and the padded 0 became inf then sum over it is incorrect.
2023-12-25 22:52:49 -05:00
chenyu
50927defad
s/lazydata.realized/lazydata.base.realized/g (#2914)
* s/lazydata.realized/lazydata.base.realized/g

* not that
2023-12-22 14:45:13 -05:00
George Hotz
1765849937
new lazy, benchmark (#2878)
* lazy rewrite, try 2

* min fix tests

* pass contig test

* put broken pads back

* move that to realize

* no contig child fixes array packing

* so wrong

* now that's correct

* base children

* fix bind issues

* disable to_image_idx

* fix tests

* that failure shouldn't break other tests

* more fixes

* fix torch

* skip failing tests in CI

* 1e-7

* half is broken

* 1e-6 margin of error
2023-12-20 14:33:21 -08:00
George Hotz
8fe24038d8
Revert "mulacc fusion cleanup (#2871)" (#2876)
This reverts commit 863c5b26ed.
2023-12-20 13:26:25 -08:00
qazal
863c5b26ed
mulacc fusion cleanup (#2871)
* add mulacc fusion tests

* cleanup the implementation

* fix indent in the test utility

* less verbose
2023-12-20 15:39:54 -05:00
qazal
5f07ef455e
update dtypes (#2872) 2023-12-20 15:04:02 -05:00
George Hotz
90fb09b55c remove unused _device_extra_args 2023-12-18 22:14:58 -08:00
chenyu
e4bbbc5bc3
Revert "Use the reduceop dtype to define the acc in linearizer (#2625)" (#2783)
This reverts commit f3ed96a929.
2023-12-15 16:29:10 -05:00
qazal
f3ed96a929
Use the reduceop dtype to define the acc in linearizer (#2625)
* upcast the other way

* Revert "upcast the other way"

This reverts commit 355692ba79.

* remove uop cast, this should have never been there

* add regression test

* now fuzz it

correct test

* the accumulator is always the output type

lint

* fuzz all reduce ops

* MULACC upcast_dtype could be half too

opencl supports it https://man.opencl.org/mad.html

* cast to the same dtype is a noop

* internal casting support for MULACC

* fuzz test mulacc internal casting

* get_reduce_dtype

handle vectorized acc

update get_reduce_acc calls with the correct dtype

update tests

* pending _complete_ implementation of a function that gets the dtype based on self.reduceop

+more failing tests

* get_reduce_dtype try 2

add TODO

* get_lazyop_info already does it

* cleanup

* bring back internal casting support for mulacc

* use the scalar version of the acc dtype

* conceptual diff cleanup

* one extra line to a cleaner linearizer

* correct test assumptions - these should promote?

* rm mulacc cast, the cast of vins happens with the acc dtype promotion

linearizer hacks

* Revert "rm mulacc cast, the cast of vins happens with the acc dtype promotion"

This reverts commit afdd540733.

Revert "correct test assumptions - these should promote?"

This reverts commit 49ae2206ed.

* skip tests blocked by MULACC->lazyop cleanup

* final changes to add back internal casting for MULACC and update skip test logic, upcast works but downcast does not

* only test the linearizer abstraction layer

we wanna ensure that linearizer matches whatever lazy is returning

* remove unused hypothesis module

* remove mulacc related changes, those will move to the lazy pr

* remove midcast test

* move to helpers

* Revert "remove midcast test"

This reverts commit 86e74d7960.

add TODO with skip

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2023-12-15 16:14:32 -05:00
qazal
3cf4376ce2
test_linearizer cleanup (#2766)
* test_linearizer cleanup

* use unittest.skipIf

* update msg
2023-12-14 17:20:09 -05:00
qazal
746cb5de21
Test coverage for matvec (#2762)
* add test coverage for matvec

* skip devices that don't support locals
2023-12-14 11:34:56 -05:00
George Hotz
6d6eb9302d
ruff checks the max line length is 150 (#2734)
* ruff checks the max line length is 150

* fix tensor.py

* a lot more

* done
2023-12-12 17:34:47 -08:00
Ahmed Harmouche
4b01839774
support vals on WebGPU, run more tests (#2668)
* Vals on webgpu, run more tests

* Skip slow tests, run symbolic ops tests

* Balance out tests
2023-12-07 16:45:21 -08:00
George Hotz
2c363b5f0b
new style device (#2530)
* cpu tests pass

* torch works

* works

* metal works

* fix ops_disk

* metal jit works

* fix openpilot

* llvm and clang work

* fix webgpu

* docs are rly broken

* LRU works on metal

* delete comment

* revert name to ._buf. LRU only on Compiled

* changes

* allocator

* allocator, getting closer

* lru alloc

* LRUAllocator

* all pass

* metal

* cuda

* test examples

* linearizer

* test fixes

* fix custom + clean realize

* fix hip

* skip tests

* fix tests

* fix size=0

* fix MOCKHIP

* fix thneed

* copy better

* simple

* old style metal copy

* fix thneed

* np reshape

* give cuda a device
2023-11-30 17:07:16 -08:00
George Hotz
5629fc368c
Use Buffer.STORE at the end of ASTs (#2494)
* work

* store broken

* interpreteds work

* this passes

* symbolic cpu

* fix tests

* fix opt tests

* images fail

* fix InterpretedFlopCounter

* stupid hack for images
2023-11-28 20:11:37 -08:00
George Hotz
9e07824542
move device to device.py (#2466)
* move device to device.py

* pylint test --disable R,C,W,E --enable E0611

* fix tests
2023-11-27 11:34:37 -08:00
qazal
262cd26d28
Simplify openpilot kernel (#2460)
* a conditional with the same results either way is a noop

* add unit test
2023-11-27 10:02:27 -08:00
George Hotz
0505c5ea50
remove force_wait, refactor to graph (#2405)
* remove force_wait

* refactor

* get rid of stupid ASTRunner

* fix del in diskbuffer

* BufferOps.FROM_UNDERLYING

* put offset in the rawbuffer

* fix bugs

* use exec
2023-11-23 12:46:07 -08:00
George Hotz
66c75f30c6
remove triton (#2396) 2023-11-23 07:40:59 -08:00
George Hotz
8656eebb42
jit doesn't use named tensors (#2393)
* jit doesn't use named tensors

* move to compile2

* remove broken single root junk

* explicit float32

* skip slow test
2023-11-23 00:13:18 -08:00
George Hotz
80e4ad8bf5
faster get_recursive_parents (#2392)
* faster get_recursive_parents

* skip test for those

* full sum works everywhere

* timing

* debug print
2023-11-22 20:37:19 -08:00
chenyu
8798d120bb
autopad shapetracker for BEAM (#2375)
* autopad shapetracker for BEAM

* OptOps.PADTO

* skip that test for now

* correct padding reduce axis

* just 32

* avoid more than double the FLOPs

* cleanups

* test case

* no support for triton and llvm yet

* typos

* symbolic shape would not work

* cannot PADTO with MAX kernel

* advance db version

* no breaking change - don't advance db version

* is triton just python?

* Revert "is triton just python?"

This reverts commit 17e776c25587615e33a3634c2fb0bb8591ce65d4.

* Revert "Revert "is triton just python?""

This reverts commit 6c434c01e1c4b0ea0431ec18632cd859fb3cf260.

* support llvm

* is it really passing in CI only?

* update tests

* oh triton test passed

* simpler

* revert that, with a test

* check if st are the same

* Revert "check if st are the same"

This reverts commit d2a5eac110a5da1af82a2728c883779ef69c3cad.

* update the db version

* rebase artifact
2023-11-22 21:05:25 -05:00
qazal
0eda545946
dtypes.float.vec(sz) (#2386)
* replace all _dtypen with dtype.vec(n)

fix: print works

* conceptul refactor of cstyle render_load logic

* linearizer GEP is explicit that its dtype is the scalar version of localtype

* vectorized global_store and load don't need a conditional
2023-11-22 17:43:14 -08:00
chenyu
a753c8e071
examples of new GPT2 and JIT change (#2261)
* var_vals are global

* working with global ish

* better

* fix export model

* fix tests

* better kv cache

* does it run?

* use where for kvmask

* fix excessive var_vals

* fix import

* how does multigpu use this?

* llama kinda work

* faster and simpler

* cleanup

* fix conversation mode

* test cleanups

* fix one more test

* test cleanup

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2023-11-10 15:07:02 -05:00
chenyu
680cbfdba4
less broken limit_dims_to_max (#2214) 2023-11-04 08:38:06 -07:00
George Hotz
7103b716c4
merge kernel and optimizer (#2200)
* merge kernel and optimizer

* linearize is reentrant

* move global/local size

* clean up linearizer copy

* remove unneeded lin copies

* stop linearizing twice

* oops, that should be None
2023-11-01 15:20:01 -07:00
George Hotz
194e4ad6f8
Revert "optimizer: simplify GROUP and LOCAL to have one of each (#2162)" (#2182)
This reverts commit 8cf0bb9351.
2023-10-30 10:22:26 -07:00
Francis Lam
8cf0bb9351
optimizer: simplify GROUP and LOCAL to have one of each (#2162)
* optimizer: simplify GROUP and LOCAL to have one of each

Now that tensor cores only use LASTLOCAL, we can simplify to use
only that op everywhere.

The only use of GROUP is in matvec hand-coded opts and it doesn't
make a performance difference so switching to use only the top
behavior.

Also adds additional asserts to prevent tensor core dims from
being altered which causes bad kernels to be generated.

* search: remove duplicated actions
2023-10-27 11:37:44 -10:00
Francis Lam
bf3490cdf9
wmma: refactor tensor cores using existing local dims (#2097)
* wmma: refactor tensor cores using existing local dims

* optimizer: fix bad rebase and break after one late local

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2023-10-25 13:10:46 -04:00
Francis Lam
ace6b2a151
optimizer: add test for correctness of opts (#2124)
* optimizer: add test for correctness of opts

Also added OptOps.UPCASTMID to constrain valid axes for opts with
group_for_reduce.

* llvm: fix LinearizerOptions to correctly not has_shared

* optimizer: remove premature test scaffold for TC opts

* search: fix the action space
2023-10-22 08:02:22 -07:00
David Hou
95e17ff0d4
fix wino mask upcast calculation (#2057)
* fix wino mask upcast calculation

* add tests for wino upcast hcopt

* add info to note

* real world wino hcopt test

* wino backward test

* whitespace
2023-10-18 16:54:48 -07:00
George Hotz
90c777d815
remove apply_auto_opt (#2063) 2023-10-13 07:44:14 -07:00
Francis Lam
81c7d750db
test: fix test_linearizer.test_tensor_core test (#2036)
must use apply_tensor_core instead of hand_coded_optimizations
2023-10-10 14:48:28 -07:00
George Hotz
121f7aa8c5
Schedule item (#2012)
* ScheduleItem

* put var_vals in the schedule

* fix tests, wow that proliferated quickly

* not ready to be in the schedule
2023-10-07 08:59:25 -07:00
nimlgen
2ea1dd3e87
no process() in Linearizer (#1966)
* no process() in Linearizer

* more process() clean up
2023-10-04 07:18:42 -07:00
George Hotz
f64d5b3ba8
move to realize.py (#1961)
* move to realize.py

* run_schedule moved
2023-10-03 07:25:40 -07:00
George Hotz
d48a90859c
use the opts from the default device (#1954) 2023-10-02 03:13:46 -07:00
David Hou
d4671cd8e3
use schedule in more places in linearizer tests (#1946)
* pass current linearizer opts to Linearizer in TestFloat4

* use schedule instead of exec_ast hook
2023-10-02 02:22:56 -07:00
David Hou
8e9db88474
expand after expr_idxs in Linearizer.global_load (#1818)
* small changes

* expand in terms of substitute, directly expand g_idxs g_valid

* delete expand_ops

* don't compare using hash

* any instead of in

thanks gijskoning

Co-authored-by: Gijs Koning <gijs-koning@live.nl>

* support tc

* testing code

* no more create_rednode

* maxsize none in view/node

* oops

* undo

* typing

* oops

* oops

* lmao

* lmao

* add expand multi test

* Node.iter_idxs

* type

* type

* delete checks!

* clean up a little?

* expand_idx in symbolic

* un-golf

* play around with types >.>

* test_substitute and also remove an incorrect test?

* get rid of range

* Update symbolic.py

* split out view cache change

* split out flat components change

* reduce diff

* reduce diff

* add some float4 tests

* fix

---------

Co-authored-by: Gijs Koning <gijs-koning@live.nl>
2023-09-29 10:33:34 -07:00
Francis Lam
f445e056ed
wmma: add test and tensor core shape (#1925) 2023-09-28 18:04:28 -07:00
George Hotz
c907efbf4a
reorder a few things (#1915)
* reorder a few things

* huh, that has to be there

* move apply shapetracker

* BufferOps

* only for type checking
2023-09-25 10:17:21 +08:00
George Hotz
7ff7aacdb4
LazyOp out of Linearizer (#1908)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx

* dedup buffers

* hmm, bugfix

* fix tensor cores

* opts device
2023-09-24 14:30:53 +08:00
George Hotz
97dc813329
Revert "All LazyOps in the Linearizer (#1905)" (#1907)
This reverts commit a5820390db.
2023-09-24 11:51:22 +08:00
George Hotz
a5820390db
All LazyOps in the Linearizer (#1905)
* loadop buffer on cpu

* works for GPU

* sort of working

* has bugs

* gpu tests pass

* fix some tests

* fix tensor cores

* fix test linearizer

* fix symbolic

* fix has_variable_shape

* non symbolic size

* disable weird test

* simple cache fix

* fix custom function

* fix kopt

* cleanups

* a bit broken on the assign

* contig check

* only buffer

* need that order

* idx
2023-09-24 11:50:00 +08:00
nimlgen
31fca43706
kopt works with local+grouped reduce and tests (#1824) 2023-09-09 13:22:09 -07:00
George Hotz
ed194a1d3b
zero fold (#1748)
* add constant fold

* err, it's just zero folding

* self store fold + caching

* prints and more folds

* simpler winograd kernels

* remove childless uops
2023-09-03 13:48:11 -07:00