Commit graph

4,842 commits

Author SHA1 Message Date
qazal
4a617c84e1
cleanup ctx usage in scheduler upats [pr] (#8205) 2024-12-13 18:01:13 +08:00
qazal
55b8c4e8bf
apply_swizzle can apply to any views [pr] (#8204) 2024-12-13 17:58:35 +08:00
qazal
c5c0d0277d
flatten buffer args, delete dtype [pr] (#8202) 2024-12-13 16:43:47 +08:00
Ahmed Harmouche
5198415bfb
No unpack_map in wgsl (#8200) 2024-12-13 08:10:31 +01:00
leopf
fe68dbdb23
GroupOp.Idempotent (#8198) 2024-12-12 20:44:04 -05:00
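An idempotent binary op satisfies op(x, x) == x, so a graph rewriter can fold any such op whose two sources are the same node. This is a minimal sketch that assumes OR, AND and MAX are the ops in the group. The `Ops` enum, `IDEMPOTENT` and `fold_self_op` are hypothetical stand-ins, not tinygrad's definitions.

```python
from enum import Enum, auto

class Ops(Enum):
  # hypothetical stand-in for a small subset of tinygrad's Ops
  OR = auto()
  AND = auto()
  MAX = auto()
  ADD = auto()

# assumption: the binary ops grouped as idempotent, i.e. op(x, x) == x
IDEMPOTENT = frozenset({Ops.OR, Ops.AND, Ops.MAX})

def fold_self_op(op: Ops, lhs, rhs):
  """Rewrite op(x, x) -> x for idempotent ops; return None when no rewrite applies."""
  # identity comparison stands in for "both sources are the same graph node"
  if op in IDEMPOTENT and lhs is rhs:
    return lhs
  return None
```

ADD is deliberately left out of the group: x + x is 2x, not x.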
chenyu
ce41e6572d
unit test merge_dim [pr] (#8195)
looking for better ways to write this. first adding some tests
2024-12-12 17:55:52 -05:00
chenyu
d47530c0d4
fix device canonicalize for :0 in middle [pr] (#8193)
replace is wrong because it does not check if `:0` is at the end. use re.sub instead
2024-12-12 16:32:36 -05:00
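The difference between `str.replace` and an anchored `re.sub` can be shown in a few lines. The function name `strip_default_index` and the `DISK:` path are hypothetical examples. Only the idea of stripping a trailing `:0` comes from the commit.

```python
import re

def strip_default_index(device: str) -> str:
  # str.replace(":0", "") would also delete a ":0" in the middle of the string;
  # anchoring the pattern with $ removes only a trailing ":0"
  return re.sub(r":0$", "", device)

naive = "DISK:/tmp/a:0b".replace(":0", "")   # "DISK:/tmp/ab", the path is corrupted
fixed = strip_default_index("DISK:/tmp/a:0b")  # unchanged
```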
chenyu
d586c7e108
remove had_counter from rand (#8191) 2024-12-12 13:35:39 -05:00
chenyu
2fe98e44cd
unneeded isinstance(size, int) in alloc [pr] (#8189) 2024-12-12 13:05:02 -05:00
chenyu
72ff631f8d
remove unreachable tensor dtype assert (#8190)
it would have failed in `to_dtype`. added some tests for it too
2024-12-12 13:04:49 -05:00
chenyu
2e4c7d4cfb
add "tinygrad" to be part of cache_dir [pr] (#8188)
instead of having sqlite / http download / metal compile to add "tinygrad" separately. also make it non-private since it's used in metal
2024-12-12 12:09:44 -05:00
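A sketch of the idea: build the `tinygrad` component into `cache_dir` once, so each consumer only joins its own file name onto it. The environment variable, the per-OS default base directories and the consumer file names below are assumptions for illustration, not taken from the commit.

```python
import os
import sys

def default_cache_dir() -> str:
  # assumption: XDG_CACHE_HOME wins, otherwise a per-OS default base directory
  base = os.getenv("XDG_CACHE_HOME", os.path.expanduser("~/Library/Caches" if sys.platform == "darwin" else "~/.cache"))
  # "tinygrad" is appended once here, so callers no longer add it themselves
  return os.path.join(base, "tinygrad")

cache_dir = default_cache_dir()
# hypothetical consumers join their own names directly onto cache_dir
db_path = os.path.join(cache_dir, "cache.db")
download_dir = os.path.join(cache_dir, "downloads")
```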
Ahmed Harmouche
db76586780
Cast pattern touchup in AMDRenderer [pr] (#8185) 2024-12-12 15:12:14 +01:00
nimlgen
bf7d1fcd2c
tiny import fixes in hcq graph (#8184) 2024-12-12 16:30:06 +03:00
Ahmed Harmouche
2f2b1e792c
wgsl and ops_webgpu simplifications [pr] (#8182)
Simplify wgsl and ops_webgpu
2024-12-12 14:21:58 +01:00
George Hotz
d9a0880d33
delete fuzz uops (not tested) [pr] (#8181) 2024-12-12 01:41:27 -08:00
George Hotz
c77cb57454
remove untested BEAM_COMPARE=1 [pr] (#8180) 2024-12-12 01:35:27 -08:00
Ahmed Harmouche
1b94cc095a
Bump back wgpu to latest (#8179) 2024-12-12 09:40:52 +01:00
chenyu
97aaa50f3a
remove duplicated UOp in Tensor init types [pr] (#8177)
and a small comment
2024-12-11 22:59:35 -05:00
chenyu
d240bdd172
remove upcast_in_mid_reduce_axes [pr] (#8176) 2024-12-11 22:14:28 -05:00
chenyu
64a917b7eb
remove LAZYCACHE ContextVar [pr] (#8175)
also removed from resnet latest script
2024-12-11 22:02:52 -05:00
chenyu
7047ffd27d
tiny gguf_load cleanup [pr] (#8174)
round_up helper
2024-12-11 21:32:52 -05:00
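The commit only names a `round_up` helper. The definition below is a common way to write one and is an assumption, not copied from tinygrad. The usage line pads a byte offset up to an alignment boundary, which is what a GGUF loader needs before the tensor data section; the GGUF format's default alignment is 32 bytes.

```python
def round_up(num: int, amt: int) -> int:
  """Smallest multiple of amt that is >= num (assumes num >= 0 and amt > 0)."""
  return (num + amt - 1) // amt * amt

ALIGNMENT = 32  # GGUF default when general.alignment is not set
data_start = round_up(100, ALIGNMENT)  # 100 is an arbitrary example offset
```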
George Hotz
151ac5f5a2
remove UPCASTMID [pr] (#8173) 2024-12-11 17:29:01 -08:00
George Hotz
f86e0014b7
delete CAPTURE_BEAM, this should use PR or VIZ infrastructure instead [pr] (#8172) 2024-12-11 16:29:03 -08:00
George Hotz
8a04a3a77a
rename LazyBuffer -> UOp [pr] (#8169)
* rename LazyBuffer -> UOp [pr]

* fix docs
2024-12-11 16:15:52 -08:00
George Hotz
e0fe867c74
delete beam compare 2 [pr] (#8168) 2024-12-11 16:10:01 -08:00
chenyu
aaa3cc235d
unused from __future__ import annotations (#8171) 2024-12-11 19:05:04 -05:00
George Hotz
aae2f4da8d
fix process replay [pr] (#8170)
* empty change [pr]

* store the context in PROCESS_REPLAY_CAPTURE
2024-12-11 15:58:42 -08:00
qazal
9044b0746a
delete lazy [pr] (#7801)
* LazyBuffer = UOp

* try 4 at this diff

* skip optimization tests p1

* raise kernel count expectations

* BIND isn't the _only_ uop that can become a tensor

* fix test_ones_sum on symbolic

* bump openpilot, correctness first

* offset on assign is fine

* uop is immutable

* what if this was higher

* more optimization skips

* instant fold const copy

* test_multitensor shouldn't expect buffer for unrealized

* move copy folder to upats

* start BUFFER_VIEW

* kinda BUFFER_VIEW

* Revert "kinda BUFFER_VIEW"

This reverts commit 94b4fe3040.

* BUFFER_VIEW try 2

* linter and missed _device

* pylint

* keep Ops.CONTIGUOUS

* always BUFFER_VIEW disk

* test

* cpu isn't a real device

* buffer references after del

* add that back

* start bringing some of these back

* more test updates

* simpler simplify copy

* subbuffer everything

* this is fine with buffer view

* cleanup the diff in test/ 1

* copy is one thing

* diff pruning

* diff pruning 2

* oh bind unbinds way too early

* extra

* more diff pruning

* more const folding

* experiment with symbolic here

* Revert "experiment with symbolic here"

This reverts commit cb87d61f7a.

* Revert "more const folding"

This reverts commit 2a7d258a2b.

* Revert VALID early folding

This reverts commit 4074f52317.

* storing const is fine

* fix test_prefer_half_buffer

* iterate on test_real_world

* this fixes test_train_mnist memory, breaks everything else

* Revert "this fixes test_train_mnist memory, breaks everything else"

This reverts commit dccfcbe068.

* always expect buffer to exist here

* temp debug: something is mutating lazydata in compile3

* Revert "temp debug: something is mutating lazydata in compile3"

This reverts commit 71400f0d55.

* everything back to normal

* compile3

* compile3 test

* start captured jit work, that test passes

* finalized memory skip set

* linter err

* back to base here

* tiny metaop cleanup

* print tensor

* 4th time this unbind got me

* green pickle

* tensor_variable sanity

* cast sanity

* link from the reds

* COPY sanity + minor repr change

* you can exist

* enable test_winograd

* bye bye nbytes

* danger, uop is mutating

* real become

* delete those from uop init

* put it in buffer init

* buffer inits with so much stuff

* buffer pickle try 2

* toposort can't be a cached property

* fix test_schedule_gc_with_inputs

* remove all @unittest.skip(gc)

* Revert "remove all @unittest.skip(gc)"

This reverts commit 9d8d92dd85.

* reenable real world + test_schedule_gc

* test: RUN_PROCESS_REPLAY=0

* fix pickle jit

* test changes

* reenable test_lru_alloc and TestTrain

* fix imagedtype

* bring pr back

* reenable 3 gc tests

* test_schedule better diff

* disable SPLIT_REDUCEOP

* test_save_all_dtypes looks fixed

* fix metadata

* skip that one

* fix viz by not pickling buffers

* simple test for const folding

* bring split reduceop back

* add simplify_alu

* simplify_binop fixes a test

* fix cast folding

* disable that test

* that test looks fine

* changes from delete_lazy pruning p1

* cast folding and children base

* test: cast folding from pruning branch

* green test_sgd_4convs_fuse_conv_bw

* enable some indexing folding

* test_complex_backward is fixed

* prune more, 295 -> 233

* fix test_multi_const_folding_literal

* fix double copy

* early become test

* ooooops

* clean up ctx in all big_graph

* fix openpilot 208 kernels

* train_cifar is fine now

* fix CAST_BEFORE_VIEW

* ever faker const

* back to 13

* mark expectedFailure

* fine don't create them

* test_multi_const_folding_tensor

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-12 05:05:19 +08:00
qazal
047a6dabc3
prereq for scheduler contiguous_child [pr] (#8163)
* the whole context is fine here [pr]

* fix that
2024-12-12 02:02:22 +08:00
ignaciosica
3a8e8ac6c2
remove dead code (#8161) 2024-12-11 12:07:19 -05:00
George Hotz
8f4299fcc8
hotfix: suppress shutdown errors in CLProgram 2024-12-11 08:08:32 -08:00
Ahmed Harmouche
a73e3677d0
Test linearizer on webgpu (#8159)
* Test linearizer on wgpu

* Skip tests due to exceeded dims
2024-12-11 17:03:26 +01:00
qazal
63de8f2208
late scheduler context builder [pr] (#8155) 2024-12-11 19:59:39 +08:00
George Hotz
a1b3724ff8
prepickle process replay [pr] (#8147) 2024-12-10 11:46:36 -08:00
George Hotz
aa3b094334
changes from delete lazy [pr] (#8146)
* changes from delete lazy [pr]

* test tweak
2024-12-10 11:06:17 -08:00
chenyu
286fec115e
fix Tensor.minimum for int (#8145)
use invert instead of just neg. consolidate min, argmin, and minimum

also update maximum to not apply the mid point for int
2024-12-10 13:34:41 -05:00
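This pure-Python sketch shows why invert works where negation does not. It emulates int32 wraparound, and the helper names are hypothetical, not tinygrad's implementation. The float identity min(a, b) = -max(-a, -b) breaks for ints because negating INT32_MIN wraps back to INT32_MIN. Bitwise invert, ~x == -x - 1, maps the int32 range exactly onto itself in reverse order, so min(a, b) == ~max(~a, ~b) holds for every pair.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def wrap32(x: int) -> int:
  # reduce a Python int to the int32 range with two's-complement wraparound
  return (x + 2**31) % 2**32 - 2**31

def neg32(x: int) -> int:
  return wrap32(-x)  # neg32(INT32_MIN) == INT32_MIN: negation overflows here

def inv32(x: int) -> int:
  return wrap32(~x)  # ~x == -x - 1 never leaves the int32 range for int32 x

def minimum_via_neg(a: int, b: int) -> int:
  return neg32(max(neg32(a), neg32(b)))

def minimum_via_invert(a: int, b: int) -> int:
  return inv32(max(inv32(a), inv32(b)))
```

For example, `minimum_via_neg(INT32_MIN, 0)` returns 0, while `minimum_via_invert(INT32_MIN, 0)` returns INT32_MIN.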
Ahmed Harmouche
71dd222f66
Fix setitem on wgpu (#8144) 2024-12-10 19:34:25 +01:00
qazal
b69fea6ae5
process replay without global list [pr] (#8143) 2024-12-11 02:20:09 +08:00
qazal
08405279f9
pre merge_views+ops_folding refactor [pr] (#8140)
* simple start

* valid early

* more dumb things removed

* don't ever use base

* cleaner
2024-12-11 00:55:00 +08:00
qazal
56c84cee29
derive COPY nbytes late in realize [pr] (#8137)
* derive COPY arg later in realize [pr]

* can assume no implicit casts or movement ops here
2024-12-10 22:04:07 +08:00
qazal
2d26b011ac
allow VIEW on BUFFER [pr] (#8136)
* allow VIEW of BUFFER [pr]

* base it later

* better diff

* base shouldn't exist anywhere after merge_views
2024-12-10 21:29:38 +08:00
qazal
3a2658efbd
small changes to refine the delete_lazy diff (#8134)
* _view -> view

* const_arg things
2024-12-10 18:46:10 +08:00
qazal
7436ebef2f
spend lines on const_arg for tensor and scheduler [pr] (#8132)
* spend lines on const_arg for tensor and scheduler [pr]

* simple test_const_arg

* base on lazy
2024-12-10 18:07:35 +08:00
chenyu
917deb88a4
make //0 return 0 in python_alu (#8131)
on master it raises because it cannot truncate inf to int, which crashes valid expressions like `(t > 0).where(1//t, t)`.
2024-12-09 19:32:06 -05:00
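Why the expression in the commit needs `//0` not to raise: `where` selects between two branches that have both already been computed for every element, so the division runs even where `t` is 0. This sketch uses plain lists. It assumes C-style truncating integer division and a zero result for division by zero; `idiv` and `where` are hypothetical helpers, and it uses 8 instead of 1 as the numerator so the quotients are nonzero.

```python
def idiv(x: int, y: int) -> int:
  # C-style integer division that truncates toward zero
  # assumption: division by zero evaluates to 0 instead of raising
  if y == 0:
    return 0
  q = abs(x) // abs(y)
  return q if (x < 0) == (y < 0) else -q

def where(cond, a, b):
  # both branches arrive fully computed; where only selects, element by element
  return [x if c else y for c, x, y in zip(cond, a, b)]

t = [3, 0, -2]
# (t > 0).where(8 // t, t): 8 // 0 is still evaluated for the middle element
out = where([v > 0 for v in t], [idiv(8, v) for v in t], t)
```

The placeholder 0 produced for the masked-out element never reaches `out`, which is why returning a value instead of raising is safe for this pattern.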
chenyu
358287959b
fix pow of int to negative const int (#8129)
it should return an int
2024-12-09 17:20:18 -05:00
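The commit only states that the result should stay an int. This is a sketch of what that means if a negative exponent truncates toward zero, which is an assumption here. Only bases 1 and -1 give a nonzero result. How tinygrad treats a zero base is not stated, so `int_pow`, a hypothetical helper, raises for it, like Python's `0 ** -1`.

```python
def int_pow(base: int, exp: int) -> int:
  """Integer power where a negative exponent truncates toward zero, so the result stays an int."""
  if exp >= 0:
    return base ** exp
  if base == 0:
    raise ZeroDivisionError("0 to a negative power")
  if base == 1:
    return 1
  if base == -1:
    return -1 if exp % 2 else 1  # odd negative exponent keeps the sign
  return 0  # |base| >= 2: 1 / base**|exp| has magnitude < 1 and truncates to 0
```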
qazal
80de06c8b9
scheduler ops_folding from delete_lazy (#8124)
* scheduler diff from delete_lazy

* test_std_mean

* late fold copy of CONST

* clang const is fine
2024-12-10 00:36:01 +08:00
chenyu
ccf54c2375
fix argmax/min on int32 min (#8118) 2024-12-09 02:29:23 -05:00
chenyu
c814de2dd4
fix bitwise_not for signed int (#8117)
-1 is correct because 2**32-1 is not within int32 range, so in some cases clang casts the whole thing into uint32
2024-12-09 02:02:51 -05:00
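XOR with -1 and XOR with 2**32-1 flip the same 32 bits. The difference is that -1 is a valid int32 constant, while 2**32-1 exceeds the int32 maximum and can pull the expression into an unsigned type. This pure-Python sketch emulates int32 results; the helper names are hypothetical.

```python
def wrap32(x: int) -> int:
  # reinterpret an arbitrary Python int as a two's-complement int32
  return (x + 2**31) % 2**32 - 2**31

def bitwise_not_i32(x: int) -> int:
  # XOR with -1 (all bits set) flips every bit, and -1 itself fits in int32
  return wrap32(x ^ -1)

# same bit pattern, but the mask 2**32-1 is outside the int32 range
via_uint_mask = wrap32(5 ^ (2**32 - 1))
```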
ttomsa
e22d7b6fb0
fix var vmax inside special (#8116) 2024-12-09 01:16:08 -05:00
qazal
0033012096
init noop changes from delete_lazy [pr] (#8115) 2024-12-09 01:42:05 +08:00