11,106 commits

Author SHA1 Message Date
uuuvn
e9c5b23ba1
Use MTLCompiler directly (v2) (#7920)
* Use MTLCompiler directly (v2)

* to_block_literal and REQUEST_TYPE_COMPILE

* Rewrite command encoding

* Revert to_block_literal

* Maybe that's more readable to some people?

* Typo and comment about stdlib caching

* Update ops_metal.py

* Update ops_metal.py

* Update ops_metal.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
George Hotz
bb98bae751
local reordering in block (#8029)
* local reordering in block

* load (and parents) is highest priority

* minor loads in order

* comments

* explicit depth

* simpler

* matters less, but store early too
2024-12-04 15:11:29 +08:00
George Hotz
4cb630ac1c hotfix: early INDEX 2024-12-04 14:47:47 +08:00
George Hotz
fdd1e56827
clean up rewrite logic + merge siblings (#8026)
* clean up rewrite logic [pr]

* simpler

* merge sibling blocks

* no PR
2024-12-04 13:26:16 +08:00
chenyu
004b2ecff5
remove lt/gt/le/ge from SimpleMathTrait [pr] (#8027)
just use the dunder methods
2024-12-04 00:24:33 -05:00
chenyu
39e0fc05f5
update function to not use gt/lt [pr] (#8025)
pr does not test this, but it's the same
2024-12-03 22:39:06 -05:00
chenyu
cfd4d19250
replace .lt in rewrite rules with < [pr] (#8024) 2024-12-03 21:34:47 -05:00
chenyu
0c060fa040
update uop and tests to not use lt/gt/le/ge [pr] (#8023)
just use dunder methods, eventually remove those from ops
2024-12-03 21:02:52 -05:00
chenyu
03bf9c2985
unused mul add lt rule [pr] (#8022) 2024-12-03 19:38:34 -05:00
nimlgen
7fda464b08
hcq c-like args state (#8020)
* hcq c-like args state

* ugh

* Dfix

* rename

* i
2024-12-03 23:53:35 +03:00
qazal
099364ed32
lazy srcs shape mismatch assert + fix ASSIGN [pr] (#8014)
* lazy srcs shape mismatch assert [pr]

* duplicate assert

* base it later

* keep the assert
2024-12-03 15:40:37 -05:00
ignaciosica
f14dd1488e
reduce on wmma (#8016) 2024-12-03 12:46:28 -05:00
chenyu
dacb1ff38a
minor nn cleanups (#8018)
use more .numel and .ndim
2024-12-03 12:34:52 -05:00
chenyu
35c30f76f2
minor tweak in ptx asm_for_op [pr] (#8017)
always compare with dtypes instead of name string
2024-12-03 12:34:22 -05:00
chenyu
a5af4e5596
clean up wgsl_matcher [pr] (#8015)
use more UPat syntactic sugar and remove unneeded rules
2024-12-03 11:55:03 -05:00
Ahmed Harmouche
db330a3110
Remove WebGL (#8012) 2024-12-03 16:02:53 +01:00
chenyu
ef3752625b
add test case of realize_size with 0 in shape (#8011) 2024-12-03 09:19:50 -05:00
Ahmed Harmouche
8818046940
YoloV8 on WebGPU (#8007)
Port YoloV8 to WebGPU
2024-12-03 15:10:41 +01:00
George Hotz
09eac42fd6
cache indexed uops in st [pr] (#8008)
* cache indexed uops in st [pr]

* remove arg from range
2024-12-03 21:27:07 +08:00
Sieds Lykles
e44183647f
Improved div folding (#7996)
* First version of div_mod folding together

* Working version with old div folding behaviour

* Test is fixed

* Fix linting

* Happy mypy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-12-03 08:11:25 -05:00
George Hotz
32675a8a77
sacrifice ClangGraph on the altar of lines [pr] (#8009) 2024-12-03 21:11:15 +08:00
qazal
5441127417
assert const folding return shape matches [pr] (#8006) 2024-12-03 19:31:06 +08:00
George Hotz
dddfb494d7
don't mutate the uop/lazybuffer, just the Buffer [pr] (#8000)
* don't mutate the uop/lazybuffer, just the Buffer [pr]

* fix red test

* try different fix

* that

* that's the right fix

* test for fixed behavior

* bump to 3.12
2024-12-03 19:03:51 +08:00
qazal
ba1183314a
const_like can return a valid [pr] (#8005)
* const_like can return a valid [pr]

* fixup
2024-12-03 18:42:12 +08:00
qazal
4e91533419
test: don't ref until schedule (#8004) 2024-12-03 18:06:52 +08:00
George Hotz
b8bf5b2787
minor uop speedups [pr] (#8002)
* minor uop cleaner [pr]

* free uop creation speed by removing WeakValueDictionary

* a lil faster

* disable that test

* lines

* and it doesn't print non hit patterns
2024-12-03 17:04:48 +08:00
George Hotz
1028b34a20
add typing to basicblocks (#7999) 2024-12-03 15:05:11 +08:00
George Hotz
0905f87b68 hotfix: print only kernel time 2024-12-03 14:25:08 +08:00
chenyu
17d5719a38
add process replay to webgpu tests (#7998) 2024-12-02 20:27:29 -05:00
chenyu
c7bc75e634
alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1) (#7900)
* alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1)

only do if at least one branch is const, so total alu won't increase

* tests and interesting TODO cases
2024-12-02 17:19:27 -05:00
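
The rewrite named in the commit above (#7900), `alu(c?t0:f0, c?t1:f1) -> c?alu(t0,t1):alu(f0,f1)`, can be checked on plain Python values. This is a minimal sketch and not tinygrad's implementation: `where`, `before`, `after`, and `rewrite_agrees` are hypothetical helper names introduced only for illustration, and the ALU op is modelled as an ordinary binary function.

```python
import operator
from itertools import product

def where(c, t, f):
    # Scalar stand-in for the ternary c ? t : f.
    return t if c else f

def before(alu, c, t0, f0, t1, f1):
    # Original form: one ALU op applied to two selects sharing the condition c.
    return alu(where(c, t0, f0), where(c, t1, f1))

def after(alu, c, t0, f0, t1, f1):
    # Rewritten form: one select over two ALU ops.
    return where(c, alu(t0, t1), alu(f0, f1))

def rewrite_agrees(alu, values=(-2, 0, 3)):
    # Exhaustively compare both forms over a small set of inputs.
    return all(
        before(alu, c, t0, f0, t1, f1) == after(alu, c, t0, f0, t1, f1)
        for c in (False, True)
        for t0, f0, t1, f1 in product(values, repeat=4)
    )

print(rewrite_agrees(operator.add), rewrite_agrees(operator.mul))
```

The rewritten form has two ALU ops instead of one. One way to read the commit's "at least one branch is const" condition: when both values of a branch are constants, that branch's ALU op can be folded to a constant, so the total ALU count does not grow.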
chenyu
b91fa24387
script to run regressed sd conv on metal (#7995)
* script to run regressed sd conv on metal

this and other similar `conv2d + add` kernels contributed to most of the speed regression

* # ruff: noqa: E501
2024-12-02 15:34:27 -05:00
geohotstan
0a2e10be1d
add SELU to Tensor (#7993)
* add selu

* more clean ups
2024-12-02 10:04:01 -05:00
Ahmed Harmouche
146e1caea3
Downgrade wgpu to prevent sd segfault (#7969) 2024-12-02 15:48:44 +01:00
wozeparrot
077e7e8ed2
fix: private segment sgpr on gfx103x (#7987)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 20:54:50 +08:00
qazal
bb606e5bcf
process replayable ops.py changes from delete_lazy [pr] (#7994)
* process replayable ops.py changes from delete_lazy [pr]

* hotfix: seed tiny_jit
2024-12-02 19:38:31 +08:00
George Hotz
0c7477b108
no bool in range [pr] (#7988)
* no bool in range [pr]

* fix llvm

* add arg to range spec

* fix broken test

* forgot this one

* hotfix: test_tiny jit is a real test
2024-12-02 19:05:16 +08:00
Ahmed Harmouche
8909dbd82c
Remove wgpu specific checks from stable diffusion example (#7991) 2024-12-02 11:31:14 +01:00
qazal
e2916ff210
image dtype fixup refactor for delete_lazy [pr] (#7989) 2024-12-02 18:25:13 +08:00
Ahmed Harmouche
5340d3dedf
Merge pull request #7986 from tinygrad/atomics-in-smem-wgpu
Support packed types in smem on webgpu
2024-12-02 10:38:19 +01:00
Ahmed Harmouche
dfae038580 Simplify render_buf_dt 2024-12-02 10:27:59 +01:00
Ahmed Harmouche
1ea0925744 Support packed types in smem in webgpu 2024-12-02 10:13:25 +01:00
George Hotz
61b2cac507
basicblock is dataclass (#7985)
* basicblock is dataclass [pr]

* tiny cleanups
2024-12-02 16:48:39 +08:00
George Hotz
275951b730
clean up a few parents -> toposort [pr] (#7984)
* clean up a few parents -> toposort [pr]

* rename to old_parents + sched tests

* a few more

* that one

* second to last

* final
2024-12-02 15:59:31 +08:00
George Hotz
f17af70d17
replace all sparents with toposort (#7983) 2024-12-02 15:00:30 +08:00
George Hotz
b09310d8c2
add toposort method to uops, faster linearize [pr] (#7982)
* add toposort method to uops, faster linearize [pr]

* trust the toposort

* all toposort

* Revert "all toposort"

This reverts commit db123adfda.
2024-12-02 14:46:16 +08:00
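
For readers unfamiliar with the term in the commit above (#7982), a topological sort of an expression graph lists every node after all of its sources. The sketch below is a generic iterative version under stated assumptions, not the method added in that commit: the `Node` class, its `src` field, and the `toposort` function here are hypothetical names chosen for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    # Hypothetical graph node: a name plus the nodes it depends on.
    name: str
    src: tuple[Node, ...] = ()

def toposort(root: Node) -> list[Node]:
    # Iterative post-order DFS; a dict is used as an insertion-ordered set.
    order: dict[Node, None] = {}
    stack: list[tuple[Node, bool]] = [(root, False)]
    while stack:
        node, sources_done = stack.pop()
        if node in order:
            continue
        if sources_done:
            # All sources were emitted earlier, so the node can be emitted now.
            order[node] = None
        else:
            # Revisit this node after its sources have been processed.
            stack.append((node, True))
            for s in reversed(node.src):
                if s not in order:
                    stack.append((s, False))
    return list(order)

a = Node("a")
b = Node("b", (a,))
c = Node("c", (a, b))
print([n.name for n in toposort(c)])
```

Shared sources such as `a` appear once in the result, which is why a single ordered traversal can replace repeated walks over each node's parents.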
qazal
b797aee720
uop global buf number tracking try 2 [pr] (#7912)
* uop buffer init small refactor [pr]

* add early

* this way it doesn't need late

* buffer_num

* itertools.count

* count from 0

* down to 380
2024-12-02 14:45:17 +08:00
George Hotz
cbcc1c20eb
second try at block linearize (#7892)
* second try at block linearize

* weeee, works for lil matmul

* it's so beautiful

* test tiny passes

* fix bugs

* combine matching BLOCKENDS

* wrapping

* test lin failures passes

* those failures were fake

* flip sort order

* fix ptx tests

* deal with store better

* dumb ptx fix

* expect less

* reduce lines

* reduce lines

* less lines and cleaner

* no defaultdict

* tighter

* simpler block_parent_count
2024-12-02 13:43:09 +08:00
George Hotz
9b0859d717
PYTHON device is okay to use everywhere [pr] (#7981) 2024-12-02 12:34:42 +08:00
mesozoic-egg
90e2b2d577
Remove gated store, put rewrite to uopgraph [pr] (#7975)
* update test for gated store

* put gated store rewrite to uopgraph, rm from ptx

* update test

update test

update test

* remove gated st rewrite in llvm

* lint

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.mail>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 12:33:16 +08:00
George Hotz
d53cd92364
fix tests for delete lazy [pr] (#7980) 2024-12-02 12:00:48 +08:00