Commit graph

11,106 commits

Author SHA1 Message Date
wozeparrot
7ae6898e31
better late bufferview (#12333) 2025-09-29 03:08:34 -07:00
George Hotz
3291e00df7
fix efficientnet slowness on rangeify (#12332) 2025-09-29 18:01:01 +08:00
chenyu
9d2f2b8e34
skip test_mean_half_precision_overflow (#12331)
it only works with SPLIT_REDUCEOP=1
2025-09-29 05:15:04 -04:00
qazal
9915bcf2b4
remove no-op contiguous from rand (#12329) 2025-09-29 11:53:16 +03:00
chenyu
76c87d81b3
delete test_backward_sum_acc_dtype (#12330)
this test tests the wrong thing, it was only working because expand realize rule
2025-09-29 04:46:17 -04:00
George Hotz
fd2e4f2353
failing rng test (#12328)
* tighten spec: fixup devectorizer types / rangeify

* tighten assign

* failing rangeify test

* simpler

* otherwise contig

* more tolerance cause rng seed changed
2025-09-29 16:06:45 +08:00
George Hotz
29469577e8
tighten spec: fixup devectorizer types / rangeify (#12327)
* tighten spec: fixup devectorizer types / rangeify

* tighten assign
2025-09-29 15:41:11 +08:00
wozeparrot
a982480512
feat: late to_bufferview (#12271) 2025-09-29 00:29:43 -07:00
qazal
e01a3eb59a
rangeify whitespace cleanups [pr] (#12326)
* rangeify whitespace cleanups

* this is a noop
2025-09-29 10:04:51 +03:00
George Hotz
cf925d1ac5
remove metadata for rangeify codegen (#12325) 2025-09-29 14:29:28 +08:00
George Hotz
b252f890da
add support for SPEC=1 (#12322)
* add support for SPEC=1

* cleaner place for it

* non rangeify spec

* split non rangeify
2025-09-29 12:55:01 +08:00
qazal
292cb6ae26
viz: 404 if the requested rewrite doesn't exist (#12323) 2025-09-29 07:51:10 +03:00
qazal
250cb10e8f
rangeify permuted assign (#12299)
* enable RANGEIFY=1 test_assign

* work

* rangeify=0 asserts this ast

* remove that

* beta test, it's correct though

* skip multi

* matches torch/np output

* memcopy without memcopy

* can remove this

* rangeify isn't silently wrong anymore

* diff cleanup

* use UOp toposort instead of global tags

* actual assert TestRangeifyAssign

* step

* work

* this isn't optimizing away now

* some todos

* test fusion schedule

* typo

* dedup idxs

* cleaner

* pre

* work

* diff
2025-09-29 07:27:57 +03:00
Sieds Lykles
ed90de6583
Revert "Bufferize early, fix "children not making progress" on big graphs (#1…" (#12318)
This reverts commit 6f1cf717de.
2025-09-28 19:10:21 +02:00
Sieds Lykles
29f0886395
skip test_softmax_fusion tests if RANGEIFY==1 (#12310) 2025-09-27 05:57:40 +02:00
Sieds Lykles
b98f1881ef
dsp opt test has different axis number on rangeify (#12309) 2025-09-27 05:06:11 +02:00
Sieds Lykles
6f1cf717de
Bufferize early, fix "children not making progress" on big graphs (#12308)
* bufferize children early

* cleaner

* fix types

* lower number of reduceops

* test openpilot
2025-09-27 04:17:15 +02:00
qazal
0104b16b9b
rangeify: fix empty tags in reshapes (#12307) 2025-09-26 16:32:48 +03:00
nimlgen
f5eb46a3d9
fix limit buf metal on non rangeify (#12303)
* add failure test for limit buf on non rangeify

* correct metal

* correct

* hm
2025-09-26 11:06:28 +03:00
qazal
8b2e0930d7
rangeify: enable passing multi test (#12301) 2025-09-26 08:31:13 +03:00
Sieds Lykles
74411984fc
Rangeify IMAGE (#12304)
* add imagedtype to rangeify

* enable some image tests

* move the tests

* image upcast before locals

* add if statement

* rangeify image_dtype test

* decrease read_image count
2025-09-26 07:21:02 +02:00
wozeparrot
d2cd269e28
fix: try close mmap (#12306) 2025-09-25 20:54:27 -07:00
chenyu
17cec8d645
RANGEIFY winograd test (#12297)
speed seems fine
2025-09-24 23:42:32 -04:00
nimlgen
476a2a0a96
test_qcom: update (#12293) 2025-09-24 21:45:58 +03:00
qazal
38ecefaacb
RANGEIFY=1 allreduce (#12260)
* ci

* extract mops

* work

* assert early

* port this?

* can realize shard

* allreduce passing

* notes

* better handling of shard

* err

* outerworld allreduce twice

* work

* don't tag movement ops

* don't tag movement ops

* delete old logic

* 19 failing + ram

* cleanup

* reset stuff

* simplest failing test

* diff

* test_ones

* allreduce work

* allreduce more work

* down to 22 failing tests

* port _device_num

* replace creates a new UOp here

* pour symbolic everywhere

* 7 failing

* focus on allreduce

* work

* cleanup

* more ci

* fix test_schedule_ring

* post index const shape

* much better

* diff cleanup
2025-09-24 18:13:08 +03:00
qazal
0e778296be
rangeify: refactor const folding (#12291)
* rangeify: refactor const folding [pr]

* it got better
2025-09-24 17:58:39 +03:00
qazal
6c9d8c7e41
rangeify: simplify noop copy (#12289) 2025-09-24 17:01:23 +03:00
qazal
1400ce105f
rangeify: fix sharding (#12288) 2025-09-24 14:33:56 +03:00
qazal
154c865966
rangeify: fix ram usage in multi (#12286) 2025-09-24 13:48:58 +03:00
Sieds Lykles
e8945c74de
fix infinite symbolic loop with VCONST (#12285) 2025-09-24 07:06:22 +02:00
Sieds Lykles
45c7252aed
Better div nesting 2 (#11812)
* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* cleanup

* update tests

* ALLOWED_GATED_READ_IMAGE from 16 -> 12

* only remove the call to simplify

* add option to simplify with factor_remainder

* Allowed readimage gates back to 16
2025-09-24 04:50:26 +02:00
Sieds Lykles
6146c64d81
lower the invalid gate last (#12164)
* lowering invalid gate is part of lower_index_dtype

* update test

* remove import

* put that back

* reduce_collapse uses invalid

* fix that pattern to use invalid_pat

* valid creates the right dtype count

* separate rule for lowering invalid gate

* don't unvectorize Invalid gate

* image_fixup uses Invalid

* update tests

* cleanup

* update split_load_store

* add .scalar() there
2025-09-24 04:27:35 +02:00
qazal
ad7c8c21ea
rangeify: INDEX doesn't passthrough MSELECT (#12279) 2025-09-23 21:36:50 +03:00
nimlgen
02a7b7fe48
rangeify: fix test_setitem (#12269)
* rangeify: fix test_setitem

* um?

* better?

* simple where folding

* f

* revert

* x
2025-09-23 20:42:36 +03:00
qazal
2f145a98e0
rangeify: fix contiguous multi (#12278)
* rangeify: fix contiguous multi

* when it's changing root, it should construct a new UOp
2025-09-23 20:05:29 +03:00
nimlgen
5f4eeb054c
rangeify: passes now (#12277) 2025-09-23 18:46:49 +03:00
qazal
680ce54dd4
add types to replace_dnum (#12276) 2025-09-23 14:43:04 +03:00
chenyu
fffce0a6b4
use more no_range in simplify [pr] (#12275) 2025-09-23 02:33:56 -04:00
chenyu
51b88b2265
process replay tests in rangeify (#12274) 2025-09-23 01:30:06 -04:00
chenyu
b54cb272d0
move test_qcom to test/device (#12272) 2025-09-22 21:07:10 -04:00
Sieds Lykles
d21e34e617
enable test_sum_twice (#12270)
* remove skip

* remove import
2025-09-23 00:57:29 +02:00
Sieds Lykles
5a4b244e6b
Check for group inside another reduce (#12268)
* add check

* get the ranges correctly

* add test

* comment and better check
2025-09-23 00:32:41 +02:00
qazal
a6fd96f620
rangeify: don't tag movement ops (#12267)
* don't tag movement ops

* delete old logic
2025-09-22 16:40:17 +03:00
chenyu
b03ceb806e
move test_sample to test_randomness (#12266) 2025-09-21 21:11:32 -04:00
qazal
25e0b725d1
cleanup section 0 rangeify (#12264) 2025-09-22 00:30:44 +03:00
qazal
1aba668a37
cleanup buffer_view matcher (#12263) 2025-09-21 23:45:48 +03:00
nimlgen
b53a266254
rangeify: fix test_optim (#12262)
* rangeify: fix test_optim

* add to cl?

* these are good now
2025-09-21 18:08:35 +03:00
qazal
461e9becec
srender UOp in movement op arg (#12261) 2025-09-21 13:55:45 +03:00
Sieds Lykles
9569fdfa36
use str for AxisType and AddrSpace __repr__ (#12252) 2025-09-21 05:24:41 +02:00
qazal
8365c28cd5
viz: put a limit of brightness scale (#12259) 2025-09-20 18:52:55 +03:00