Commit graph

5,694 commits

Author SHA1 Message Date
chenyu
393c6b236c
test case to sum twice in different order (#12253)
* test case to sum twice in different order

fixed by #12251

* try metal
2025-09-20 10:11:57 -04:00
qazal
4756971c88
skip test_bf16_disk_write_read on CL=1 (#12256) 2025-09-20 17:11:06 +03:00
Sieds Lykles
73c8dae60d
add missing remove_blockend case (#12251)
* add missing remove_blockend case

* remove expectedFailure

* better comment
2025-09-20 06:29:19 +02:00
qazal
bb59eed82f
rangeify: don't tag consts, they are global (#12247)
* rangeify: don't tag consts, they are global

* don't map movement ops

* sym failing test

* remove that

* update comment

* simpler test

* work
2025-09-19 15:25:03 +03:00
Sieds Lykles
cc038b31b6
Shrink instead of reshape to unregister symbolic (#12241)
* Slice to unbind symbolic

* use vmax for now

* assert shape in reshape is valid

* update test_symbolic_ops to use shrink instead of reshape

* remove infer_with_bound_values for now

* symbolic output doesnt have symbolic strides

* symbolic jit tests use shrink to unregister symbolic

* update test

* update more tests

* wrap vmax in int()

* only create a new st if the store is not an assign

* unwrap st

* comments
2025-09-19 06:04:35 +02:00
Sieds Lykles
8d703a6369
z3 xor doesnt use bitcast (#12243) 2025-09-19 00:31:44 +02:00
chenyu
0dad6cc518
good RANGEIFY kernel counts in external_test_opt (#12242)
no push permute stuff. the model ones are less clear if it's good, some got slower
2025-09-18 17:58:54 -04:00
qazal
825f148469
rangeify: fix copy size mismatch errs (#12232)
* rangeify: fix copy size mismatch errs

* const folding can happen in sym

assert it

* shippable

* rangeify copy is completely wrong

* pre_bufferize

* tag bufferize

* pre back
2025-09-18 18:23:32 +03:00
chenyu
f82b16a0e9
RANGEIFY test_tensor (#12235) 2025-09-18 10:35:43 -04:00
chenyu
7487c13b61
truncate_fp16 -> float_to_fp16 (#12234)
match float_to_bf16 and float_to_fp8
2025-09-18 09:48:27 -04:00
b1tg
54c15d74a4
python float8 support (#11960)
* basic support

* alu

* nan in exec_alu

* rand_for_dtype

* inf + 0.0

* finfo

* revert rand_for_dtype

* clean

* truncate fp8s inf

* spec ok

* float_to_fp8 nan/inf

* least_upper_dtype

* clean up

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-09-18 09:17:09 -04:00
qazal
dbbc261075
rangeify: fix COPY simplifier (#12233) 2025-09-18 14:35:33 +03:00
qazal
525f80e0d2
rangeify: enable putting consts back in the tensor graph (#12225)
* rangeify: enable putting consts back in the tensor graph

* work

* sym in ci
2025-09-17 19:45:04 +03:00
chenyu
edffc246ed
MUL in reduce_unparented (#12223)
* MUL in reduce_unparented

* some test
2025-09-17 11:56:39 -04:00
qazal
7733c217c5
remove spam comments in test_schedule (#12224) 2025-09-17 18:24:55 +03:00
qazal
d917895569
map out rangeify errors in test_schedule (#12211)
* map out rangeify errors in test_schedule

* skip that

* add to ci
2025-09-17 09:10:28 +03:00
Sieds Lykles
158506b91e
Upgrade some divmod folding for symbolic divs (#12216)
* use const_factor() instead of arg

* add test

* change div min_max

* add tests

* add divide_by_symbolic_gcd

* add tests

* one more test

* Slice to unbind symbolic

* deal with const factor properly

* minor cleanup

* divide_by_symbolic_gcd becomes UOp.gcd and UOp.divide_exact

* add tests

* add gcd_without_const

* fix divide_exact bug

* add factor_remainder

* add tests

* fix imports

* elif -> if

* remove expectedFailure

* add more tests

* add more unwrap

* fix signature of pop_const

* remove that

* remove that
2025-09-17 03:00:50 +02:00
chenyu
5b12764b83
add arange cat arange test (#12217)
simple test case to catch wrong reduce const folding. also clean up the old arange complexity test
2025-09-16 17:12:32 -04:00
chenyu
6b808c5fe6
update TestSymbolicJit.test_plus1_pad (#12214)
was failing because movement was not captured
2025-09-16 15:57:50 -04:00
Shun Usami
2a72b00679
Add test for 2D tensor indexing in setitem (#12193)
* Add test for 2D tensor indexing in setitem

* Fix _masked_setitem to handle multi dim indexing correctly

* Fix indent

* Add fuzz test for 3D tensor indexing in setitem

* Skip indexing fuzz test (slow)
2025-09-16 14:57:25 -04:00
chenyu
84d2d047ea
Tensor.pad_to and Tensor.shrink_to (#12210)
most of the time i want this instead of spelling out the args

also add more input validation to shrink
2025-09-16 12:24:55 -04:00
qazal
122a50fe8c
assert kernel count (#12205) 2025-09-16 14:24:39 +03:00
chenyu
e555748807
test rangeify const folding (#12200)
* test rangeify const folding

reduce i know how to fix, multi and test_cast_padded tbd

* test_instancenorm_3d is very slow
2025-09-15 20:03:48 -04:00
chenyu
f732f66709
rangeify test_nn almost pass (#12198)
* rangeify test_nn almost pass

* issue with jit

* flaky
2025-09-15 17:49:20 -04:00
qazal
a388d2cb1a
remove PROFILE=1 option, it's just VIZ=1 [pr] (#12176)
* remove PROFILE=1 option, it's just VIZ=1 [pr]

* sqtt

* sqtt 2

* return last

* rename
2025-09-15 12:51:50 +03:00
chenyu
bdb3afd566
failed test case for symbolic pad (#12179) 2025-09-15 00:25:21 -04:00
chenyu
15b166ce6d
bump test_module_runs to 30 seconds (#12174)
25 seconds sometimes
2025-09-14 16:48:40 -04:00
Shun Usami
34a05b31fe
Fix advanced tensor indexing setitem (#12128)
* Add failure test case for advanced tensor indexing setitem

* Fix advanced tensor indexing setitem when permuted

* Reduce line count

* Revert unnecessary change

* Combine two lines into one
2025-09-14 15:22:40 -04:00
chenyu
d09c0f28c5
increase test_module_runs (#12173)
timed out on ci windows llvm
2025-09-14 15:19:21 -04:00
chenyu
12a910f1d2
update torch 2.8 (#12172)
support _reshape_alias. something is wrong with one case of unfold
2025-09-14 15:19:03 -04:00
chenyu
98ecab7563
remove ml_dtypes (#12169) 2025-09-14 14:20:05 -04:00
qazal
02054b53fe
remove tests that pre date the uop spec (#12168)
* remove tests that pre date the uop spec

* const src

* for RANGEIFY=1

* update with bind

* remove import
2025-09-14 18:47:42 +03:00
qazal
1591e4f66b
update outbufs selection in test_linearizer [pr] (#12166) 2025-09-14 13:46:49 +03:00
George Hotz
bcafa72b7f
use tags instead of graph_rewrite_map in rangeify (#12110)
* use tags instead of graph_rewrite_map in rangeify

* new style, add realize

* metadata works

* simple failure

* fix

* loops

* stuff becomes a NOOP when you remove it

* stuff becomes a NOOP when you remove it

* tags on bufferize

* bmnist works

* locals don't work

* shippable

* fix some tests

* simpler map_realize

* remove const hack

* debuggable test

* broke

* assign test

* straight up bug

* wooo it passes

* sink shouldn't be there

* fix ops

* bmnist

* kv cache ish

* Set RANGEIFY context variable to 0

* should work normal

* better

* types

* hacks to fix test_symbolic

* pm_add_buffers

* tests should pass
2025-09-14 11:39:01 +08:00
nimlgen
b1d1816f43
device: fix envvars (#12159) 2025-09-13 23:38:09 +03:00
nimlgen
92df52d79a
make method_cache account for compiler (#12156)
* make method_cache account for compiler

* sorry
2025-09-13 17:00:11 +03:00
Sieds Lykles
e3a3764917
delete fold_unrolled_divs (#12146) 2025-09-13 03:09:36 +02:00
Sieds Lykles
2fc0bd150b
Arange overflow raises error and one_hot upcast (#11975)
* add error

* to_dtype

* shorten line

* add test

* upcast one hot dim if overflows
2025-09-13 00:18:25 +02:00
chenyu
aac3dceaf6
merge two PYTHON backend ci job (#12143)
* merge two PYTHON backend ci job

and mark anything that takes > 10 in test_ops slow

* two more
2025-09-12 17:36:46 -04:00
ttomsa
a12d0933c1
fix vec dtype in fast idiv (#12080)
* fix

* add vec dtypes to fuzzer

* add vec=False

---------

Co-authored-by: Sieds Lykles <93992551+S-Lykles@users.noreply.github.com>
2025-09-12 23:00:43 +02:00
chenyu
25091951ba
update test/models (#12142)
minor fix and run more stuff in tinygrad for speed
2025-09-12 16:43:28 -04:00
Sieds Lykles
62376c8b2b
update store load noop pattern to use Invalid (#12141)
* update pattern

* add test
2025-09-12 22:25:53 +02:00
chenyu
647965fb09
test_train cleanup (#12140)
* test_train cleanup

remove skipIf due to buffer sizes, runs locally

* those are slow
2025-09-12 13:21:30 -04:00
qazal
e80c8a7548
merge TestIndexing with TestSchedule + remove duplicate tests (#12134)
* merge TestIndexing with TestSchedule

* remove the arange_copy tests

* no FUSE_ARANGE import
2025-09-12 10:35:14 +03:00
Sieds Lykles
b5a3b8de20
remove where on gated load if gates are the same (#12129)
* add rules

* add tests
2025-09-12 06:52:35 +02:00
George Hotz
0766616962
isolate the const hacks in the old kernelize (#12126)
* isolate the const hacks in the old kernelize

* if rangeify, don't waste time
2025-09-12 08:35:35 +08:00
Sieds Lykles
1f3950a484
Invalid idx (#12067)
* merge index_dtype_3

* new lowering with Invalid idx

* remove that dtype from range

* finish merge

* annotate better

* indentation

* dont need that anymore

* always process replay for openpilot

* more uop_given_valid for idx

* valid past index_child

* fix bug preventing load getting an alt value

* add track_match_stats back in in shapetracker and remove cache

* get_valid_idx -> get_valid and get_idx

* fix heuristics with new idx

* split line

* fix typo

* fix signature

* dont skip idx if stride is 0

the idx may still be invalid

* lower const with new valid

* delete to_indexed_uops

* update shapetracker test

* delete axis_is_masked

* add cache back

* move around comment

* fix get_valid bug

* move invalid fold to symbolic so its earlier

* cleanup

* update applying padto to new idx

* add unit tests

* cleanup

* fold line

* improve spec

* dont try to render Invalid as a float

* more consistent invalid index

* update some tests

* Fold index with true cond

* skip test

* vconst min max if Invalid in arg

* fix signature of UOp.const

* add test for min/max of Invalid CONST/VCONST

* add InvalidType to as_const signature

* is Invalid to isinstance

* Add InvalidType to ConstLike

* index gate is a where gate

* make that a metaclass

* fix heuristics for new idx

* mypy happy
2025-09-12 01:42:02 +02:00
chenyu
544eb2c402
clean up test_scatter_reduce (#12125) 2025-09-11 16:36:58 -04:00
chenyu
9ad6a56d17
smaller test_simple_reduce (#12124) 2025-09-11 15:45:38 -04:00
chenyu
3a83b56da5
fix test_dequantization_mxfp4 (#12123)
* fix test_dequantization_mxfp4

* assert_allclose

* rtol
2025-09-11 14:22:06 -04:00