George Hotz
260da2017c
fix unit test dtypes
2025-08-13 12:15:12 -07:00
kevvz
e2873a3a41
[bounty] Muon optim ( #11414 )
...
* newton schulz
* add muon + move newton schulz to tensor
* compact newton schulz
* better tests
* cleanup
* add comments for muon
* cleanup
* add export with tests
* match muon optim with test optim
* cleanup
* unused import
* correct comment
* whitespace
* move export
* muon test fix
* match reference impl + tests
* remove export by moving muon device
* add credit
* cleanup
* remove print
* spacing
* spacing
* comma
* cleanup
* removal
* fix tests + optim momentum
* consistent is not/ not
* more consistency
* fix test
* cleanup
* fix the nones
* remove comment
* cast
* comment
* comment
* muon teeny test
* muon flag beautiful mnist
* set steps
* steps as hyperparam
* match default test steps
* name
* large cleanup
* dont care about steps
* nesterov false default
* match each other impl
* steps
* switch nest
* swap defaults
* update docstring
* add no nesterov test
* ban fuse_optim
* prints
* classical momentum
* alternative condition
* recon
* pre + post wd
* false default
* detach
* signature changes
* context
* swap order
* big cleanup
* 0 step instead
* parity
* remove fuse
* remove fused
* better paper
* assert message
* correct shape check + eps
* multidim
* add eps
* cleanup
* correct assert message
* lint
* better tests
* naming
* ns_steps,ns_params
* update docstring
* docstring
* match sgd and muon together
* sandwich
* add back fused
* parity
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-13 14:27:55 -04:00
George Hotz
d2521d828a
transcendental+idiv+threefry are uop decompositions ( #11636 )
...
* transcendental+idiv+threefry are uop decompositions [pr]
* threefry decomp
* fix randomness tests
* fix webgpu
* unneeded now
* fix
* move prematcher
* all cast should probably be cast_vec
2025-08-13 09:37:12 -07:00
geohotstan
925555b62a
Fix onnx Domain bug ( #11650 )
2025-08-13 08:20:50 -07:00
chenyu
0d8a0d7a96
update test_multi_const_folding_tensor to include pow ( #11635 )
...
pow folds now
2025-08-12 13:35:37 -04:00
Sieds Lykles
4d6e407eb0
Extend fast_idiv to negative ints ( #11632 )
...
* fast idiv for signed ints
* Add rule and test
* fix tests
* redo fuzz_fast_idiv to do negative ints as well
* adjust comments
* remove unused imports
2025-08-12 19:34:49 +02:00
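For context on the entry above: constant integer division is commonly replaced by a multiply and a shift, and covering negative inputs needs one extra correction so the result truncates toward zero. The following is a minimal sketch of that classic technique, not tinygrad's actual `fast_idiv`; the names `magic`, `fast_idiv_sketch`, and `trunc_div`, and the 31-bit input range, are assumptions made for illustration.

```python
def magic(d, nbits=31):
    """Pick (m, s) so that (x * m) >> s approximates x / d for -2**nbits <= x < 2**nbits."""
    assert d > 0
    l = (d - 1).bit_length()   # ceil(log2(d))
    s = nbits + l
    m = (1 << s) // d + 1      # strictly greater than 2**s / d
    return m, s

def fast_idiv_sketch(x, d, nbits=31):
    """Divide x by the positive constant d, truncating toward zero, without a division on x."""
    m, s = magic(d, nbits)
    q = (x * m) >> s           # Python's >> floors, like an arithmetic shift
    # For negative x the floored product is one below the truncated quotient.
    return q + (1 if x < 0 else 0)

def trunc_div(x, d):
    """Reference: C-style division that truncates toward zero."""
    return x // d if x >= 0 else -((-x) // d)

print(fast_idiv_sketch(-7, 2), trunc_div(-7, 2))
```

In a compiled kernel `m` and `s` would be computed once per constant divisor, leaving only the multiply, shift, and sign correction at run time.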
geohotstan
ad9dec25b3
combine onnx parser and onnx ( #11485 )
...
* start
* more
* fix onnx_runner test
* pass
* patch for disk and add domains from huggingface
* simpler docs
* revert domain changes
* rerun ci
* revert onnx ops test change
* add fix from strenum stuff
* correct way
* revert correct way to leave the fix for another PR
* test segfault
* Revert "test segfault"
This reverts commit 4e1aaf41e7.
* remove some unnecessary documentation
* test segfault again
* Revert "test segfault again"
This reverts commit 56fc5f03e7.
* try gemini suggested patch for sys._getframe
* keep trying with gemini
* revert not working gemini suggestions and try faulthandler
* remove pythonfaulthandler
* trigger CI a few times
* minimize diff
---------
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-08-12 12:56:39 -04:00
Sieds Lykles
4c3982c44e
Take sign out of mod ( #11631 )
...
* Add rule and test
* fix tests
2025-08-12 18:44:36 +02:00
chenyu
0d7075f2de
assign should broadcast input tensor ( #11629 )
...
fixed test_assign_broadcast
2025-08-11 23:36:35 -04:00
George Hotz
ca41b5e38b
skip_0 in graph rewrite [pr] ( #11627 )
...
* skip_0 in graph rewrite [pr]
* no track_rewrites on test
* use dict instead of set
2025-08-11 18:29:04 -07:00
chenyu
0c97d6de1b
don't round pow output for int pow int ( #11625 )
...
also added atol=0 and big pows for the tests
2025-08-11 20:57:47 -04:00
chenyu
d623f6d850
support int Tensor pow to const non-negative int ( #11624 )
...
matches torch
2025-08-11 19:50:19 -04:00
chenyu
857a830dcc
fix test_arange_float_step ( #11623 )
2025-08-11 16:58:42 -04:00
ttomsa
ae0c3cfff6
change clang -march flag to -mcpu on arm ( #10970 )
...
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-08-11 13:38:48 -04:00
geohotstan
27bcb9fd1c
Support cubic mode for ONNX Resize OP ( #11612 )
...
* start
* add reference
* this is so much slower
* this makes sense but differs from official impl, but results are still correct..?
* add a comment
* Just keep it simple for now since I don't fully get it yet
* address comments
* correct
* teeny clean up
* another small comment improvement lol
2025-08-11 11:49:30 -04:00
nimlgen
d2bb1bcb97
cloud: a bit better err handling ( #11616 )
...
* cloud: err propagation to client
* fix
* print exc
* linter
* excs
* fix
* hm
* flaky
2025-08-11 15:51:22 +03:00
chenyu
a67e0917c3
list indexing can normalize in python ( #11609 )
...
* list indexing can normalize in python
list index does not need to be normalized in tensor
* update those
2025-08-10 20:02:38 -04:00
chenyu
1181ec0cd2
few more tensor indexing test cases ( #11608 )
2025-08-10 18:56:42 -04:00
George Hotz
996c907c0b
rewrite not ready + children machinery ( #11607 )
...
* rewrite not ready + children machinery
* it doesn't like track rewrites
2025-08-10 15:28:30 -07:00
geohotstan
b0dab6a4cd
onnx Resize OP clean up ( #11603 )
...
* start
* slight clean up
2025-08-10 14:10:39 -04:00
Sieds Lykles
10540414cd
Add Ops.CMPEQ ( #10431 )
...
* Add op
* add to GroupOp.ALU
* fix spec
* fix ptx
* temporary pickle by name to see process replay
* add Ops.EQ to binary ops
* Actually rename properly
* add test to assert CMPEQ is being used
* Ops.CMPEQ is automatic cast to bool
* add Ops.CMPEQ to llvm
* add Ops.CMPEQ to llvm
2025-08-10 13:13:16 +02:00
chenyu
dfb702ef33
fix sort for small dim ( #11601 )
...
* fix sort for small dim
* fixed test_sort_empty
2025-08-10 01:17:41 -04:00
Sieds Lykles
01c770c77b
Fix z3 float cast in indexing ( #11590 )
...
* adjust dtype of z3_renderer and add rule for cast
* dtypes.bool is also cast noop
* add regression test
* make embedding smaller
* even smaller test
2025-08-09 17:59:23 +02:00
Sieds Lykles
10d388499d
Refactor optional.py ( #11578 )
...
* move fast_idiv to transcendental
* move optional.py
* adjust comment
* change import
* mypy needs this?
2025-08-09 17:35:05 +02:00
qazal
16f0edbe90
pass opts arg in get_program process replay [pr] ( #11571 )
...
* fix ptx process replay
* keyword arg
* renderer is also optional [pr]
* test_linearizer fixup
* name function order is args,ret,kwargs
* can use opts_to_apply
* pass through p.applied_opts
* sink_arg
* now it opens devices too
2025-08-08 03:05:09 +03:00
qazal
960cc6533a
pass through name function args in track_rewrites ( #11572 )
2025-08-08 02:28:52 +03:00
George Hotz
82be8abfd2
move opt under codegen ( #11569 )
2025-08-07 14:19:17 -07:00
George Hotz
6ed2dfd187
delete the arange dim mismatch restriction ( #11568 )
...
* delete the arange dim mismatch restriction
* skip that test race
2025-08-07 13:46:17 -07:00
chenyu
aa1a6f2132
support threshold in Tensor.softplus ( #11564 )
...
fix gradient for large input
2025-08-07 13:43:18 -04:00
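For context on the entry above: a softplus threshold is usually a cutoff above which the function returns its input directly, since softplus(x) is numerically equal to x there and evaluating `exp` on a large argument overflows. The following is a scalar sketch of that idea in plain Python, not tinygrad's implementation; the function names are hypothetical, and the defaults `beta=1.0`, `threshold=20.0` follow PyTorch's documented softplus defaults and are only assumed to match tinygrad's.

```python
import math

def softplus_naive(x, beta=1.0):
    # Direct formula: overflows once beta * x is large enough.
    return math.log(1.0 + math.exp(beta * x)) / beta

def softplus_thresholded(x, beta=1.0, threshold=20.0):
    # Above the threshold, return x itself and never call exp() on a large value.
    if beta * x > threshold:
        return x
    return math.log1p(math.exp(beta * x)) / beta

print(softplus_thresholded(0.0))     # log(2)
print(softplus_thresholded(1000.0))  # large input passes through unchanged
try:
    softplus_naive(1000.0)
except OverflowError as e:
    print("naive version overflows:", e)
```

In the linear region the derivative is exactly 1, which is presumably why the commit describes the change as fixing the gradient for large input.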
chenyu
7ee3770961
FUSE_ARANGE=1 ( #11427 )
...
* FUSE_ARANGE=1
* fix test
---------
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-07 13:32:34 -04:00
George Hotz
9764c6cdee
fix mismatch reduce, try 2 ( #11560 )
...
* fix mismatch reduce, try 2
* fix heuristic
* delete that test
* don't start allowing ones
2025-08-07 07:57:58 -07:00
nimlgen
4f29a2c441
fix flaky test on macos ( #11557 )
2025-08-07 15:55:35 +03:00
George Hotz
a1aa5670aa
Revert "fix mismatch reduce ( #11547 )" ( #11549 )
...
This reverts commit 49d21a9055.
2025-08-06 22:43:15 -07:00
George Hotz
49d21a9055
fix mismatch reduce ( #11547 )
...
* fix mismatch reduce
* cleanups
* fix shape
* fix mypy
* resolve
2025-08-06 21:12:51 -07:00
George Hotz
21570545d3
move view pushing to codegen, try 2 ( #11534 )
...
* move view pushing to codegen, try 2
* fix up some linearizer tests
* fix test search
* fix test schedule
* delete that test
* fix test arange
* fix a few tests
* update tests
* push views
* ebs cleanup
* fix local/reg
* test and lint
* fix more tests
* test cleanups
* skipped that one
2025-08-06 15:58:38 -07:00
George Hotz
80d9cced07
more test cleanups ( #11544 )
...
* more test cleanups
* revert that
2025-08-06 15:05:21 -07:00
George Hotz
6fd1332763
update some tests for less Kernel ( #11543 )
...
* update some tests for less Kernel
* get_program update
2025-08-06 14:19:59 -07:00
George Hotz
09dc7af8e9
move bind to big graph ( #11539 )
...
* move bind to big graph
* fix tests
* unbind inside kernel only
* merge views
* fix multitensor
* failure text change
2025-08-06 13:27:51 -07:00
George Hotz
7c5e115747
test_mismatch_reduce ( #11538 )
2025-08-06 10:02:14 -07:00
George Hotz
4fe11725c6
pass through sink arg, update linearizer test ( #11536 )
...
* pass through sink arg, update linearizer test
* get_program help
* bump line count
* use new api
2025-08-06 09:48:48 -07:00
chenyu
c9225d22ce
only disable flaky test_jit_multidev_xfer ( #11523 )
2025-08-05 22:17:25 -04:00
George Hotz
07b0df0d86
hotfix: test tensor dims start at 1
2025-08-05 15:40:24 -07:00
George Hotz
4dabdf7c6d
Revert "optimize in rewrite ( #11516 )" ( #11517 )
...
This reverts commit 3b777a9e05.
2025-08-05 15:39:07 -07:00
George Hotz
3b777a9e05
optimize in rewrite ( #11516 )
...
* changes
* fix test uops
* dim shouldn't be 0
* huh, why did that one not save
2025-08-05 15:33:26 -07:00
nimlgen
fc4e713d1c
jit graph split tests ( #11507 )
...
* jit graph split tests
* fix
* one more test
* more tests
* fix
* xm
* remote
2025-08-05 21:32:37 +03:00
chenyu
ace8e9a706
fix test_conv2d_winograd ( #11511 )
2025-08-05 12:15:46 -04:00
chenyu
223aaa0492
clean up more conv tests ( #11510 )
2025-08-05 12:15:30 -04:00
Garret Castro
76e62a1c23
extract conv layer test logic ( #11488 )
...
* refactor: extract conv layer test logic
* tuple is unnecessary
* integrate _test_conv logic into all conv tests
* fix linter, forgot dilation
* undo winograd extraction
adds too many if statements for a single case
2025-08-05 11:15:54 -04:00
uuuvn
011ef8fa9d
Fix incorrect jit current batch devs reset ( #11505 )
...
`current_batch_devs = []` (in `flush_batch()`) runs between
`new_batched_devs = ...` and `current_batch_devs = new_batched_devs`, so
the reset is immediately overwritten and has no effect. As a result things
do not JIT properly, which doubles remote BERT step time (and should have
similar effects on any non-HCQ backend).
2025-08-05 08:16:16 +03:00
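The ordering bug described in the entry above can be reproduced in a few lines. This is a toy model, not the tinygrad JIT code: only the names `current_batch_devs`, `new_batched_devs`, and `flush_batch` come from the commit message, while the job format, the `fixed` flag, and the recompute-after-flush repair are assumptions made for illustration.

```python
def track_devices(jobs, fixed):
    """Each job is (device, starts_new_batch). Returns the tracked device list after each job."""
    current_batch_devs = []
    history = []

    def flush_batch():
        nonlocal current_batch_devs
        current_batch_devs = []  # the reset the commit message refers to

    for dev, starts_new_batch in jobs:
        # Computed from the pre-flush device list.
        new_batched_devs = sorted(set(current_batch_devs + [dev]))
        if starts_new_batch:
            flush_batch()
            if fixed:
                # One possible repair: recompute after the flush.
                new_batched_devs = [dev]
        # In the buggy path this overwrites the reset with the stale list.
        current_batch_devs = new_batched_devs
        history.append(list(current_batch_devs))
    return history

jobs = [("gpu0", False), ("gpu1", False), ("gpu2", True)]
print(track_devices(jobs, fixed=False)[-1])  # stale devices survive the flush
print(track_devices(jobs, fixed=True)[-1])   # only the new batch's device remains
```

In the buggy path the device list keeps growing across flushes, which matches the commit's observation that the reset "doesn't actually reset anything".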
chenyu
f02720ca2d
fix fuse gate_contiguous unique ( #11504 )
2025-08-04 23:43:31 -04:00