Commit graph

8,470 commits

Author SHA1 Message Date
George Hotz
da35edbb55 reenable that upcast 2025-04-01 17:09:02 +08:00
George Hotz
661431ee75 correctness 2025-04-01 17:01:46 +08:00
George Hotz
8340d9c1c2 disable padding 2025-04-01 16:27:54 +08:00
George Hotz
910cddbbca correct but slower 2025-04-01 16:11:47 +08:00
George Hotz
e6e0c0ec86 should work 2025-04-01 15:25:15 +08:00
George Hotz
d0eedb5a79 hack 2025-04-01 15:05:13 +08:00
George Hotz
f69deddbd4 opt 2025-04-01 14:43:36 +08:00
George Hotz
be11fbbf78 works 2025-04-01 14:38:38 +08:00
George Hotz
812c391617 fp mul 2025-04-01 13:43:16 +08:00
George Hotz
3306083f42 YOU DIDNT FOIL 2025-04-01 12:32:00 +08:00
George Hotz
18d7e9d3f1 oops 2025-04-01 11:56:57 +08:00
George Hotz
1c3f249ecf fix multicore flop tracking 2025-04-01 10:16:01 +08:00
nimlgen
bb7b89475c dsp multicore 2 (#9644)
* dsp multicore 2

* hmm

* better
2025-03-31 23:56:54 +08:00
George Hotz
8005e6c974 write test pkl imagenet 2025-03-31 19:37:28 +08:00
George Hotz
a3d61a0372 save pkl from benchmark 2025-03-31 19:31:48 +08:00
George Hotz
c73e35aa24 non const fix 2025-03-31 19:10:06 +08:00
George Hotz
0b4b9f61b9 simpler 2025-03-31 19:03:06 +08:00
George Hotz
ee3ddfcdc1 many l2fetch 2025-03-31 18:58:52 +08:00
George Hotz
220d682489 prefetch l2 is so winning 2025-03-31 18:29:12 +08:00
George Hotz
9c388c3539 try to be smarter 2025-03-31 18:23:49 +08:00
George Hotz
4b3a4c8c46 fix prefetch l2 2025-03-31 18:09:48 +08:00
George Hotz
eb606d7230 MULTICORE=1 PYTHONPATH=. QUANTIZE=1 DEBUG=2 DEVECTORIZE=0 python3 extra/replay_pkl.py /tmp/im.pkl 2025-03-31 15:37:07 +08:00
George Hotz
49d52a2763 support acc in __builtin_HEXAGON_A2_vraddub 2025-03-31 15:12:00 +08:00
George Hotz
a59c3dd09a err, that's a bug 2025-03-31 14:56:15 +08:00
George Hotz
a640292aed delete extra 2025-03-31 14:35:32 +08:00
George Hotz
2f48c12441 Merge branch 'master' into dsp_search 2025-03-31 14:27:27 +08:00
George Hotz
e4c545b396 linearizer fix from dsp branch (#9641)
* linearizer fix from dsp branch

* revert that
2025-03-31 14:26:39 +08:00
George Hotz
be3b5efc64 fix precommit a bit 2025-03-31 14:26:19 +08:00
George Hotz
996d0ac1d2 multicore all the way 2025-03-31 14:17:19 +08:00
George Hotz
ec405b919f Revert "Revert "do not block gc in UOp.toposort (#9623)" (#9624)" (#9639)
This reverts commit 7ef02d0e1c.
2025-03-31 14:03:38 +08:00
George Hotz
77e897b3b1 Merge branch 'master' into dsp_search 2025-03-31 13:03:29 +08:00
George Hotz
49b1c46d16 good changes from the dsp branch (#9638) 2025-03-31 13:02:53 +08:00
George Hotz
273dde69bd remove range split support 2025-03-31 12:43:21 +08:00
George Hotz
a64030d8c8 ignore hacks 2025-03-31 12:36:39 +08:00
qazal
9d67d3a2f3 simpler viz codeblocks (#9636)
* simpler viz codeblocks

* err
2025-03-31 11:48:35 +08:00
George Hotz
9b19129e87 mc 2025-03-31 11:34:22 +08:00
George Hotz
48221d9024 2 global dim 2025-03-31 11:25:12 +08:00
George Hotz
bcfcd60f55 opt weights 2025-03-31 11:02:03 +08:00
chenyu
60eb0c4ed7 exclude slow tests on PYTHON (#9634) 2025-03-30 22:55:05 -04:00
George Hotz
abc90024ac hand coded opts 2025-03-31 10:44:09 +08:00
chenyu
5012ba3f04 cumalu touchup [pr] (#9632) 2025-03-30 22:43:11 -04:00
chenyu
d8d7ac1bb1 fix bert free_intermediates (#9633)
fix when only run eval `TRAIN=0 BERT_SIZE=tiny examples/mlperf/training_submission_v5.0/tinycorp/benchmarks/bert/implementations/tinybox_green/dev_beam.sh`
2025-03-30 22:42:52 -04:00
qazal
ff984c807d hotfix: less lines for viz helpers (#9631) 2025-03-31 10:10:34 +08:00
George Hotz
f0e6d8394c Merge branch 'master' into dsp_search 2025-03-31 10:01:19 +08:00
qazal
c206a7ae6d refactor viz state updates (#9630)
* refactor viz state updates

* onclick
2025-03-31 09:54:54 +08:00
Yvon Manzi
6652003839 Add cumprod to Tensor (#9629)
* probably how cumprod should look like

* update _cumalu to work with MUL

* shorter

* cumprod testing

* clean

* more cleanup

* add cumprod to torch backend.

* make it look like cumsum

* mypy fix

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:49:18 -04:00
geohotstan
d52e91db7b ONNX ops clean ups (#9622)
* combine work from remove numpy and onnx ops tests

* clippy

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 21:39:22 -04:00
uuuvn
962c0f65f8 Fix generate_am (#9626)
This should be a comment
2025-03-31 01:15:44 +08:00
uuuvn
2a4247b8c2 RDNA 3.5 support (#9627) 2025-03-31 01:15:20 +08:00
geohotstan
a08b07b4da Bump onnx==1.17.0 (#9618)
* bump

* remove resize tf_crop_and_resize

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-03-30 03:21:51 -04:00