11,106 commits

Author SHA1 Message Date
qazal
660c034da6
KERNEL op try 3 (#9061)
* work

* tolerate shape, maybe this is ASSIGN(RESHAPE(BUF), KERNEL)

* err, it's not ASSIGN(BUF, KERNEL), it's ASSIGN(VIEW(BUF), KERNEL)

* burn the boats

* assign slightly works

* assign works

* cleanup + var_vals can exist

* fine image + fix metadata

* metadata, without making everything 30% slower

* diff pruning

* faster assign schedule

* add_buffer_ops stage

* add kernel_spec back

* add viz display

* more strict kernel_spec
2025-02-17 14:47:54 +01:00
qazal
ec80df5115
add PROGRAM renderer to viz [pr] (#9137) 2025-02-17 14:46:08 +01:00
qazal
7b09a72682
don't display void dtype in viz nodes [pr] (#9136)
* don't display void dtype in viz nodes [pr]

* extra
2025-02-17 13:49:36 +01:00
George Hotz
4dd10d03b7
move is_increasing to ops [pr] (#9134) 2025-02-17 19:27:48 +08:00
qazal
22c571d3cb
add kernel axis colors to viz [pr] (#9129)
* add kernel axis colors to viz [pr]

* slightly blending with white makes this nicer

* space
2025-02-17 12:21:35 +01:00
George Hotz
1bf66d62cf
symbolic gets its own file [pr] (#9132) 2025-02-17 18:55:21 +08:00
George Hotz
bd694faf6c
factor out the expander logic [pr] (#9131) 2025-02-17 18:09:48 +08:00
quortus
5bdf0c7951
Bitcast constant folding 2.0 (#9089)
* Prevent const folding in test_payne_hanek_reduction

* Do not use list as a default parameter

* Bitcast constant folding

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 18:08:20 +08:00
quortus
2be4529f14
Test broken const folding wraparound behavior (#9080)
* Test broken const folding wraparound behavior

* Add repro for test_payne_hanek_reduction const folding bug

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-17 17:44:56 +08:00
George Hotz
7eea9b639d
hotfix: add replay_pkl debugging env 2025-02-17 17:34:58 +08:00
George Hotz
af9d8d39d2
dsp matchers + bump line count to 11300 (#9130) 2025-02-17 17:31:54 +08:00
quortus
638d925e4e
Prevent const folding in test_payne_hanek_reduction (#9088)
* Prevent const folding in test_payne_hanek_reduction

* Do not use list as a default parameter
2025-02-17 17:31:10 +08:00
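The "Do not use list as a default parameter" item refers to a general Python pitfall: a default value is evaluated once, when the function is defined, so a list default is shared across calls. The sketch below illustrates the pitfall and the usual fix; `append_bad` and `append_good` are hypothetical names for illustration, not functions from this repository.

```python
def append_bad(item, bucket=[]):
    # The default list is created once at definition time,
    # so every call without `bucket` mutates the same object.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Use None as a sentinel and build a fresh list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `append_bad` twice without a second argument accumulates items in one list, while `append_good` returns an independent list each time.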
George Hotz
9289425170
add ast to ProgramSpec + pre matcher [pr] (#9128)
* add ast to ProgramSpec + pre matcher [pr]

* cleaner cast + test fix
2025-02-17 16:39:14 +08:00
qazal
fe260ac4d7
viz/server cleanups [pr] (#9127)
* viz/server cleanups [pr]

* space
2025-02-17 09:59:41 +02:00
George Hotz
a38b47e026
hotfix: DSP doesn't use that path 2025-02-17 10:45:29 +08:00
quortus
edf7213f34
Make bitcast to the same dtype noop (#9121) 2025-02-16 20:28:44 -05:00
Ahmed Harmouche
59fe45f947
Solve get_grouped_dims does not split issue (#9085)
* Solve dims too large errors on webgpu

* Simplify divisor find

* Test square root divisor

* Fix lint

* Refactor into group_dims and split_dims

* Refactor

* Fix lint

* Add back max check in _group_dims

* Prefer grouping over split

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-16 19:57:29 -05:00
Ahmed Harmouche
84dc331dd1
Refactor async (#9126) 2025-02-16 17:47:15 -05:00
qazal
6a9e5598f9
small viz touchups [pr] (#9123) 2025-02-16 20:07:40 +01:00
qazal
b3127f38e6
faster viz data fetching with streaming [pr] (#9122)
* refactor to generator

* yield

* switch to SSE

* start client side + end events

* start javascript work

* need to redo this whole part

* more correct

* diff

* works

* diff cleanup

* more diff cleanup
2025-02-16 19:31:11 +01:00
uuuvn
8926bac00a
am: profiling working (#9119)
ops_amd.py registers device finalization via atexit.register after
finalize_profile is registered in device.py, so the AM device is
closed before the profile is finalized, which leads to a hang.
(atexit.register is LIFO: https://docs.python.org/3.12/library/atexit.html#atexit.register)

This PR moves registering device finalization to device.py, before
registering profile finalization.
2025-02-16 18:51:08 +03:00
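The ordering issue described in the commit above can be reproduced with the standard library alone. This is a minimal sketch, not the project's code: the handler labels `close_device` and `finalize_profile` are stand-ins that only print their name, and each script runs in a child interpreter so its exit handlers actually fire.

```python
import subprocess
import sys

# Profile finalizer registered first, device close registered last:
# LIFO order means the device closes before the profile is finalized.
BUGGY = """
import atexit
atexit.register(lambda: print("finalize_profile"))
atexit.register(lambda: print("close_device"))
"""

# Device close registered first, so it runs last at interpreter exit.
FIXED = """
import atexit
atexit.register(lambda: print("close_device"))
atexit.register(lambda: print("finalize_profile"))
"""


def exit_order(script: str) -> list[str]:
    """Run `script` in a fresh interpreter and return the order its exit handlers printed."""
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

`exit_order(BUGGY)` reports the device closing first, and `exit_order(FIXED)` reports the profile being finalized first, which is the order the fix is after.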
qazal
97cb9cb1ed
always viz the first graph + non blocking matches fetch [pr] (#9117)
* always display the first graph in viz [pr]

* simpler

* progress indicator is the matches list style

* remove extra

* back

* res.json is still slow
2025-02-16 13:39:51 +01:00
chenyu
1fda98d14f
fix import time_linearizer [pr] (#9118)
the only test that used it was skipped in CI for being slow
2025-02-15 21:33:28 -05:00
chenyu
c1dfe5c00d
compact get_late_rewrite_patterns [pr] (#9116) 2025-02-15 20:33:09 -05:00
qazal
2e97022e5e
remove extra block in viz [pr] (#9115) 2025-02-16 02:38:09 +02:00
chenyu
fd95543ff1
use scatter_reduce in scatter [pr] (#9114) 2025-02-15 18:21:01 -05:00
chenyu
c954419bc8
minor tweak to transcendental pow (#9112)
also added more pow with const test cases
2025-02-15 18:03:25 -05:00
chenyu
8dfa0024f0
raise in scatter if self and src have different dtype [pr] (#9109)
raise a RuntimeError that matches torch instead of implicitly casting
2025-02-15 11:21:34 -05:00
chenyu
d129ccda4c
add RAWAST back to DEBUG=3 [pr] (#9107) 2025-02-15 09:12:51 -05:00
qazal
2e19976d03
assert views in tensor uops [pr] (#9106) 2025-02-15 13:27:55 +02:00
George Hotz
81f5a7af7d
improve DEBUG=3 [pr] (#9105) 2025-02-15 18:44:56 +08:00
qazal
41d143d27c
new order to prepare for becomes_map = tensor_map [pr] (#9104) 2025-02-15 10:37:36 +01:00
George Hotz
4672d9af73
actual tests for the dsp backend [pr] (#9102)
* actual tests for the dsp backend [pr]

* fix name
2025-02-15 15:17:56 +08:00
George Hotz
7e09057afa
fixup clang devectorize (#9099)
* fixup clang devectorize

* __builtin_convertvector is some casts

* dsp fixups
2025-02-15 09:29:47 +08:00
Marcello Fuschi
8824f7e9df
Make logcumsumexp numerically stable (#9050)
* Make logcumsumexp numerically stable

* Refactor

* Refactor for special case ndim=0

* Refactor

* Use the correct device for mask

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-02-14 19:25:17 -05:00
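For context on what "numerically stable" means here, the following is a plain-Python sketch of one common approach: keep the running result in log space and only exponentiate non-positive differences. It is an assumption-laden illustration, not the tensor implementation from this commit, and the function name simply mirrors the operation.

```python
import math


def logcumsumexp(xs):
    """Cumulative log(sum(exp(x))) without exponentiating large values."""
    out = []
    acc = -math.inf  # log of an empty sum
    for x in xs:
        hi, lo = max(acc, x), min(acc, x)
        if lo == -math.inf:
            # Adding exp(-inf) == 0 leaves the larger term unchanged.
            acc = hi
        else:
            # log(e^hi + e^lo) = hi + log1p(e^(lo - hi)); lo - hi <= 0, so exp cannot overflow.
            acc = hi + math.log1p(math.exp(lo - hi))
        out.append(acc)
    return out
```

A naive `log(cumsum(exp(x)))` overflows for inputs around 1000 in double precision, whereas this form stays finite.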
chenyu
81597ddd96
increase lr for bert (#9098)
had one run that converged better https://wandb.ai/chenyuxyz/MLPerf-BERT/runs/u66tv2hh/overview
2025-02-14 19:10:35 -05:00
b1tg
3ad39b247b
refactor LLVMRenderer (#9090)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 08:00:31 +08:00
b1tg
1f1362fd27
add truncate_bf16 (#9078)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-02-15 07:59:09 +08:00
Ahmed Harmouche
2dc8f1867c
Synchronize webgpu (#9093) 2025-02-15 00:52:10 +03:00
chenyu
b58e7b1898
zero out the weight in bert init run (#9076)
`DEFAULT_FLOAT=HALF BENCHMARK=10 BS=66 EVAL_BS=6 GPUS=6 MODEL=bert python3 examples/mlperf/model_train.py` no longer oom. I think the buffer of random init weights caused the oom.
2025-02-14 08:40:41 -05:00
qazal
82ad0d2e65
keep CONST/BUFFER uops in tensor_map [pr] (#9083) 2025-02-14 14:50:08 +02:00
qazal
65297066c2
move buffer refcount increment to the toposort [pr] (#9081) 2025-02-14 12:54:22 +01:00
chenyu
73af42aeab
fix pow backward when base is 0 (#9075) 2025-02-13 21:06:01 -05:00
qazal
2d04a75a40
start tracking bottom_up_rewrite in viz [pr] (#9071)
* start tracking bottom_up_rewrite in viz [pr]

* use the tracking matcher in test_viz
2025-02-14 00:28:10 +01:00
chenyu
5ef48bbe0a
swap order in rsqrt (#9069)
fixed backward for 0
2025-02-13 16:51:21 -05:00
Ahmed Harmouche
e83905696e
Show install instructions when dawn library is missing (#9059)
* Show install instructions when dawn library is missing

* Handle missing dawn in ops_webgpu

* Simplify

* Solve f-string backslash error
2025-02-14 00:30:20 +03:00
chenyu
9e91898941
bert eval at the end of training (#9070)
always eval at the last epoch
2025-02-13 16:29:44 -05:00
chenyu
e02e3b94c3
remove SQRT hack in llvm (#9067)
replaced with xpow 0.5 in transcendental. fixed sqrt(0) backward
2025-02-13 15:42:34 -05:00
chenyu
947c97e6ff
add test_sqrt to test_speed_v_torch (#9066)
working on getting rid of llvm sqrt hack
2025-02-13 15:25:54 -05:00
chenyu
49abc09f77
remove the reshapes in test_arange_2_reduce [pr] (#9063) 2025-02-13 12:33:25 -05:00