Commit graph

362 commits

Author SHA1 Message Date
George Hotz
f3cb4c3eef oops 2025-04-01 23:44:44 +08:00
George Hotz
6ecaf11224 ugh many hacks 2025-04-01 23:33:09 +08:00
George Hotz
8b24f9cb0d oops, didn't mean to change that 2025-04-01 17:55:04 +08:00
George Hotz
797e512c00 all correct 2025-04-01 17:51:24 +08:00
George Hotz
f600482982 correctness 2025-04-01 17:27:16 +08:00
George Hotz
da35edbb55 reenable that upcast 2025-04-01 17:09:02 +08:00
George Hotz
8340d9c1c2 disable padding 2025-04-01 16:27:54 +08:00
George Hotz
e6e0c0ec86 should work 2025-04-01 15:25:15 +08:00
George Hotz
eb606d7230 MULTICORE=1 PYTHONPATH=. QUANTIZE=1 DEBUG=2 DEVECTORIZE=0 python3 extra/replay_pkl.py /tmp/im.pkl 2025-03-31 15:37:07 +08:00
George Hotz
996d0ac1d2 multicore all the way 2025-03-31 14:17:19 +08:00
George Hotz
273dde69bd remove range split support 2025-03-31 12:43:21 +08:00
George Hotz
9b19129e87 mc 2025-03-31 11:34:22 +08:00
George Hotz
48221d9024 2 global dim 2025-03-31 11:25:12 +08:00
George Hotz
abc90024ac hand coded opts 2025-03-31 10:44:09 +08:00
George Hotz
e0fd84dd64 add locals 2025-03-28 18:52:48 +08:00
George Hotz
0aa7031b5f simpler 2025-03-28 17:42:14 +08:00
George Hotz
d32ad080c3 fast 66 2025-03-27 16:47:58 +08:00
George Hotz
6d860389f4 issue 2025-03-27 16:21:02 +08:00
George Hotz
38488ec3b0 extend to 128 2025-03-27 10:35:06 +08:00
George Hotz
ff96f0adae 7 2025-03-27 00:29:00 +08:00
George Hotz
5dd59a6096 touchup 2025-03-27 00:23:58 +08:00
George Hotz
a436d7542f up 7 2025-03-27 00:00:40 +08:00
George Hotz
928994c6ea bugfix 2025-03-26 20:10:15 +08:00
George Hotz
290ba9ee37 more cleanups 2025-03-26 17:59:26 +08:00
George Hotz
8660fecb02 unroll both sides 2025-03-26 15:12:02 +08:00
George Hotz
f1ff18acec prepad kernel weights 2025-03-26 12:13:46 +08:00
George Hotz
f6e64a5e8e optional conv 2025-03-25 19:00:48 +08:00
George Hotz
554a490751
Merge branch 'master' into dsp_search 2025-03-24 12:29:22 +08:00
George Hotz
de7d6cec3a hotfix: DEBUG 5 prints the ast 2025-03-24 11:43:11 +08:00
George Hotz
fd73ec2b1b knum 2025-03-22 18:59:54 +08:00
George Hotz
e1d2bec4a4 opt 2025-03-22 18:52:56 +08:00
George Hotz
26b02a037c fix 33 2025-03-22 17:17:47 +08:00
Ignacio Sica
eddafb84e5
Bugfix for TC=3 (#9464)
* wrong but uses less shared

* for size 8 tc1 with devectorize in 0 loads into local before wmma and works

* improvements over tc1 devectorize

* fix tc=3

* works for handcoded tc opts

* clean bugfix tc=3

* fix

* revert changes
2025-03-21 16:43:42 -07:00
chenyu
6da78164f9
assert Kernel ast.op to be Ops.SINK [pr] (#9539)
rest of the code assumes self.ast is defined anyway
2025-03-21 18:09:44 -04:00
George Hotz
0416b0998d revert those 2025-03-21 17:15:38 +08:00
George Hotz
5ce951fb34 l2 2025-03-21 11:14:12 +08:00
George Hotz
e5ccd9e846 work 2025-03-20 15:20:03 +08:00
George Hotz
52ae9af4dd
Fast DSP for MobileNetV2 (try 2) (#9467)
* Fast DSP for MobileNetV2 (try 2)

* enable fast path on uchar

* fix tests
2025-03-17 15:10:36 +08:00
chenyu
ca5064a5b6
remove Kernel.float4_axis [pr] (#9448) 2025-03-14 17:54:32 -04:00
George Hotz
a73d8717f3
fast amd gemm (#9318)
* 50 TFLOP AMD gemm

* add lds tiling

* register tiling

* flip locals

* work

* comment

* remove those
2025-03-03 12:01:14 +08:00
chenyu
2e7c2780a9
CLANG -> CPU (#9189) 2025-02-20 18:03:09 -05:00
qazal
cf315d544b
rename can_pad arg to cache [pr] (#9170) 2025-02-19 12:24:59 +01:00
George Hotz
a330f3338c
save applied opts in ProgramSpec [pr] (#9150) 2025-02-19 00:40:03 +08:00
George Hotz
ff9b985d9f hotfix: View Base AST 2025-02-18 18:48:34 +08:00
George Hotz
ddddcc165b
colors back in DEBUG=2 [pr] (#9155) 2025-02-18 16:17:57 +08:00
George Hotz
caee42e8a6
Revert "name from uops [pr] (#9151)" (#9154)
This reverts commit 28897be9a2.
2025-02-18 16:06:44 +08:00
George Hotz
28897be9a2
name from uops [pr] (#9151) 2025-02-18 15:52:03 +08:00
George Hotz
a4dab3ec3f
add name uop (#9149)
* add name uop, TODO: refactor renderer to use

* renderer uses name uop

* fix tests

* render

* ptx
2025-02-18 15:26:58 +08:00
George Hotz
df3b320f46
rewriter -> devectorizer [pr] (#9147) 2025-02-18 12:42:08 +08:00
George Hotz
9289425170
add ast to ProgramSpec + pre matcher [pr] (#9128)
* add ast to ProgramSpec + pre matcher [pr]

* cleaner cast + test fix
2025-02-17 16:39:14 +08:00