Commit graph

13,471 commits

Author SHA1 Message Date
George Hotz
263b724143
one cache and bump it (#13258) 2025-11-13 07:33:31 -08:00
George Hotz
5efa727b83
move _pool to MovementMixins (#13257) 2025-11-13 07:28:52 -08:00
George Hotz
bcdfc109b5
hotfix: disable flaky test 2025-11-13 06:19:28 -08:00
qazal
006dea4c3e
roc: only save instruction execs (#13254) 2025-11-13 21:28:40 +08:00
nimlgen
f9586b38ba
system: pci mask and val (#13251) 2025-11-13 20:44:58 +08:00
George Hotz
7316da3253
new readme (#13250)
* new readme

* update
2025-11-13 00:48:28 -08:00
George Hotz
17aa3379e9
hotfix: improve self_tokenize 2025-11-13 00:18:57 -08:00
chenyu
4e5a9132e7
JIT_BATCH_SIZE=0 in compile3 (#13245)
fixed some enqueue time
2025-11-12 23:12:45 -05:00
wozeparrot
759557f633
feat: move tk tests to testextra (#13242) 2025-11-12 17:06:53 -08:00
chenyu
3f939f3d3c
update pm_simplify_valid (#13241)
* update pm_simplify_valid

fixed openpilot conv regression

* IMAGE training is broken
2025-11-12 19:40:02 -05:00
chenyu
f9851a852f
minor update to uop_given_valid [pr] (#13243)
split from #13241
2025-11-12 19:03:18 -05:00
qazal
fe2876a6d8
hotfix: second GB/s in viz (#13240) 2025-11-13 07:14:27 +08:00
George Hotz
a23dea202b
actually make AMD_LLVM not default (#13238) 2025-11-12 15:07:23 -08:00
George Hotz
ab9fa964d8
DISABLE_COMPILER_CACHE -> CCACHE (#13234)
* DISABLE_COMPILER_CACHE -> CCACHE

* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
qazal
be2e24cb25
roc: requires sudo to install (#13237) 2025-11-12 16:59:22 -05:00
George Hotz
8f1f195b6d
hotfix: no hexdump for usbgpu patch.py 2025-11-12 12:05:37 -08:00
nimlgen
9a53fcbde4
amd: sqtt on rdna3.5 (#13233) 2025-11-13 03:30:42 +08:00
George Hotz
13f10a31dc
AMD_LLVM default off (#13232) 2025-11-12 11:06:33 -08:00
qazal
8b26cf2b3d
sqtt: update rcp timing test (#13231)
* sqtt: assert correct output in timing test

* found why
2025-11-13 02:01:54 +08:00
Jan Akhremchik
bc8e537423
Add NONZERO op to onnx backend (#13211) 2025-11-12 08:55:51 -08:00
nimlgen
af17e07251
viz: sqtt touchups (#13228)
* viz: sqtt touchups

* revert

* matches
2025-11-12 22:40:37 +08:00
qazal
7a6853fa40
viz: show python callstack in the first graph (#13218) 2025-11-12 20:52:28 +08:00
nimlgen
82eb63d3ad
qcom: auto switch idle timer when profiling (#13230)
* qcom: auto switch idle timer when profiling

* fi
2025-11-12 20:31:24 +08:00
nimlgen
fcd8d0751a
test_timing for hip (#13229) 2025-11-12 20:28:58 +08:00
qazal
74b9d33acb
viz: direct link to program source (#13227) 2025-11-12 16:27:13 +08:00
wozeparrot
371c1f2355
tk: move tiles to class (#13224) 2025-11-11 21:53:46 -08:00
Christopher Milan
41a098a82d
In-tree autogen: libc.py (#13217)
* checkout changes from autogen branch

* parents

* pylint happy

* move sys to system in helpers.py

* typo

* typo
2025-11-11 19:13:48 -08:00
wozeparrot
222bb12ddf
tk softmax (#13205) 2025-11-11 15:13:16 -08:00
wozeparrot
787f0070ed
feat: don't use output reg as local reduce reg (#13203) 2025-11-11 14:35:16 -08:00
chenyu
ece1415def
clean up image_dot and image_conv2d (#13222)
* clean up image_dot and image_conv2d

* those are fine

* interesting
2025-11-11 15:53:03 -05:00
nimlgen
2f0ea29b34
qcom: 48bit timestamps (#13214)
* qcom: 48bit timestamps

* f

* lol

* fix
2025-11-12 04:14:33 +08:00
qazal
bc55bc4849
cleanup test_viz profiler tests (#13221) 2025-11-12 03:46:48 +08:00
chenyu
23b90945c3
add a benchmark for openpilot vision with DEBUG=2 (#13219)
see per kernel speed, also disable the jobs for 0.9.9
2025-11-11 14:41:52 -05:00
George Hotz
c2075f3613
gc disable during big rewrites (#13215)
* gc disable during big rewrites

* cleaner with helper
2025-11-11 10:30:47 -08:00
Roelof van Dijk
e59313da08
migrate pytest and ruff (#13216) 2025-11-11 13:27:51 -05:00
Gaétan Lepage
6fd7ce3832
migrate to pyproject.toml (#13189)
* migrate to pyproject.toml

* move mypy config to pyproject.toml
2025-11-11 09:09:27 -08:00
qazal
8002921a04
viz: improve the program run tooltip (#13212)
* add tflops to tooltip format

* show if the run was batched
2025-11-12 00:56:03 +08:00
qazal
f91e366a17
viz: display the graph layout recursion error (#13194)
* viz: display the graph layout recursion error

* share styles

* +min-width

* same thing

* inline the append
2025-11-11 15:25:12 +08:00
wozeparrot
73497af4c0
clean: use np for allclose (#13204) 2025-11-10 23:02:43 -08:00
George Hotz
a6360fd94d
store can have shape (#13202)
* store can have shape

* _shape
2025-11-10 22:16:47 -08:00
b1tg
f3692b7406
clean up hip renderer (#13063)
* clean up hip renderer

* ocml

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-11 00:44:24 -05:00
chenyu
22b8579234
one last regressed dm kernel (#13201) 2025-11-10 23:30:52 -05:00
chenyu
58b7e4fab3
GROUPTOP heuristic on more axes (#13206)
fixed dm speed
2025-11-10 23:30:37 -05:00
chenyu
829cdafccc
update openpilot slow conv uop ast (#13197)
the two remaining slow ones
2025-11-10 17:03:20 -05:00
George Hotz
0c978d45e6
stub attention (#13196)
* stub attention

* name the kernels
2025-11-10 13:48:38 -08:00
chenyu
58c30fc7ce
minor image_conv2d cleanup (#13193) 2025-11-10 16:05:40 -05:00
chenyu
60e55d9a2d
line count 18500 (#13191) 2025-11-10 13:52:13 -05:00
nimlgen
09a59c2203
qcom: support new chip versioning (#13185)
* qcom: support new chip versioning

* ops

* nit

* fix

* f
2025-11-10 23:57:29 +08:00
qazal
50934050bc
sqtt: append all wave execs (#13190) 2025-11-10 23:50:08 +08:00
qazal
38a24731a1
cleanup sqtt tooling (#13188)
* cleanup viz/serve.py

* use latest profile in rgptool.py

* unwrap nullable in roc.py, fix disasms typing
2025-11-10 20:52:57 +08:00