293 commits

Author SHA1 Message Date
chenyu
f5090192c8
reorder AMD tensor core benchmark test (#13860)
* reorder AMD tensor core benchmark test

* disable that
2025-12-28 12:29:51 -05:00
George Hotz
4702da41d5
hotfix: mkdir for extra/disassemblers 2025-12-19 17:18:37 -04:00
Christopher Milan
97103831c5
Revert "remove image from BufferSpec (#13636)" (#13761)
This reverts commit 2571a1eb47.
2025-12-19 13:54:36 -05:00
Christopher Milan
2571a1eb47
remove image from BufferSpec (#13636)
* remove image from BufferSpec

* cl tiny_gemm (64) works

* mypy

* padding

* openpilot CL

* reshape properly

* remove extra qcom checks

* pad output

* mypy

* update compile test

* move undo

* TestImageCopy valid images

* TestImageRealization valid images

* TestImageDType valid images

* cleanups

* test_renderer_failures

* ruff

* mypy

* simplify ops_qcom

* bump step time
2025-12-19 13:41:20 -05:00
George Hotz
4b741e893f
remove REMOTE=1 (#13722)
* remove REMOTE=1

* leave ibverbs
2025-12-16 15:58:10 -04:00
George Hotz
7589c897b2
split usbgpu tests into their own benchmark [pr] (#13711) 2025-12-15 21:42:40 -04:00
qazal
6bafd90248
remove unused process replay input [pr] (#13712) 2025-12-16 09:29:35 +08:00
nimlgen
cbae33003d
ci: add usb4 (#13643)
* ci: add usb4

* debug=3

* undef

* revert
2025-12-11 19:41:41 +03:00
chenyu
2471b49e45
minor bert / llama change from grad acc branch (#13622)
* minor bert / llama change from grad acc branch

* revert those
2025-12-08 16:04:14 -05:00
chenyu
ac1227575f
IMAGE=1 driving_vision in benchmark (#13587) 2025-12-05 10:20:54 -05:00
chenyu
8902781dc1
enable more benchmarks (#13540)
* enable more benchmarks

* disable some

* adjust ASSERT_MIN_STEP_TIME

* mac NOCLANG=1
2025-12-02 20:31:14 -05:00
nimlgen
455dd88236
nv: minimal hevc (#13502)
* nv: minimal hevc

* validate

* not needed

* tralin

* var

* cpu

* fxi

* desc

* move

* cleanup
2025-11-30 16:46:55 +03:00
wozeparrot
1f648bb1ba
feat: reenable mobilenetv2 dsp (#13320) 2025-11-21 15:21:49 -08:00
chenyu
6372c95094
disable benchmark MobileNetV2 on DSP (#13305)
failed on tinyc2
2025-11-16 09:42:52 -05:00
Harald Schäfer
3af231904e
openpilot compile tests: assert pre-rangify speeds (#12775)
* assert pre-rangify speeds

* typo

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-11-13 09:39:06 -08:00
chenyu
3f939f3d3c
update pm_simplify_valid (#13241)
* update pm_simplify_valid

fixed openpilot conv regression

* IMAGE training is broken
2025-11-12 19:40:02 -05:00
George Hotz
ab9fa964d8
DISABLE_COMPILER_CACHE -> CCACHE (#13234)
* DISABLE_COMPILER_CACHE -> CCACHE

* Fix cachekey assignment in Compiler constructor
2025-11-12 15:07:09 -08:00
chenyu
23b90945c3
add a benchmark for openpilot vision with DEBUG=2 (#13219)
see per kernel speed, also disable the jobs for 0.9.9
2025-11-11 14:41:52 -05:00
chenyu
6c48c87e51
improved ASSERT_MIN_STEP_TIME (#13182)
* improved ASSERT_MIN_STEP_TIME

getting close, current time +1ms then round up

* relax
2025-11-09 16:41:12 -05:00
George Hotz
42b34cf83d
bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
chenyu
54141e9cb9
DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
George Hotz
5eb87ab131
hotfix: bump cifar time to 350 2025-10-30 17:29:20 +08:00
b1tg
bb307b9e81
fix fp8 vectorization (#12977)
* fix fp8 vectorization

* add fp8 tc to benchmark
2025-10-28 13:55:30 -04:00
b1tg
45e2f916a3
add quantize fp8 in llama3 (#12893)
* add quantize fp8 in llama3

* don't truncate fp8 alu result

* cast to float32 before matmul

* --model weights/LLaMA-3/8B-SF-DPO/

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-10-27 10:22:57 -04:00
wozeparrot
6e00dec95d
feat: pin openpilot 0.10.1 models (#12878) 2025-10-22 14:57:54 -07:00
chenyu
f0831c8c30
add 0.10.0 to comma benchmark (#12875)
* add 0.10.0 to comma benchmark

disabled the 0.10.1 ones which are pinned to master. it does not work because benchmark uses the cached old version

* that's pinned
2025-10-22 15:18:21 -04:00
George Hotz
726988fa4b
late ifs try 2 (#12865)
* late ifs try 2

* fix image

* fix that test

* panic

* ptx fixups

* preserve toposort

* those pass locally

* Revert "those pass locally"

This reverts commit 063409f828.

* no ls

* make that explicit
2025-10-22 18:49:27 +08:00
chenyu
6d86e962c7
update ASSERT_MIN_STEP_TIME (#12857)
0.10.1 driving_policy is good now, still need driving_vision and dmonitoring to be fast
2025-10-21 22:46:07 -04:00
wozeparrot
62e7b8b870
feat: just use compile3 (#12849) 2025-10-21 07:56:50 -07:00
wozeparrot
990e8b97ee
feat: log openpilot 0.10.1 times (#12816) 2025-10-20 18:30:34 -07:00
chenyu
350a4754a9
Update openpilot models (#12780)
* Update openpilot models

* Update slower model

* fix that

---------

Co-authored-by: Bruce Wayne <harald.the.engineer@gmail.com>
2025-10-18 20:32:35 -04:00
Harald Schäfer
addc54b96c
Simplify openpilot compile3.py (#12748)
* Simpler compile3

* tests

* remove default args

* onnx file is still fp16

* self-test FP16 too

* allow test disable

* absurd tolerance

* Just do latest

* Try simplest

* use later models

* kernel count not relevant if speed is good

* dead improts

* Revert "dead improts"

This reverts commit f68c2cd15d.

* Revert "kernel count not relevant if speed is good"

This reverts commit 0955ca4ee0.

* add back kernal count check on latest model
2025-10-18 10:12:22 -04:00
chenyu
285534ce64
delete DONT_REALIZE_EXPAND and DONT_GROUP_REDUCES (#12744)
does nothing now
2025-10-16 14:11:33 -04:00
chenyu
53478c741d
relax ASSERT_MIN_STEP_TIME for space lab policy (#12742) 2025-10-16 11:40:36 -04:00
chenyu
b8cf35fb77
print macOS version in CI (#12705) 2025-10-15 15:05:33 -04:00
chenyu
89df6f611d
reenable sdxl mac benchmark (#12680)
also updated faster sd step times
2025-10-14 17:36:17 -04:00
Sieds Lykles
e625c27598
update min step times openpilot (#12600) 2025-10-10 11:24:27 +02:00
chenyu
be05028419
move ASSERT_MIN_STEP_TIME to compile3 (#12535)
threshold is current time +20%
2025-10-08 22:16:59 -04:00
chenyu
5986d656a2
tighter ASSERT_MIN_STEP_TIME (#12531)
set to about 1.2x of actual time now
2025-10-08 21:22:54 -04:00
George Hotz
3b0b3a2e64
fast RANGEIFY (#12504)
* rtoposort is fast, can replace rangeify with this

* fast rangeify

* work

* fast rangeify works for mnist

* should work

* progress

* pad fix

* FAST

* tests passing

* don't delete those shape ops

* put in rangeify map

* ending ranges fix

* tests

* mstack/mselect no hacks

* move to indexing.py

* touch up tests + add comments

* disable failing test

* actually make the file readable

* failing

* error
2025-10-08 19:38:06 +08:00
chenyu
eb3bc277b3
remove ASSERT_MIN_STEP_TIME in external_benchmark_openpilot (#12495)
should add for compile3 and compile 3 only
2025-10-07 22:13:42 -04:00
chenyu
fe774a4319
more skip WINO on benchmark (#12482) 2025-10-07 03:43:51 -04:00
chenyu
8ad5f9e74f
skip slow benchmarks (#12481)
* skip slow benchmarks

padded tc is already slow, rest are slow with rangeify (correct if run locally)

* relax more
2025-10-07 03:28:56 -04:00
Sieds Lykles
e74be4a140
UOp.factor and add chain sorting (#12413)
* add ordering

* fix some tests

* fix more tests

* shorten comment

* update test

* add rule and test

* add rule and test

* remove check

* use fold_divmod_congruence instead of simplify

* adjust tests

* shorten line

* new algo

* add test

* add function to un-nest the div

* add UOp.factor

* test UOp.factor

* uop_given_valid tries to factor simplex expression

* shorten line

* symbolic_flat is back

* change that back

* fix those new tests

* new rule for ordering

* factor multiple factors

* no symbolic_flat

* symbolic_flat to there

* move that back

* fix imports

* merge correctly

* linter happy

* add rule

* add a test

* cleanup

* revert that for now

* UOp.factor returns self instead of None

* try all_candidates

* remove or_else

* post index symbolic

* add test

* maket this closer to the original

* increase mac hlb_cifar min step time

* add some ordering tests

* cleanup

* increase pytest timeout time

* check dtype
2025-10-04 06:05:38 +02:00
chenyu
494bb12500
skip slow cifar bf16 on red benchmark (#12213)
very slow to compile the fake bf16
2025-09-16 14:55:01 -04:00
chenyu
419e997187
increase benchmark timeout (#12212)
account for compile cache, and it's annoying that job died due to timeout also messes the machine
2025-09-16 14:09:02 -04:00
nimlgen
fb96394ff5
auto-select available compilers (#12094)
* device: auto select compilers

* fix

* metal+opencl

* nv/cuda

* test without ptx

* ptx

* fix tests

* fix

* fix test

* rename

* test + cleaner

* xx

* ops

* better test

* win?

* um?

* types

* debug

* win??

* sep rung

* wtf?

* debug

* skip win

* revert this

* types
2025-09-10 19:52:01 +03:00
Sieds Lykles
5b73076e48
assert benchmark times (#12042)
* assert jitted times in openpilot

* better error

* better error

* add ASSERT_MIN_STEP_TIME to more models

* t is step_times

* update benchmark times

* update times
2025-09-09 23:40:02 +02:00
nimlgen
1c6c42715f
unify cpu and llvm (#11982)
* try unify cpu and llvm

* fixes

* fix

* ops

* no llvm

* fix

* rm

* lvmm is ot

* oops

* override

* no llvm

* ignore

* skip llvm

* ooops
2025-09-09 13:54:44 +03:00
George Hotz
433581f8ed
make POSTOPT=2 the default (#12034)
* make POSTOPT=2 the default

* more matching tc

* fix winograd

* fix that test

* add matvec to Scheduler

* flip tc sort order

* similar speed

* fix beam on image

* disable slow tests

* slow
2025-09-05 14:34:05 -07:00