236 commits

Author SHA1 Message Date
chenyu
b232c60def
benchmark openpilot 0.9.9 (#11575)
* benchmark openpilot 0.9.9

not sure what to do with the 0.9.7 ones with IMAGE=2 and validate

* name
2025-08-08 01:26:14 -04:00
chenyu
702e38dc19
remove FUSE_ARANGE_UINT (#11567)
also add IGNORE_OOB=1 to bert runs. lowered BS on tinybox to 90 since 96 oom during eval without reset
2025-08-07 16:49:06 -04:00
chenyu
594cbdc66f
skip AM ResNet50 benchmark (#11565)
hanging with FUSE_ARANGE?
2025-08-07 14:07:01 -04:00
nimlgen
1afb290027
ci: fix runner in nv (#11527)
2025-08-06 10:38:04 +03:00
chenyu
3f742a5a7c
comma space lab models benchmark (#11461)
2025-07-31 19:06:18 -04:00
nimlgen
5fc5bb5237
ci: clear processes (#11434)
* unified hcq_smi for management

* fix

* fix

* no reset for amd
2025-07-30 22:15:18 +03:00
nimlgen
4b4ba5454c
ci: move driver start higher (#11431)
2025-07-30 10:48:38 +03:00
chenyu
204da24cfc
increase driverbenchmark timeout-minutes to 15 (#11428)
2025-07-29 19:45:05 -04:00
nimlgen
c88e401d0e
ci: fix typos in h machine benchmarks (#11423)
2025-07-29 22:11:47 +03:00
George Hotz
1f1f99c287
hotfix: add DEBUG=3 to driver CI
2025-07-29 11:03:47 -07:00
nimlgen
d38d285489
ci: add h machines (#11416)
* ci: add h machines

* more

* fix names

* names not collide

* 20

* 10
2025-07-29 19:21:51 +03:00
chenyu
2b48b961be
fix a few broken AMX tests (#11204)
2025-07-12 21:42:38 -04:00
George Hotz
0597735f28
remove TC=3 not porting this (#11045)
2025-06-30 15:12:49 -07:00
chenyu
126fcf4129
clean up AMD_LLVM in tests (#11021)
2025-06-28 22:45:47 -04:00
chenyu
d71bb6a7b2
remove comma 0.9.4 from benchmark (#10867)
2025-06-18 12:43:59 -04:00
chenyu
4f535641f7
add one huggingface_onnx test to mac benchmark ci (#10700)
this crashed for me on onnx parser pr but seems fine for the author. see if ci mac is fine
2025-06-08 12:26:12 -04:00
wozeparrot
37e1ef1be3
feat: cleanup old AM processes (#10653)
2025-06-05 15:41:00 -07:00
wozeparrot
5e3c4a8431
fix: comma testsig (#10568)
2025-05-29 19:00:07 -07:00
George Hotz
6b8eb5fec2
split mlperf to its own red benchmark run (#10492)
* Add mmapeak implementation for 7900 XTX

* Change indentation

* Use a template instead of multiple assembly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires less VGPRs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

* split to tinybox red MLPerf Benchmark

---------

Co-authored-by: Panagiotis Kourouklidis <panagiotis.kourouklidis@gmail.com>
2025-05-23 17:12:41 -07:00
uuuvn
3ca5680920
Test remote in benchmark (#10304)
hlb cifar is fast so added it, can add bert too if you think it's ok

6 real gpus to test multigraph and transfers + accuracy validation

should probably be added to tinystats too, i don't know how though

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-23 12:12:57 -04:00
qazal
90eb3c0e5d
add MobileNetV2 benchmark to comma CI (#10250)
* add MobileNetV2 to comma CI

* symlink imagenet

* also the signature

* comment that out

* need imagenetmock

* same train and test set

* quantize on CPU=1

* verbose

* need __hexagon_divsf3

* 0x858d6c15

* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
George Hotz
b06291077c
no amdgpu kernel driver (#10408)
* no amdgpu kernel driver

* don't test hip

* lower req
2025-05-18 20:52:39 -07:00
wozeparrot
1ed04f993b
move benchmark stat tracking to influxdb (#10185)
2025-05-15 16:14:56 -07:00
Ignacio Sica
47b3055fe2
set fail-fast behavior (#10336)
2025-05-15 11:24:45 -07:00
George Hotz
7a3d4de59a
hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test
2025-05-14 14:50:37 -07:00
George Hotz
f1130ab3d3
openpilot benchmark test (#10290)
* openpilot benchmark test

* that
2025-05-13 22:49:28 -07:00
chenyu
ad5cb2717d
FUSE_ARANGE=1 in bert bench (#10263)
still fails, something multi related maybe

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-13 09:12:19 -04:00
chenyu
0015b3921f
sleep more in CI Remove amdgpu (#10261)
see if this is less flaky
2025-05-12 08:13:44 -04:00
nimlgen
7d6ed1b1e9
hotfix: mac ci (#10210)
* fixed?

* cmnt
2025-05-08 14:13:23 +03:00
nimlgen
ba52fce4b2
usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark

* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
Ignacio Sica
bf5fb97498
fix AMD_LLVM bf16 tc for gfx1100 (#10102)
* fix amd_llvm bf16 tc

* cleanup pattern
2025-04-30 20:06:38 -03:00
chenyu
4a04098389
fix llama3 with nf4 quantize (#10107)
also int8 outputs is wrong
2025-04-29 15:14:36 -04:00
Ignacio Sica
9d5677c12c
fix ptx linearizer bug 2 [pr] (#9967)
* check for local buffer

* hotfix

* add test_tensor_cores_emulation run for ptx
2025-04-29 14:30:07 -03:00
Ignacio Sica
58cf8cd493
add support for "shared_mem" for LLVM (#10093)
* init llvm shared

* add test_tensor_cores_emulation run for llvm
2025-04-29 08:56:36 -04:00
Ignacio Sica
bda116d773
fix use_tensor_cores propagation (#10048)
* propagate use_tensor_cores

* add use_tensor_core to arg in test and search

* bugfix

* get TC val from ContextVar in search

* revert minor space change

* add tc emulation test to ci and benchmark

* revert

* revert whitespace change

* remove test for ptx

* add comment and remove llvm test run
2025-04-28 19:30:50 -03:00
chenyu
e996584685
olmoe in mac benchmark (#10077)
2025-04-27 21:07:02 -04:00
George Hotz
b6d2effaf5
assign is contiguous (#10066)
* assign is contiguous

* disable process replay for SDXL
2025-04-27 08:40:33 -04:00
Ignacio Sica
023b1c28a2
test_tensor_cores_padded refactor (#9724)
* set pad t 3 for amd padded tc test

* change pad for amd regardless CI

* test tc padded uops and correctness separately

* add test_tensor_cores_padded_uops test to ci

* remove redundant check for amd device

* cleanup
2025-04-18 17:05:54 -03:00
chenyu
c5db5b83b9
add SHOULD_USE_TC=1 check to simple_matmul (#9802)
* add SHOULD_USE_TC=1 check to simple_matmul

also zero centered the random input and update atol for tf32

* ATOL=2e-2 for HALF
2025-04-09 02:24:42 -04:00
George Hotz
14928fecff
Revert "fix TF32 tensor core dropped in tc_sm89 (#9798)"
This reverts commit 7c9a96824f.
2025-04-09 12:27:39 +08:00
chenyu
7c9a96824f
fix TF32 tensor core dropped in tc_sm89 (#9798)
also add `SHOULD_USE_TC=1` to verify TC is applied in simple_matmul
2025-04-08 23:20:50 -04:00
Ignacio Sica
58785181a8
AMD bf16xf32 TC (#9717)
* dont test bf16 for emulated amd tc

* skip bf16 tc test in ci

* skip bf16 for AMD in test_tensor_cores_codegen

* add simple bf16 gemm test to benchmark
2025-04-07 11:41:04 +08:00
chenyu
1d25844d44
Revert "disable CI red llama 3 4 gpu beam (#9690)" (#9709)
This reverts commit 6a5eacba8b.
2025-04-03 02:34:39 -04:00
chenyu
6a5eacba8b
disable CI red llama 3 4 gpu beam (#9690)
device hangs and ci would fail
2025-04-02 03:19:09 -04:00
qazal
4df2b6347d
hotfix: bump tinybox red training CI timeout to 30 minutes (#9426)
2025-03-13 09:31:44 +01:00
nimlgen
cd9d74f7ea
use am in training benchmarks (#9357)
* am in training benchmarks

* fix

* not needed anymore
2025-03-05 19:13:47 +03:00
chenyu
2e7c2780a9
CLANG -> CPU (#9189)
2025-02-20 18:03:09 -05:00
Ignacio Sica
aaed315fee
add AMX support to LLVM (#8957)
* init amx support for llvm

* revert elf changes

* fix attributes for AMX asm calls

* add comments

* add llvm amx job to benchmarks

* cleanup

* cleanup

* hotfix: improve comments

* comment for aux buffers

* hotfix:

* move amx_tc to ClangRenderer

* merge master

* refactor

* add docs

* add corsix docs reference

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-02-12 16:01:18 +08:00
nimlgen
52a69dd5e9
Revert "use am in training benchmarks (#8965)" (#8981)
This reverts commit 107e616857.
2025-02-09 15:43:45 +03:00
nimlgen
107e616857
use am in training benchmarks (#8965)
* am in training benchmarks

* fix

* not needed anymore
2025-02-08 20:20:47 +03:00