Commit graph

1,121 commits

Author SHA1 Message Date
chenyu
d93a0bee6b
mlperf ci uses its own cache (#10705)
not to interfere with regular cache which is used by benchmark
2025-06-08 19:43:32 -04:00
George Hotz
81b9c04574
move high level stuff to unit tests [pr] (#10708)
* move high level stuff to unit tests [pr]

* process replay on unit tests

* fix pr, less compute

* set omp num threads

* set 200MB buffer size limit

* delete junk

* fix tests

* faster

* move test_indexing to unit

* faster
2025-06-08 14:05:56 -07:00
George Hotz
4e2c3560b4
smaller tests are faster tests [pr] (#10704)
* remove del spam from CI

* more

* preconstruct default buffer spec

* ignore those errors

* check exception

* more exception check

* skip stuff

* smaller tests mean faster tests

* a few more
2025-06-08 10:54:19 -07:00
chenyu
4f535641f7
add one huggingface_onnx test to mac benchmark ci (#10700)
this crashed for me on onnx parser pr but seems fine for the author. see if ci mac is fine
2025-06-08 12:26:12 -04:00
George Hotz
7ff175c022
cache a venv to avoid pip usage (#10689)
* try built in pip caching

* try venv

* export venv

* set VIRTUAL_ENV

* revert that

* venv key

* fix

* ci cache hit?

* fix windows
2025-06-07 20:13:41 -07:00
George Hotz
53ed64e133
ci speed work 1 (#10676)
* skip a few slow tests

* use a venv for python packages

* create venv

* no user, it's in venv

* ignore venv

* venv

* new cache key

* try that

* this

* version the python cache
2025-06-07 16:33:11 -07:00
wozeparrot
37e1ef1be3
feat: cleanup old AM processes (#10653) 2025-06-05 15:41:00 -07:00
qazal
7114b6ab31
viz browser tests (#10626)
* viz browser tests

* expect failure if js/ isn't included

* back green
2025-06-04 14:58:24 +03:00
chenyu
18e9ec3ea1
add wino cifar to search benchmark (#10615)
* add wino cifar to search benchmark

* FUSE_OPTIM=1

* revert those
2025-06-03 20:38:43 -04:00
chenyu
1c1f578490
DISABLE_COMPILER_CACHE in sdxl search (#10614) 2025-06-03 09:22:25 -04:00
chenyu
4ab3391e6f
set -o pipefail for mlperf run_and_time (#10577)
also run the 5.1 script in ci cron job
2025-05-30 16:36:44 -04:00
wozeparrot
5e3c4a8431
fix: comma testsig (#10568) 2025-05-29 19:00:07 -07:00
George Hotz
ee12e801a3
optional fused optimizers (#10549)
* enumerate cases of Tensors in the JIT

* optional fused optimizers

* add fused optimizer test

* move that there

* ugh
2025-05-28 13:50:30 -07:00
Sieds Lykles
ae02a1e232
[bounty] Z3 symbolic fuzzer [pr] (#10514)
* First version, caught a bug?

* Nicely print failure to reproduce

* Remove that

* Put the assert back

* Change fuzzing to use testing_unit so it has z3

* Test key to match

* Add rule

* Add test

* Add test for edge case 0

* Merge patterns

* update comment

* consistent whitespace

* whitespace

* add condition

* add test

* update comment

* use Variable

* fuzzer using z3_renderer

* Cleaned up printing and debugging

* working new fuzzer

* change some comments and printing

* more formatting

* fuzz failures in seperate file

* fix fstring

* more tests

* naming

* remove added line

* remove comment

* print number of skipped expressions

* use self.assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-28 16:28:37 -04:00
chenyu
23e41f523a
sdxl also run with cached search (#10546) 2025-05-28 06:51:56 -04:00
chenyu
fffdc4d31c
workflow to run sdxl with search (#10543) 2025-05-27 17:25:41 -04:00
uuuvn
c29c46853f
Very basic mock sqtt (#10512)
This mockgpu sqtt emulation will just ignore basically everything and end
up with a 0x1000 size trace full of zeroes, but just testing for things
like register rename is better than nothing i guess
2025-05-26 14:38:28 -07:00
chenyu
2eeea373af
add BENCHMARK_LOG for mlperf resnet cron (#10516) 2025-05-25 22:00:29 -04:00
b1tg
a1f64af92d
ci: setup llvm for amdremote (#10507)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-05-25 21:52:27 -04:00
wozeparrot
7c81f9f95e
fix: gate mlperf workflow (#10515) 2025-05-25 17:06:21 -07:00
George Hotz
6b8eb5fec2
split mlperf to its own red benchmark run (#10492)
* Add mmapeak implementation for 7900 XTX

* Change identation

* Use a template instead of multiple assebly files

* Fix output formatting

* Reduce register file bank conflicts

* More accurate measurement for quick instructions

* Add support for gfx1201

* RDNA4 wmma requires less VGRPs

* RDNA4 does not have s_cmpk instructions

* Add v_wmma_i32_16x16x32_iu4 for gfx1201

* Add sparse wmma instructions

* split to tinybox red MLPerf Benchmark

---------

Co-authored-by: Panagiotis Kourouklidis <panagiotis.kourouklidis@gmail.com>
2025-05-23 17:12:41 -07:00
George Hotz
bf2a0907be
gate the mockdsp behind MOCKDSP=1 [pr] (#10486) 2025-05-23 11:44:02 -07:00
uuuvn
3ca5680920
Test remote in benchmark (#10304)
hlb cifar is fast so added it, can add bert too if you think it's ok

6 real gpus to test multigraph and transfers + accuracy validation

should probably be added to tinystats too, i don't know how though

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-05-23 12:12:57 -04:00
chenyu
c5acb4e06e
run mlperf resnet daily (#10482)
Runs at 08:05 UTC (12:05 AM Pacific Time)
2025-05-23 07:16:20 -04:00
chenyu
116d9e6306
run mlperf resnet on red box (#10413)
also made push to `update_mlperf` branch trigger
2025-05-19 12:48:36 -04:00
George Hotz
f1fe1f93c1 hotfix: 14000 lines 2025-05-19 09:40:53 -07:00
qazal
90eb3c0e5d
add MobileNetV2 benchmark to comma CI (#10250)
* add MobileNetV2 to comma CI

* symlink imagenet

* also the signature

* comment that out

* need imagenetmock

* same train and test set

* quantize on CPU=1

* verbose

* need __hexagon_divsf3

* 0x858d6c15

* quant cpu + CC=clang-19
2025-05-19 18:22:50 +03:00
George Hotz
b06291077c
no amdgpu kernel driver (#10408)
* no amdgpu kernel driver

* don't test hip

* lower req
2025-05-18 20:52:39 -07:00
chenyu
485e80da69
run_and_time for resnet ci (#10405) 2025-05-18 23:39:57 -04:00
uuuvn
0f825e12f2
Remote fixedvars (#10371)
* amd mockgpu graph support

For testing remote graph stuff (prompted by #10371) in ci

* Remote fixedvars

Somehow none of existing tests failed when fixedvars were added, looking
what to add as an regression test for this

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-18 09:57:13 -07:00
uuuvn
27c12be471
amd mockgpu graph support (#10385)
For testing remote graph stuff (prompted by #10371) in ci
2025-05-18 09:43:16 -07:00
chenyu
9b4e2a75cd
symlink datasets in mlperf workflow (#10391) 2025-05-18 03:26:05 -04:00
qazal
0294bfe507
simpler can_pad (#10364)
* simpler can_pad [pr]

* 3 kernels

* tests

* less kernels
2025-05-18 10:00:07 +03:00
chenyu
efa8dfe7fb
test cron job to run resnet (#10368) 2025-05-17 08:57:02 -04:00
wozeparrot
1ed04f993b
move benchmark stat tracking to influxdb (#10185) 2025-05-15 16:14:56 -07:00
Ignacio Sica
47b3055fe2
set fail-fast behavior (#10336) 2025-05-15 11:24:45 -07:00
George Hotz
50181ab09f hotfix: bump to 13500 lines 2025-05-14 18:49:59 -07:00
George Hotz
7a3d4de59a hotfix: add GRAPH_ONE_KERNEL=1 to UsbGPU openpilot test 2025-05-14 14:50:37 -07:00
George Hotz
f1130ab3d3
openpilot benchmark test (#10290)
* openpilot benchmark test

* that
2025-05-13 22:49:28 -07:00
George Hotz
ec46f658d7
openpilot llvm test [pr] (#10288) 2025-05-13 16:51:41 -07:00
uuuvn
ddff9857b8
Remote properties is a dataclass (#10283)
Not strictly required for anything but soon there will be like 4 new
properties and having it be a huge json just seems like a bad taste.

It also seems right to not have a separate endpoint for this, just
`GetProperties` request that returns a repr of this similar to how
requests are sent in `BatchRequest`.

This will also make a switch to anything other than http much simpler
if it will be required for any reason, like just a tcp stream of
`BatchRequest`s
2025-05-13 11:56:58 -07:00
uuuvn
ba87eca0f1
Remote multi (basic) (#10269)
* Basic remote multi support

Simplest thing to be able to use remote with multiple gpus, very slow
because no transfers (copyin copyout for cross-device copies)

* tests
2025-05-13 09:52:47 -07:00
chenyu
ad5cb2717d
FUSE_ARANGE=1 in bert bench (#10263)
still fails, something multi related maybe

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2025-05-13 09:12:19 -04:00
chenyu
0015b3921f
sleep more in CI Remove amdgpu (#10261)
see if this is less flaky
2025-05-12 08:13:44 -04:00
hooved
7b4f05fd00
Add test for correctness of Infinity in WebGPU (#10201)
* use function for infinity instead of uniform

* test infinity math locally

* test infinity math in CI

* make pytest available to MacOS (WebGPU)

* revert to master except failing webgpu test
2025-05-08 05:20:05 -07:00
nimlgen
7d6ed1b1e9
hotfix: mac ci (#10210)
* fixed?

* cmnt
2025-05-08 14:13:23 +03:00
nimlgen
ba52fce4b2
usbgpu: benchmark in ci (#10208)
* usbgpu: benchmark

* usbgpu: benchmark
2025-05-08 12:02:04 +03:00
uuuvn
dba073e5c0
Less messy broken graph on paravirtualized metal workaround (#10182)
* Less messy broken graph on paravirtualized metal workaround

GitHub CI macOS runners use paravirtualized metal which is broken with
graph (some comments say that ICB in particular is broken but in my
testing it was fine sometimes, but other times hitting an assert inside
metal's code related to resouces, so not sure).

> Assertion failed: (resource != nil), function -[IOGPUMetalResource initWithResource:], file IOGPUMetalResource.m, line 458.

This can be reproduced locally with any virtualization software (like utm)
that can create macOS VMs with apple's own virtualization framework.

* unused import
2025-05-06 20:41:02 +03:00
wozeparrot
10437904cd
refactor: ops_cloud -> ops_remote [pr] (#10166) 2025-05-05 15:59:51 -07:00
George Hotz
e07d8b147a hotfix: don't OOM in the osx unit test 2025-05-04 17:53:55 -07:00