Commit graph

11,106 commits

Author SHA1 Message Date
geohotstan
536b254df4
Bump onnx to 1.18.0 (#11266)
* bump

* thou hast implement functions

* hacked in domain support

* some clean ups

* hack quantize_onnx_test too

* add helper lol, why onnx tests why

* better dispatcher, but need tests and better naming

* flaky ci

* change some names

* small clean ups

* make it easier to clean up tests once ORT supports 1.18.0

* nits

* fix bug of Softmax_1 being registered in onnx_ops

* need a default value

* resolve_const is better name

* fix OnnxRunner.to

* use proper domain names
2025-07-17 15:35:41 -04:00
qazal
1606491b1c
viz: refactor to generic shape spec (#11272) 2025-07-17 20:25:15 +03:00
nimlgen
cfb229473f
hcq: refactor buffer mapping (#11271)
* hcq: refactor buffer mapping

* fix

* fix mypy
2025-07-17 15:16:49 +03:00
qazal
e68af3b336
disable flaky assert in test_cpu_profile (#11270) 2025-07-17 06:50:39 +03:00
chenyu
60ffe00172
remove Kernel.first_reduce [pr] (#11269) 2025-07-16 18:30:14 -04:00
chenyu
522dc72f08
remove Kernel.local_dims [pr] (#11268)
* remove Kernel.local_dims [pr]

also not needed

* fix test_matvec
2025-07-16 17:46:19 -04:00
chenyu
d8c783f65f
remove Kernel.global_dims [pr] (#11267)
all references to GLOBAL already use axis_types, so we don't need the number-of-globals helper that was used to locate GLOBAL
2025-07-16 17:16:49 -04:00
uuuvn
6f0ddcc24c
Remote cross-host graph (#11229) 2025-07-16 13:27:54 -07:00
nimlgen
6aa20c607d
nv: graceful shutdown to cold state (#11265) 2025-07-16 19:49:35 +03:00
chenyu
59b52d49d7
remove .global_dims that are for locating GLOBAL [pr] (#11264) 2025-07-16 11:19:31 -04:00
chenyu
e6c016ddd0
move check axis < shape_len to real_axis [pr] (#11263)
ensure output of real_axis is always valid
2025-07-16 10:15:44 -04:00
quortus
924bc7c9ae
Fix test_uop_spec (#11259) 2025-07-16 11:02:31 +03:00
chenyu
c8e5c4d7c3
insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
wozeparrot
b32d9321fb
feat: more keccak cleanup + more explicit shape (#11256) 2025-07-15 13:57:47 -07:00
chenyu
9f79079cbe
update KernelInfo dims to return list of dims [pr] (#11255)
local dims are not contiguous once upcast sits between local and groupreduce
2025-07-15 15:01:39 -04:00
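The sketch below shows why a plain range stops working. It uses a hypothetical, simplified `AxisType` enum, not tinygrad's real definition, and it assumes, as the message suggests, that "local dims" covers both LOCAL and GROUP_REDUCE axes. Once an UPCAST axis sits between them, the local dims can't be described as a contiguous `(start, count)` pair, but a list of indices still works.

```python
from enum import Enum

# hypothetical, simplified axis kinds (not tinygrad's real AxisType)
AxisType = Enum("AxisType", "GLOBAL LOCAL GROUP_REDUCE REDUCE UPCAST UNROLL")

def local_dims(axis_types):
    # assumption: "local dims" covers both LOCAL and GROUP_REDUCE axes
    return [i for i, t in enumerate(axis_types) if t in (AxisType.LOCAL, AxisType.GROUP_REDUCE)]

def is_contiguous(dims):
    # a (start, count) pair can only describe indices with no gaps
    return not dims or dims == list(range(dims[0], dims[0] + len(dims)))

A = AxisType
before = [A.GLOBAL, A.LOCAL, A.GROUP_REDUCE, A.REDUCE, A.UPCAST]
after = [A.GLOBAL, A.LOCAL, A.UPCAST, A.GROUP_REDUCE, A.REDUCE]
print(local_dims(before), local_dims(after))  # [1, 2] [1, 3]
```

Returning the index list keeps both layouts representable.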
chenyu
629fa21b6b
remove final range in heuristic [pr] (#11251)
all dims are based on AxisType now
2025-07-15 11:39:15 -04:00
chenyu
d7adc24083
remove Kernel.first_upcast [pr] (#11248)
first_reduce does not need a default now
2025-07-15 10:21:34 -04:00
nimlgen
197d345804
nv: print rpc msg with DEBUG>=3 (#11247) 2025-07-15 16:39:58 +03:00
chenyu
034e51bd36
remove first_reduce used to locate real_axis [pr] (#11245)
LOCAL goes one past the last of (GLOBAL+LOCAL)
GROUP goes to right before first REDUCE
2025-07-15 09:19:38 -04:00
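The two placement rules can be sketched as a small function. The `AxisType` enum and the `insert_position` name are hypothetical simplifications, not tinygrad's API. The example layout also shows how following these rules can leave an UPCAST axis between LOCAL and the new GROUP_REDUCE axis.

```python
from enum import Enum

# hypothetical, simplified axis kinds (not tinygrad's real AxisType)
AxisType = Enum("AxisType", "GLOBAL LOCAL GROUP_REDUCE REDUCE UPCAST UNROLL")

def insert_position(axis_types, new_type):
    """Index where a new LOCAL or GROUP_REDUCE axis is placed (sketch of the rules above)."""
    if new_type is AxisType.LOCAL:
        # one past the last GLOBAL or LOCAL axis (index 0 if there are none)
        last = max((i for i, t in enumerate(axis_types) if t in (AxisType.GLOBAL, AxisType.LOCAL)), default=-1)
        return last + 1
    if new_type is AxisType.GROUP_REDUCE:
        # directly before the first REDUCE axis
        first = next((i for i, t in enumerate(axis_types) if t is AxisType.REDUCE), None)
        if first is None:
            raise ValueError("GROUP_REDUCE needs a REDUCE axis")
        return first
    raise NotImplementedError(f"no placement rule sketched for {new_type}")

A = AxisType
layout = [A.GLOBAL, A.GLOBAL, A.LOCAL, A.UPCAST, A.REDUCE]
pos = insert_position(layout, A.GROUP_REDUCE)
new_layout = layout[:pos] + [A.GROUP_REDUCE] + layout[pos:]
```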
chenyu
0e2422d216
Kernel.axes_of helper [pr] (#11243)
look up dim based on AxisType
2025-07-14 22:17:43 -04:00
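The message only says the helper looks up dims by AxisType. Here is a minimal sketch of that idea, using a hypothetical toy `Kernel` that stores one `AxisType` per dim; it is not tinygrad's real `Kernel` or its real `axes_of` signature. Unlike a stored count such as "the first N dims are global", a lookup by type doesn't depend on axis order.

```python
from enum import Enum

# hypothetical, simplified axis kinds (not tinygrad's real AxisType)
AxisType = Enum("AxisType", "GLOBAL LOCAL GROUP_REDUCE REDUCE UPCAST UNROLL")

class Kernel:
    """Toy model that only records one AxisType per dim."""
    def __init__(self, axis_types):
        self.axis_types = list(axis_types)

    def axes_of(self, *types):
        # indices of every dim whose AxisType is one of `types`, in order
        return [i for i, t in enumerate(self.axis_types) if t in types]

A = AxisType
k = Kernel([A.GLOBAL, A.GLOBAL, A.LOCAL, A.REDUCE, A.UPCAST])
print(k.axes_of(A.GLOBAL), k.axes_of(A.LOCAL, A.REDUCE))  # [0, 1] [2, 3]
```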
chenyu
968f6b2a2e
remove hasattr(self, 'axis_types') checks in dims property [pr] (#11242)
not needed anymore
2025-07-14 20:59:51 -04:00
leopf
557ca7d757
testing SimpleTokenizer against OASST1 (#11214) 2025-07-14 17:09:31 -07:00
wozeparrot
5878b189b8
don't const fold shape changing bitcast (#11236) 2025-07-14 16:42:16 -07:00
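The standard-library `struct` module shows why a shape-changing bitcast differs from a same-size one. Reinterpreting one float32 as uint32 yields one value, but reinterpreting it as uint8 yields four. This illustrates only the dtype arithmetic, not tinygrad's actual folding rule.

```python
import struct

def bitcast_f32_to_u32(x):
    # same-size bitcast: 4 bytes in, one 32-bit integer out
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bitcast_f32_to_u8(x):
    # shape-changing bitcast: the same 4 little-endian bytes become four uint8 values
    return list(struct.pack("<f", x))

print(bitcast_f32_to_u32(1.0))  # 1065353216 (0x3F800000)
print(bitcast_f32_to_u8(1.0))   # [0, 0, 128, 63]
```

A folder that replaces one constant with one constant can express the first case. The second case turns one element into four, so the result no longer has the input's shape.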
chenyu
b6662096cb
remove more first_reduce [pr] (#11239) 2025-07-14 19:13:44 -04:00
chenyu
eb8e17ef59
remove most of the first_upcast [pr] (#11238) 2025-07-14 16:54:24 -04:00
qazal
c78b1cbae7
viz profiler cleanups (#11234)
* move all render calls to zoom callback

* cleanup the naming

* require transform arg
2025-07-14 19:06:33 +03:00
chenyu
36ce883c7d
update heuristic to use k.upcastable_dims and k.unrollable_dims [pr] (#11233)
The idea is to make it behave the same regardless of axis order and in the presence of size-1 dims in the shape.

This does not fully remove first_upcast yet, because some conditions use the already-upcasted size, and removing those needs a separate benchmark.
2025-07-14 11:10:30 -04:00
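The two properties aren't defined in this log, so the sketch below rests on two assumptions. First, the AxisType sets come from the #11220 message (GLOBAL+LOCAL for upcast, GROUP_REDUCE+REDUCE for unroll). Second, size-1 dims are skipped, which is one way to get the same behavior regardless of axis order and size-1 dims. The functions are hypothetical free-standing versions, not tinygrad's `Kernel` properties.

```python
from enum import Enum

# hypothetical, simplified axis kinds (not tinygrad's real AxisType)
AxisType = Enum("AxisType", "GLOBAL LOCAL GROUP_REDUCE REDUCE UPCAST UNROLL")

def upcastable_dims(shape, axis_types):
    # GLOBAL or LOCAL axes with more than one element
    return [i for i, (s, t) in enumerate(zip(shape, axis_types))
            if t in (AxisType.GLOBAL, AxisType.LOCAL) and s > 1]

def unrollable_dims(shape, axis_types):
    # GROUP_REDUCE or REDUCE axes with more than one element
    return [i for i, (s, t) in enumerate(zip(shape, axis_types))
            if t in (AxisType.GROUP_REDUCE, AxisType.REDUCE) and s > 1]

G, L, R = AxisType.GLOBAL, AxisType.LOCAL, AxisType.REDUCE
# the same kernel, once plain and once reordered with an extra size-1 dim
shape_a, types_a = [64, 32, 16], [G, L, R]
shape_b, types_b = [1, 16, 32, 64], [G, R, L, G]
sizes_a = sorted(shape_a[i] for i in upcastable_dims(shape_a, types_a))
sizes_b = sorted(shape_b[i] for i in upcastable_dims(shape_b, types_b))
```

In both layouts, the candidate dims have the same sizes.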
qazal
c0c695dd89
viz: remove extra transform (#11232) 2025-07-14 16:51:47 +03:00
chenyu
da219199f5
minor hcopt cleanup [pr] (#11231) 2025-07-14 09:36:25 -04:00
nimlgen
756ba1a5f9
nv: support ampere in nvpci (#11230) 2025-07-14 15:35:44 +03:00
uuuvn
b2cc6cfa1b
JIT_BATCH_SIZE is a ContextVar (#11228) 2025-07-14 14:03:45 +03:00
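In tinygrad, a ContextVar is roughly a setting whose default can come from an environment variable of the same name and can be overridden in a `with Context(...)` block. The sketch below re-creates that pattern in a few lines. It is not tinygrad's implementation. The `override` helper and the default of 32 are hypothetical and chosen for illustration only.

```python
import contextlib
import os

class ContextVar:
    """Minimal sketch of an env-backed, integer-valued setting (not tinygrad's code)."""
    def __init__(self, key, default):
        self.key = key
        # initial value: the environment variable of the same name, else the default
        self.value = int(os.environ.get(key, default))

@contextlib.contextmanager
def override(var, value):
    # temporarily replace var.value, restoring it even if the body raises
    old, var.value = var.value, value
    try:
        yield var
    finally:
        var.value = old

JIT_BATCH_SIZE = ContextVar("JIT_BATCH_SIZE", 32)  # default is illustrative only

seen = []
with override(JIT_BATCH_SIZE, 8):
    seen.append(JIT_BATCH_SIZE.value)  # 8 inside the block
seen.append(JIT_BATCH_SIZE.value)      # back to the original value afterwards
```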
nimlgen
c4a920d95c
nv: use last signature (#11227) 2025-07-14 13:00:39 +03:00
nimlgen
a830d37881
nv: check wpr2 is inited (#11226) 2025-07-14 11:46:14 +03:00
chenyu
0387bb9630
clean up image upcast in hcopt [pr] (#11220)
GLOBAL+LOCAL for upcast
GROUP_REDUCE+REDUCE for unroll
2025-07-13 18:06:43 -04:00
chenyu
85ddd72038
simpler grouptop in hcopt (#11219)
* simpler grouptop in hcopt

keep only the perf-relevant conditions; the rest is handled by try/except

* update openpilot read image count
2025-07-13 16:06:09 -04:00
qazal
40847ca29c
viz: prune out of screen rects (#11217) 2025-07-13 21:49:59 +03:00
chenyu
674dc28505
remove Kernel.full_unupcasted_shape [pr] (#11215)
decompose into shape_len and first_upcast to get the last upcast-able dim
2025-07-13 13:56:23 -04:00
chenyu
9575cf6c6e
shave more hcopt [pr] (#11213)
start to use AxisType for conditions
2025-07-13 12:43:58 -04:00
Alisher Zhubanyshev
4ef6b46b34
hcq: reduce launch overhead (#11193)
* nv: improve mmio creation speed

* add memoryview test

* fix indents

* move mv bench to `test_helpers`, remove comparison
2025-07-13 19:25:50 +03:00
nimlgen
1cc2b3f845
nv: use wait_cond (#11212) 2025-07-13 19:25:20 +03:00
nimlgen
6cce3a5d58
generic wait_cond (#11210)
* generic wait_cond

* fix linter

* fix linter
2025-07-13 16:59:21 +03:00
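The log gives only the helper's name. The following is a hypothetical sketch of a generic poll-until-condition helper, the kind a driver uses while waiting for a register or memory value to reach a state. The parameters (`value`, `timeout_ms`, `msg`) are assumptions, not the signature from the PR.

```python
import time

def wait_cond(cb, *args, value=True, timeout_ms=10_000, msg=""):
    """Poll cb(*args) until it returns `value` or the timeout expires (hypothetical sketch)."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        result = cb(*args)
        if result == value:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{msg}: timed out after {timeout_ms} ms, last value {result!r}")
        time.sleep(0)  # yield; a real driver might busy-spin or back off instead

calls = {"n": 0}
def ready():
    # becomes true on the third poll
    calls["n"] += 1
    return calls["n"] >= 3

result = wait_cond(ready, msg="device ready")
```

Putting the loop, the timeout, and the error message in one place lets each backend pass just a callback.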
chenyu
e11ccf2342
update float4 condition in hcopt (#11211)
don't need all upcast candidates to be upcast-able, only check the actual one
2025-07-13 09:51:45 -04:00
nimlgen
55c54d9745
nv: sync after gpfifo setup (#11209) 2025-07-13 14:40:11 +03:00
chenyu
d90d837013
clean up hcopt [pr] (#11205)
removed one condition that's always true
2025-07-12 23:10:27 -04:00
chenyu
2b48b961be
fix a few broken AMX tests (#11204) 2025-07-12 21:42:38 -04:00
wozeparrot
667c7a9fa6
clean: keccak cleanups + explicit shapes (#11202) 2025-07-12 18:17:14 -07:00
chenyu
a0438012af
remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
George Hotz
d67c8e7b42
local metal on metal in uop syntax (#11185)
* local metal on metal in uop syntax

* TODO: just put the axis_info in the kernelinfo

* local

* amd_matmul works @ 28 TFLOPS

* clean up matmul

* kernel8 works

* remove that

* locals

* axistype innovation

* work

* cleanup

* kernel3 regs

* cleanup kernel3

* work

* why is it broken

* no beam

* reenable

* permutes
2025-07-12 16:31:19 -07:00
uuuvn
40da5f0c81
fix silent mypy failure in ci (#11201)
Example: https://github.com/tinygrad/tinygrad/actions/runs/16215577171/job/45784110543?pr=11177#step:7:20

Caused by a footgun exception in how `set -e` works:

```bash
python -m mypy --strict-equality --lineprecision-report . && cat lineprecision.txt
```

This command fails (and returns a non-zero exit code when run interactively), but
because of the `&&` it doesn't count as a script-terminating failure in a
script with `set -e`. Instead it is treated like a test, similar to how a
failing command in an `if` condition doesn't terminate the script
despite its non-zero exit code.
2025-07-12 15:12:25 -04:00
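The behavior described above can be reproduced from Python by running small snippets under `bash -c`. Here `false` stands in for the failing mypy command, and `lineprecision.txt` never needs to exist because `cat` is never reached. This demonstrates only the shell semantics and assumes `bash` is on `PATH`. It is not the CI change made in the PR.

```python
import subprocess

def run_bash(script):
    # run a snippet under `bash -c` and return (exit code, stdout)
    proc = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
    return proc.returncode, proc.stdout

# a plain failing command stops the script under set -e
plain = run_bash("set -e; false; echo still running")
# the same failure on the left of && does not stop it
chained = run_bash("set -e; false && cat lineprecision.txt; echo still running")
# splitting the chain onto separate lines makes the failure fatal again
split = run_bash("set -e\nfalse\ncat lineprecision.txt\necho still running")
```

Only `chained` reaches the `echo` and exits with status 0, which is why the mypy failure went unnoticed.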
chenyu
73caa5dd1b
remove Kernel.membufs [pr] (#11200) 2025-07-12 14:48:47 -04:00