10,980 commits

Author SHA1 Message Date
George Hotz
db8c6d9a04 work 2025-11-10 15:26:16 -08:00
George Hotz
0647f87bf8 outer range runs in the scheduler 2025-11-10 14:49:42 -08:00
chenyu
829cdafccc
update openpilot slow conv uop ast (#13197)
the two remaining slow ones
2025-11-10 17:03:20 -05:00
George Hotz
0c978d45e6
stub attention (#13196)
* stub attention

* name the kernels
2025-11-10 13:48:38 -08:00
chenyu
58c30fc7ce
minor image_conv2d cleanup (#13193) 2025-11-10 16:05:40 -05:00
chenyu
60e55d9a2d
line count 18500 (#13191) 2025-11-10 13:52:13 -05:00
nimlgen
09a59c2203
qcom: support new chip versioning (#13185)
* qcom: support new chip versioning

* ops

* nit

* fix

* f
2025-11-10 23:57:29 +08:00
qazal
50934050bc
sqtt: append all wave execs (#13190) 2025-11-10 23:50:08 +08:00
qazal
38a24731a1
cleanup sqtt tooling (#13188)
* cleanup viz/serve.py

* use latest profile in rgptool.py

* unwrap nullable in roc.py, fix disasms typing
2025-11-10 20:52:57 +08:00
qazal
845a24dcc6
viz: group sqtt waves by program (#13187)
* viz: group sqtt waves by program

* color the names
2025-11-10 19:25:23 +08:00
George Hotz
fd6803000e
mutmut cfg (#13184)
* mutmut cfg

* coveragerc
2025-11-09 23:29:29 -08:00
wozeparrot
6252831ceb
feat: initial tk library (#13160) 2025-11-09 22:54:29 -08:00
George Hotz
925231aec1
repeat does less reshape for 1s (#13183) 2025-11-09 19:43:02 -08:00
George Hotz
d7369de048 hotfix: update weekly commits table 2025-11-09 19:37:06 -08:00
chenyu
6c48c87e51
improved ASSERT_MIN_STEP_TIME (#13182)
* improved ASSERT_MIN_STEP_TIME

getting close, current time +1ms  then round up

* relax
2025-11-09 16:41:12 -05:00
nimlgen
17715688c7
system: validate vendor for APLPCIIfaceBase (#13181) 2025-11-10 02:49:21 +08:00
nimlgen
614783693e
nv: remove hardcoded expansion_rom_off (#13180)
* nv: remove hardcoded expansion_rom_off

* to max size
2025-11-09 21:43:19 +08:00
chenyu
e1d46de8f8
update GROUPTOP heuristic more (#13178)
reverts #13176
2025-11-09 02:31:12 -05:00
chenyu
41e45c20ff
minor stuff reading the printed code [pr] (#13177) 2025-11-09 00:58:51 -05:00
chenyu
8e868dced8
only GROUPTOP one reduce kernel (#13176)
* only GROUPTOP one reduce kernel

* ALLOWED_GATED_READ_IMAGE=148
2025-11-08 22:38:44 -05:00
chenyu
834067d91f
move onnx import in compile3 (#13172)
only used in test_vs_onnx
2025-11-08 09:44:34 -08:00
nimlgen
7f3240dbfe
nv: cleanup alloc (#13170)
* nv: cleanup alloc

* okay okay
2025-11-09 00:14:46 +08:00
qazal
7250fc0354
viz: double click on kernel run goes to codegen (#13147) 2025-11-08 23:40:50 +08:00
qazal
8a7fa9e7b4
sqtt: show total cycles of kernel in viz (#13169) 2025-11-08 21:00:40 +08:00
chenyu
2ba8b4946f
external_benchmark_op_cat.py (#13168)
* external_benchmark_op_cat.py

cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS

* fix
2025-11-08 01:54:10 -05:00
chenyu
a62496cb3d
clean up get_grouped_dims [pr] (#13159) 2025-11-08 01:53:54 -05:00
wozeparrot
eb0192b0bb
feat: print ranges that aren't ended (#13167) 2025-11-07 22:01:29 -08:00
George Hotz
b41541bc44
bounty: Remove Tensor._pool alternative implementation and verify kernels remain the same (#13164) 2025-11-07 16:59:48 -08:00
George Hotz
ffb9e8396f
fix indexing bug with convs
* minimal difference for ONE_POOL=1

* fix indexing bug

* improve indexing debugger

* more debugger improvements

* always for reshape
2025-11-07 16:45:19 -08:00
chenyu
6a509da7f3
Scheduler.reduceops helper [pr] (#13162) 2025-11-07 18:59:46 -05:00
George Hotz
2413311289
make _pool simpler (#13161)
* make _pool simpler

* just syntax

* more correct and smaller

* try this now

* Revert "try this now"

This reverts commit 607cdc2164.

* ONE_POOL
2025-11-07 15:58:44 -08:00
George Hotz
70054cdb14
move backward cast to broadcasted, expand to mixins (#13156)
* shrink_to mixin

* move backward cast into _broadcasted

* expand to movement mixin

* move a few more

* fix spec issue
2025-11-07 15:07:47 -08:00
George Hotz
f2519ea0ba
shrink_to mixin (#13155) 2025-11-07 11:46:24 -08:00
C T
0f9d7f650d
whisper: fix oob, explicit dtype (#13144)
* fix dtype depending on numpy version

numpy v2 np.array returns int64 which Tensor passed through for the
first decode call, swallowing the <|notimestamps|> token and corrupting
the sequence

* fix whisper OOB

global limit on whisper's context length

* enforce whisper max_tokens_to_sample (match openai)

local limit on max tokens decoded
2025-11-07 12:55:01 -05:00
Ahmed Harmouche
3ecff3a8da
Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu

* Fix split dim bug

* Remove webgpu_bug from examples

* Add test for shape correctness

* Fix 3D indexing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb
device: no compilers message with reasons (#13146)
* device: no compilers message with reasons

* typings

* mypy
2025-11-07 23:01:45 +08:00
nimlgen
35e461ef69
hcq: use exception group (#12616)
* hcq: use exception group

* fix
2025-11-07 21:23:12 +08:00
nimlgen
10dc8335d2
tinygpu: fix teardown crash (#13143)
* tinygpu: fix crash

* um?

* double relase

* restore
2025-11-07 19:52:54 +08:00
qazal
d4a216d7d9
viz: display compiler errors (#13141) 2025-11-07 18:09:50 +08:00
qazal
7e94369464
add helper for test_timing custom ops (#13140) 2025-11-07 17:13:55 +08:00
nimlgen
95620426d5
tinygpu: unmap dma when client closed (#13129)
* tinygpu: unmap dma when client closed

* syn

* tiny fixes
2025-11-07 16:08:43 +08:00
wozeparrot
500d7661fa
feat: show range len on index in viz (#13139) 2025-11-06 23:21:27 -08:00
George Hotz
bb6364d7c7
tuplize from linearizer behind flag (#13136)
* remove tuplize from linearizer

* optional tuplize
2025-11-06 20:15:03 -08:00
chenyu
bb8cf948f2
variation of (x%c)+(x//c)*c = x (#13135)
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
George Hotz
42b34cf83d
bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
George Hotz
e0d828dba8 little cleanups 2025-11-06 13:58:19 -08:00
chenyu
bfb0c0391f
test custom eye function (#13134)
this version is also faster with NOOPT
2025-11-06 14:51:55 -05:00
George Hotz
290441dd44
do loads early (#13131)
* do loads early

* local and reg
2025-11-06 09:57:09 -08:00
George Hotz
097264853d
very simple priority (#13130)
* very simple priority

* still simple
2025-11-06 09:25:28 -08:00
George Hotz
07b415e831
fixup op order (#13128)
* fixup op order

* more order

* move a few more

* more

* DEBUG_LINEARIZE
2025-11-06 08:50:04 -08:00