5,245 commits

Author SHA1 Message Date
George Hotz
37a40bf975 early lower cat 2026-03-07 11:39:20 +08:00
George Hotz
6fd18ef875
rename CAT to VCAT (#15167) 2026-03-06 18:46:28 +08:00
Roelof van Dijk
059c6326c0
metal uint32 icb offset overflow (#15156)
* metal uint32 icb offset overflow

fix: diff

supports_exec_item

GraphRunner.supports_exec_item

tests

fix: can't import on non-metal

stricter

* also test the non-metal buffer case

* imports on non-mac
2026-03-06 00:54:39 +03:00
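
The overflow named in this commit title can be illustrated in isolation. The sketch below is not tinygrad or Metal code: `stored_offset` and `fits_uint32` are hypothetical helpers, and the only assumption taken from the log is that a byte offset is written into an unsigned 32-bit field.

```python
import ctypes

UINT32_MAX = 0xFFFFFFFF

def stored_offset(offset: int) -> int:
    """Value that survives when `offset` is written into a uint32 field.

    ctypes integer types do no overflow checking, so the value wraps
    modulo 2**32, which mimics a fixed-width hardware field.
    """
    return ctypes.c_uint32(offset).value

def fits_uint32(offset: int) -> bool:
    """Hypothetical guard: True if the offset can be stored without wrapping."""
    return 0 <= offset <= UINT32_MAX

small = 1 << 20   # 1 MiB, fits comfortably
large = 5 << 30   # 5 GiB, larger than 2**32 bytes

print(stored_offset(small) == small)   # offset preserved
print(stored_offset(large) == large)   # offset silently wrapped
print(fits_uint32(large))
```

A guard of this kind lets a caller reject or reroute an oversized offset before it wraps silently. Whether the actual change does this through `supports_exec_item` is not stated in the log.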
chenyu
da61088ca4
more divmod recombine (#15162) 2026-03-05 12:53:22 -05:00
chenyu
167a1d56a6
improve divmod folding (#15148)
canonicalize to div than mod which enables more simplification
2026-03-05 10:07:36 -05:00
qazal
5bf542469d
viz: python traceback for USER device (#15160)
* start

* ux

* unittests
2026-03-05 20:22:09 +09:00
Roelof van Dijk
d65923bda5
tensor.py: add normalize function (#15159)
* tensor.py: add normalize function

* p==0 should match torch
2026-03-05 18:55:53 +08:00
Ananta Ranganathan
8ef656324e
FIXED TEST Q5_K GGUF dequant (#15147)
* q5_k gguf support as separate pr

* fix the problematic gemv test for q5_k

* add assert to make sure the gemv test can't fail with a warning instead of an error
2026-03-05 16:32:36 +08:00
George Hotz
e97922a57c
LLM speedup with two jits, prefill/rollout (#15153)
* START_TIME

* print cleanup

* fix tests
2026-03-05 16:21:09 +08:00
George Hotz
fb43b415f9
fix symbolic shape call + chunked prefill (#15149)
* fix precompile for symbolic shape

* chunked prefill

* cleaner

* test that
2026-03-05 14:02:26 +08:00
George Hotz
8a82b26522
llm: print the prefill cache size (#15146)
* print the llm prefill cache size

* mock that too
2026-03-05 12:13:28 +08:00
George Hotz
72a9ed6e23
fix render depth bug + add warmup to serve + no realize default (#15144)
* fix render depth bug + add warmup to serve

* make realize not the default
2026-03-05 11:21:16 +08:00
George Hotz
ac1847cbf7
fully symbolic llm (#15097)
* work

* llm symbolic (almost)

* work

* revert that

* llm sym

* works

* cleanups

* cache tokens with the kv cache

* cleanups

* cleanups
2026-03-05 10:22:11 +08:00
qazal
33a1970045
sqtt: simplify inst mapping, validate JUMP processing in CI (#15139)
* jump cleanup

* assert there's a JUMP

* new example for JUMP

* regenerate examples

* rdna4 work

* new packets

* work

* less for branch handling

* less verbose

* fix err message
2026-03-05 09:53:12 +09:00
chenyu
04da527a7a
minor div_and_mod_symbolic cleanups (#15138) 2026-03-04 19:05:44 -05:00
chenyu
34594bcaaf
Revert "bug in metal: offset is stored as uint32, overflow (#15129)" (#15136)
This reverts commit 9c58db16fa.
2026-03-04 16:54:42 -05:00
Roelof van Dijk
9c58db16fa
bug in metal: offset is stored as uint32, overflow (#15129)
* metal uint32 icb offset overflow

* fix: diff

* supports_exec_item

* GraphRunner.supports_exec_item

* tests

* fix: can't import on non-metal
2026-03-04 22:52:12 +03:00
chenyu
4cce283790
relax test_tqdm_perf (#15134) 2026-03-04 12:58:47 -05:00
chenyu
fae400d300
update assign tests to also test the expected behavior (#15132) 2026-03-04 11:34:43 -05:00
chenyu
1f96cc2b51
update non-contiguous buffer error message [pr] (#15131)
* update non-contiguous buffer error message [pr]

also cleaned up the tests

* order
2026-03-04 11:13:26 -05:00
nimlgen
563d5c3211
more graph tests (#15130) 2026-03-04 19:01:12 +03:00
Christopher Milan
592f9bf6c6
set OPENPILOT_HACKS=1 to enable replace assign (#15123) 2026-03-04 05:26:04 -05:00
wozeparrot
759c7fc81c
failing test for allreduce memory usage (#15106) 2026-03-03 23:38:38 -08:00
George Hotz
2d72a4a90c
fix copying padded const (#15116)
* fix const padding cpu

* remove comment
2026-03-04 10:39:45 +08:00
nimlgen
4b42bb54aa
am: reset sdma to start from 0 (#15109) 2026-03-03 18:14:46 +03:00
George Hotz
01ddb4c267
add precompile to call (#15099)
* add precompile to call

* put get back

* something

* after structure

* alt

* keep it call

* resolve call

* resolve linear call

* precompile works with llm

* revert rangeify

* color for debugging

* getenv PRECOMPILE

* clean up deco pattern

* fully recursive sink scheduling

* revert llama

* fix SPEC=2
2026-03-03 22:32:42 +08:00
qazal
c7f908b788
sqtt: fix rdna4 structs (#15111)
* work

* DEBUG=2
2026-03-03 23:32:14 +09:00
qazal
8dd691761d
sqtt: remove old files (#15108) 2026-03-03 22:43:24 +09:00
wozeparrot
529318259c
fix: fix null tests to actually use null device (#15104) 2026-03-03 02:05:47 -08:00
wozeparrot
92c16810ac
feat: per device mem_used (#15100) 2026-03-03 01:31:28 -08:00
b1tg
a9ea36de79
assembly/amd: v_cmp_lg_f32 is ordered not-equal (#14982) 2026-03-03 15:37:48 +08:00
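
The distinction this title draws can be sketched in plain Python, assuming the usual IEEE 754 meaning of "ordered": an ordered comparison is false whenever either operand is NaN. The function names `cmp_lg` and `cmp_neq` are illustrative only and are not taken from the tinygrad sources.

```python
import math

def cmp_lg(a: float, b: float) -> bool:
    """Ordered not-equal ("less than or greater than").

    Both < and > are false when either operand is NaN, so the result is false too.
    """
    return a < b or a > b

def cmp_neq(a: float, b: float) -> bool:
    """Unordered not-equal: the negation of ==, so it is true when either operand is NaN."""
    return not (a == b)

nan = math.nan
for a, b in [(1.0, 2.0), (1.0, 1.0), (nan, 1.0), (nan, nan)]:
    print(a, b, cmp_lg(a, b), cmp_neq(a, b))
```

The two predicates agree on ordinary numbers and differ only when a NaN is involved, which is the case an emulator or assembler test has to cover.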
wozeparrot
c35de9bd68
asm_gemm: support more sharding (#15002) 2026-03-02 23:16:37 -08:00
chenyu
5dcf29b1a0
use clone in test_swap_slices (#15096) 2026-03-02 22:05:12 -05:00
Christopher Milan
c70e8af068
move IMAGE FLOAT16 logic to allocations (#15095)
* FLOAT16 logic in allocations

* cleanup

* separate that

* only apply when IMAGE == 1

* test passing now

* create image buffers earlier
2026-03-02 22:00:05 -05:00
George Hotz
d483e4153a
buffer view is like buffer (#15082)
* buffer view is like buffer

* fix

* swap_reshape_shrink

* contiguous on gguf, fix overlap

* revert that

* _device_supports_view

* this

* fix that test

* 0 buffers

* that test was wrong

* this

* check correct size

* contig BUFFER_VIEW

* this

* fix tests

* buffer view tests

* om

* fix torch

* no MOCKGPU

* skip
2026-03-03 09:52:33 +08:00
chenyu
14d1c5fdfd
assign fusion tests on detach and contiguous_backward (#15092) 2026-03-02 15:21:51 -05:00
qazal
f7aeff6061
viz: cli.py cleanups, do not require PYTHONPATH (#15085)
* cleanup the print

* sys.exit

* equal check

* cleanup unpacker

* cli doesn't need PYTHONPATH

* no semicolons

* %s/PYTHONPATH=. //g
2026-03-02 19:24:38 +09:00
Christopher Milan
977c270774
IMAGE=1 kernel count failing tests (#15083) 2026-03-02 04:35:26 -05:00
George Hotz
3539693555
Support triu variable on diagonal + SDPA symbolic (#15081)
* triu variable

* fails

* dumbbb

* no commutative in reshape

* real fix

* revert that

* sdpa symbolic tests
2026-03-02 12:19:48 +08:00
Nick
8e8e9f6ff6
assert removal for _tri() + tests (#15073)
* assert removal for _tri() and tests

* removed import

* tests triu/tril like in prefill

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-02 10:34:28 +08:00
nimlgen
ccbbca05ef
beam: add dev_timeout for am (#15063)
* beam: add dev_timeout for am

* all covered

* fk

* x

* fuzz

* reset

* f
2026-03-01 16:57:29 +03:00
chenyu
103ea16ec0
add contiguous back to svd (#15074)
can cause infinite loop
2026-02-28 16:49:26 -05:00
chenyu
fe0fa8333b
Revert "improve Tensor.sort indices (#15070)" (#15072)
This reverts commit e3003631f2.
2026-02-28 14:40:30 -05:00
chenyu
e3003631f2
improve Tensor.sort indices (#15070)
* improve Tensor.sort indices

instead of an O(N^2) match at the end, start with an arange and carry it through the same O(N (log N)^2) path

* contiguous
2026-02-28 14:16:16 -05:00
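
The idea in this commit message can be sketched in pure Python. This is not the tinygrad implementation: it assumes the N(log N)^2 path is a bitonic sorting network, works on plain lists, and requires a power-of-two length. The function name is hypothetical. An index list (the "arange") is swapped together with the values, so the sorted indices come out of the same network and no quadratic value-matching pass is needed afterwards.

```python
def bitonic_sort_with_indices(values):
    """Sort a power-of-two-length list; return (sorted_values, original_indices)."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two length"
    vals = list(values)
    idx = list(range(n))  # the "arange" that travels with the values

    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared elements
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # Compare (value, index) pairs so ties break by original position.
                    a = (vals[i], idx[i])
                    b = (vals[partner], idx[partner])
                    if (a > b) == ascending:
                        vals[i], vals[partner] = vals[partner], vals[i]
                        idx[i], idx[partner] = idx[partner], idx[i]
            j //= 2
        k *= 2
    return vals, idx

sorted_vals, sorted_idx = bitonic_sort_with_indices([3, 1, 2, 1])
print(sorted_vals, sorted_idx)
```

Because pairs are compared as (value, index), equal values keep their original relative order, which is one way to make the returned indices deterministic.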
chenyu
76170d035a
relax atol for test_xlm_roberta_large (#15066) 2026-02-28 11:22:35 -05:00
nimlgen
9b3450c9da
test gpu crash on cdna (#15062) 2026-02-28 13:17:59 +03:00
George Hotz
bb84e389cf
functions for llama trainer (#15045)
* functions for llama trainer

* function there

* axis match

* fix multi

* lil cleaner

* there's a bug with HK_FLASH_ATTENTION

* training functions

* for commit
2026-02-28 12:15:18 +08:00
chenyu
151608aa90
update test_multiple_to_single_device (#15056)
follow up to #14482, add SCACHE=0 to the test
2026-02-27 21:44:33 -05:00
chenyu
5fd06f4f02
differentiable setitem (#15054)
* differentiable setitem

go through the where path for bw

* no return
2026-02-27 17:25:15 -05:00
chenyu
db6b3e1edc
fix mixed setitem with both basic and tensor indexing (#15050) 2026-02-27 15:35:48 -05:00