Commit graph

91 commits

Author SHA1 Message Date
Christopher Milan
bc180a963c
deprecate <dev>=1 in favor of DEV=<dev> (#15467)
* start work on target

* add test

* update actions to use DEV

* update docs

* update readmes

* tests need that too

* update example

* update tests (comments)

* fix that test

* ruff

* mypy

* oops

* remove getenvs

* don't add Target yet

* and the test

* lint

* and docs

* more stuff

* assert

* few more fixes

* test assert
2026-03-26 03:48:03 -04:00
nimlgen
2da008ae3b
jit: rm replan (#15433) 2026-03-23 19:31:51 +08:00
nimlgen
c74fa9bbe1
fix jitbeam not triggered (#15424)
* um

* beam

* x

* f
2026-03-23 15:34:59 +08:00
nimlgen
9656d97d97
jit: captures linears, not execitems (#15399)
* jit: captures linears, not execitems

* x

* um

* tests

* mockcuda
2026-03-21 16:32:12 +08:00
Christopher Milan
1560b534a5
remove IMAGE=2 (#15312) 2026-03-20 06:26:52 -04:00
Christopher Milan
0c89340a1e
automatically emulate unsupported (tiny) floats [skip_process_replay] (#15366) 2026-03-20 02:31:44 -04:00
chenyu
da1700e16b
dtypes.index -> dtypes.weakint (#15377) 2026-03-20 01:08:46 -04:00
qazal
176ad47d7d
cdna4 emulator testing ASM_GEMM in CI (#15373)
* cdna emulator work

* accvgprs

* cdna passes most tests

* ruff

* add cdna4 to tests

* cdna emu

* crash

* pass?

* work

* gen

* clean up wave_size access

* asm_gemm passes

* remove acc from dsl.py, emulator can keep its different reg file

it's purely an encoding here, the ASM_GEMM already encodes acc srcs with v[], this can
be cleaned up later, but not functionally required for emulator.

* split asm_gemm tests to ones fast on the emulator

* don't do that

* 124 stays null on rdna

* the segfault was because of hw regs, not this

* Revert "clean up wave_size access", it's explicitly tested

This reverts commit 1202ff5787.

* nullcopyout

---------

Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-20 05:51:30 +09:00
chenyu
b39816e998
failed test case for Tensor(np, "bf16") (#15358) 2026-03-18 23:40:14 -04:00
wozeparrot
c45a606750
feat: no if in rand (#15333) 2026-03-18 15:09:51 -07:00
George Hotz
5524916e39
llama compute gradients explicitly + 243 GB of RAM on MP=8 (#15343)
* llama compute gradients explicitly

* apply grads

* fix multi issue

* multi BUFFER_VIEW support

* simpler

* skip the flaky test
2026-03-18 19:54:40 +08:00
chenyu
761ce8c0d3
fix Invalid combine rules (#15345)
* fix Invalid combine rules

wrong conditions broke setitem into invalids

* fix
2026-03-18 04:58:02 -04:00
chenyu
fceb21c315
Tensor(uop) uses device from uop (#15340) 2026-03-18 02:56:06 -04:00
George Hotz
6109117af1
anonymous buffers are Invalid (#15336)
* anonymous buffers are Invalid

* unique_const

* work

* remove invalid writes

* test_anonymous_buffers_in_function
2026-03-18 14:52:56 +08:00
chenyu
ac7a348d06
dtypes.as_const -> DType.const (#15337)
does not need to be a staticmethod
2026-03-18 00:48:41 -04:00
wozeparrot
b45edeb965
fix: rand supports large tensors (#15329) 2026-03-17 15:45:41 -07:00
wozeparrot
674c760974
embedded bwd vocab shard (#15001)
* fix: remove more multi from call

* feat: embedding bwd vocab sharding

* clean: unused import

* clean: don't actually need this pattern
2026-03-16 19:37:16 -07:00
qazal
33bd33e783
sqtt: add CDNA ops enum, show in viz (#15140) 2026-03-17 09:38:42 +09:00
qazal
5cd1daa3bc
cdna asm_gemm in one file, remove old rdna3 asm (#15281) 2026-03-16 04:32:30 +09:00
chenyu
842c978df3
remove staticmethod dtypes.max/min (#15227)
always use x.dtype.max/min
2026-03-11 23:11:24 -04:00
b1tg
18dc77ccab
add fp8 fnuz dtypes with PYTHON backend support (#14945)
* add fp8 fnuz dtypes with PYTHON backend support

* rm emu related change

* clarify fp8 fnuz zero handling

* Revert "rm emu related change"

This reverts commit efa4763c22.

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-03-11 22:30:18 -04:00
Christopher Milan
25d86ec9e1
start using Invalid in image_conv2d (#15208) 2026-03-10 07:11:06 -04:00
Christopher Milan
7810be8d3c
compile QCOM without opening device (#15165)
Co-authored-by: Comma Device <device@comma.ai>
2026-03-06 06:24:27 -05:00
Roelof van Dijk
d65923bda5
tensor.py: add normalize function (#15159)
* tensor.py: add normalize function

* p==0 should match torch
2026-03-05 18:55:53 +08:00
chenyu
fae400d300
update assign tests to also test the expected behavior (#15132) 2026-03-04 11:34:43 -05:00
nimlgen
563d5c3211
more graph tests (#15130) 2026-03-04 19:01:12 +03:00
Christopher Milan
592f9bf6c6
set OPENPILOT_HACKS=1 to enable replace assign (#15123) 2026-03-04 05:26:04 -05:00
George Hotz
2d72a4a90c
fix copying padded const (#15116)
* fix const padding cpu

* remove comment
2026-03-04 10:39:45 +08:00
wozeparrot
c35de9bd68
asm_gemm: support more sharding (#15002) 2026-03-02 23:16:37 -08:00
Christopher Milan
c70e8af068
move IMAGE FLOAT16 logic to allocations (#15095)
* FLOAT16 logic in allocations

* cleanup

* separate that

* only apply when IMAGE == 1

* test passing now

* create image buffers earlier
2026-03-02 22:00:05 -05:00
George Hotz
d483e4153a
buffer view is like buffer (#15082)
* buffer view is like buffer

* fix

* swap_reshape_shrink

* contiguous on gguf, fix overlap

* revert that

* _device_supports_view

* this

* fix that test

* 0 buffers

* that test was wrong

* this

* check correct size

* contig BUFFER_VIEW

* this

* fix tests

* buffer view tests

* om

* fix torch

* no MOCKGPU

* skip
2026-03-03 09:52:33 +08:00
Christopher Milan
977c270774
IMAGE=1 kernel count failing tests (#15083) 2026-03-02 04:35:26 -05:00
George Hotz
3539693555
Support triu variable on diagonal + SDPA symbolic (#15081)
* triu variable

* fails

* dumbbb

* no commutative in reshape

* real fix

* revert that

* sdpa symbolic tests
2026-03-02 12:19:48 +08:00
Nick
8e8e9f6ff6
assert removal for _tri() + tests (#15073)
* assert removal for _tri() and tests

* removed import

* tests triu/tril like in prefill

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-03-02 10:34:28 +08:00
chenyu
151608aa90
update test_multiple_to_single_device (#15056)
follow up to #14482, add SCACHE=0 to the test
2026-02-27 21:44:33 -05:00
chenyu
5fd06f4f02
differentiable setitem (#15054)
* differentiable setitem

go through the where path for bw

* no return
2026-02-27 17:25:15 -05:00
chenyu
db6b3e1edc
fix mixed setitem with both basic and tensor indexing (#15050) 2026-02-27 15:35:48 -05:00
chenyu
1406d49eef
failed test cases for advanced setitem (#15048) 2026-02-27 10:50:18 -05:00
chenyu
0f94a4bb73
failed test case for early fixup const copy (#15038)
* failed test case for early fixup const copy

wrong with PAD

* test no copy
2026-02-26 19:09:33 -05:00
chenyu
3a4db53b43
raise RuntimeError in schedule for conflicted var_val [pr] (#15031) 2026-02-26 15:16:01 -05:00
chenyu
127136421d
enable a few WEBGPU isnan tests that work now (#14967)
* enable a few WEBGPU isnan tests that work now

* still failed
2026-02-23 11:06:08 -05:00
ttomsa
0366474089
Bool cast to cmpne (#14544)
* test

* rm in llvmir

* rm in ptx and nir

* hmmmm

* rm in decompositions

* skip tests

* add test

* just this

* rm comment

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-23 10:31:36 -05:00
George Hotz
b824490e3f
allocate generates a call (#14958)
* allocate generates a call

* symbolic works too

* DEFINE_VAR is param

* replace param later

* apply buffers

* name

* upd

* this was a bug...
2026-02-23 15:59:20 +08:00
George Hotz
677145b393
all consts have shapes (#14959)
* all consts have shapes

* vconst has shape too

* use normal schedule

* cast ptrdtype

* image

* bitcast issue + hack
2026-02-23 10:26:50 +08:00
chenyu
4424757b9a
update test_sharded_memory (#14956)
cleaned up and moved to test/null
2026-02-22 16:56:08 -05:00
b1tg
f9b7493e7a
cleanup fp8 conversion helpers and fp8 edge-case tests (#14953)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-22 09:16:42 -05:00
chenyu
0255a64a27
update test_jit_init_empty (#14938)
* update test_jit_init_empty

now it fails silently

* that
2026-02-21 09:01:50 -05:00
George Hotz
8ef5544e4a
realized PYTHON copies (#14934)
* realized PYTHON copies

* comment that out

* fix that test

* append afters

* contig

* disk copies

* should be 124

* 332
2026-02-21 20:29:31 +08:00
qazal
8278886cf9
test_profiler cleanup, non flaky cpu_profile test (#14932)
* test_profiler cleanup, non flaky cpu_profile test

* existing device is okay
2026-02-21 16:58:10 +09:00
qazal
c5029fa460
jit case with Tensor.empty input, realized means allocated (#14930)
* simple failing jit test case with Tensor.empty

* this used to exist in ops.py...

* Revert "removed if self.buffer.is_allocated() in realized (#14836)"

This reverts commit 72cf603805.
2026-02-21 16:33:55 +09:00