13,471 commits

Author SHA1 Message Date
George Hotz
69e1f3b551
remove vec2 from image in gater (#16165)
* remove vec2 from image in gater

* only simple idx

* fix python with new image style

* fix vconst

* just vconst and stack

* cast to int there

* fix for const

* fix process replay
2026-05-12 19:25:52 -07:00
chenyu
2172363be5
don't use Tensor indexing in svd (#16174)
prepare mixin, also about 4X faster for 8x8 input
2026-05-12 21:56:19 -04:00
chenyu
420a08c6d1
qr to mixin (#16173) 2026-05-12 21:23:25 -04:00
chenyu
c6a82fe927
functional qr and svd (#16172)
no clone and setitem, will move to mixin next. slightly faster but still quite slow
2026-05-12 19:12:08 -04:00
Christopher Milan
3844a31f87
ci: untangle cuda/ocelot, less apt (#16171)
* ci: untangle cuda/ocelot, less apt

* ldconfig
2026-05-12 18:14:03 -04:00
Christopher Milan
316607f004
dsp: don't use docker in ci (#16167)
* dsp: don't use docker in ci

* add setup script for macos docker
2026-05-12 17:11:03 -04:00
chenyu
bdcdf1f1a1
jittable masked_select and nonzero (#16170)
* jittable masked_select and nonzero

make jittable with `size=`, matches jax

* COMPILE_ONLY
2026-05-12 16:39:36 -04:00
wozeparrot
a613bcfc6d
allow after on contiguous in spec (#16169)
* feat: allow after on contiguous

* feat: add test
2026-05-12 13:11:44 -07:00
chenyu
7c3e3fa154
fix empty input for masked_select and nonzero (#16168) 2026-05-12 15:36:51 -04:00
chenyu
da3b7e89a4
atol in test_custom_kernel_multi_output_backward_interacting (#16166) 2026-05-12 14:42:12 -04:00
chenyu
25583f6dc1
fix cumsum dtype for 0d input (#16164) 2026-05-12 14:18:08 -04:00
George Hotz
64c81dfd24
add all codegen stages to spec_tensor (#16163) 2026-05-12 10:35:38 -07:00
chenyu
f3e3c3851f
explicit args to Tensor.rand (#16161)
added requires_grad, other kwargs were silently dropped
2026-05-12 12:53:39 -04:00
nimlgen
e93fb5f9b9
hcq2: remove hcqprogram (#16157)
* hcq2 rm program

* nonbeauty

* no prog

* tiny

* f

* x
2026-05-12 18:49:13 +03:00
nimlgen
a708542308
fix ci spec (#16156) 2026-05-12 17:57:11 +03:00
nimlgen
e5729935c6
time_call (#16152)
* time_call

* x

* fix caches
2026-05-12 16:58:28 +03:00
qazal
fe39cf148a
add Ops.SOURCE test (#16155)
* simple failing test

* raises

* change
2026-05-12 22:49:32 +09:00
qazal
5cd0494b14
viz: canonicalize ast for schedule to codegen linking (#16154)
* simple failing test

* always null device

* viz: canonicalize ast for schedule to codegen linking

* SCACHE
2026-05-12 22:40:21 +09:00
qazal
c1d125ff3b
llm: add markers to --benchmark (#16153)
* markers in llm

* ui fix
2026-05-12 20:14:11 +09:00
wozeparrot
e9359d9e7d
more llama mp fixes (#16151)
* llama: SPLIT_W13

* llama: fix with no fused kernels

* llama: cast to bf16 on non asm_gemm path

* llama: new mp flags
2026-05-11 21:29:23 -07:00
chenyu
09fd80fba6
fix randperm and _multi_like drop requires_grad (#16150) 2026-05-11 23:23:34 -04:00
George Hotz
8294d105a7
Update the spec in spec.py to match the current state (#16132)
* start work on specv2

* more spec

* more spec

* fix amd emulator

* more spec

* more

* fix test_uop_graph

* move those

* spec=2

* skip those questionable tests

* ptx fix

* more spec=2

* store

* allow custom function in tensor

* spec 2

* fix beam search for tensor cores

* delete the old specs

* fix import
2026-05-11 20:07:47 -07:00
chenyu
3942a80f66
fix wrong kwargs passed into rands (#16149)
working towards explicit args for these
2026-05-11 22:22:06 -04:00
Christopher Milan
039d84ff02
Revert "onnx: deduplicate simple proto parsers" (#16148)
This reverts commit 83eaefcd0f.
2026-05-11 21:45:17 -04:00
Christopher Milan
20f587d5d5
nv: rm _download (#16147) 2026-05-11 19:56:37 -04:00
chenyu
371ab2023f
clean up image_dot and image_conv2d (#16145) 2026-05-11 19:37:58 -04:00
Vikram Rangarajan
effa263865
Torch backend aten::cat.out fix (#16121)
* Handle empty 1D tensors in cat_out

* Undid other changes

* Fixed torch cat

* Improved cat.out, added more tests

* Cleaned code

* Type hinted dim

* Removed whitespace
2026-05-11 16:28:16 -07:00
chenyu
63c1f00b80
disable test_svd_general again (#16146)
flaky on CI
2026-05-11 19:24:32 -04:00
Christopher Milan
2dccd4a3eb
am: autogen pmc (#16143)
* am: autogen pmc

* cleanup

* fix

* type
2026-05-11 19:22:12 -04:00
Christopher Milan
7ba55ad3ba
nv: autogen regs (#16139)
* nv: autogen regs

* flcn cot

* ci

* gen
2026-05-11 18:52:24 -04:00
chenyu
0b02fb6797
Revert "[pr] match torch rmsnorm (#16122)" (#16144)
This reverts commit 692257dd70.
2026-05-11 17:53:42 -04:00
chenyu
fbe8be0b8b
style cleanup to Tensor.qr and svd (#16142)
* style cleanup to Tensor.qr and svd

same kernels

* more

* enable
2026-05-11 17:16:59 -04:00
qazal
fc2cc1d77a
viz: call graph renderer example (#16141)
* work

* emits

* this

* cleaner repr for custom binaries

* --call-graph

* _ref

* this

* start

* this

* everything except the pyrender

* bring pyrender back
2026-05-12 05:07:30 +09:00
chenyu
f65e343fb3
spec.py cleanups (#16140)
removed END from shared_spec and NOOP from full_spec
2026-05-11 15:59:49 -04:00
Joshua James Venter
692257dd70
[pr] match torch rmsnorm (#16122)
* [pr] match rmsnorm torch

Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>

* 1e-5

* ops.md

---------

Signed-off-by: Joshua James Venter <venter.joshua@gmail.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-05-11 14:36:41 -04:00
Sachith Shetty
59a81559d4
fix: add self.device to qr, svd, masked_select intermediates (#16131) 2026-05-11 11:22:54 -04:00
nimlgen
70c2480e71
hcq2 to extra (#16126)
* hcq2 in extra

* correct

* some revert from non-extra

* cln

* cpu

* x

* attach

* min

* remove attach

* linter
2026-05-11 17:17:30 +03:00
nimlgen
ad9738892c
get_buf() for Buffer (#16134)
* p

* mypy

* x
2026-05-11 16:36:14 +03:00
qazal
2dd84416bf
viz/cli: schedule renderer (#16101)
* simpler steps

* work

* work

* iterate

* faster

* better

* simplify more

* sys stdin

* less

* work

* work and mv

* better

* seen bufs

* all call graphs

* print query

* ux

* param to buffer / buffer_view

* work

* respect NO_COLOR in uop_to_json

* less

* render uops

* rm custom renderer

* call can't pyrender.

* unrelated diff

* assert

* 5
2026-05-11 01:56:16 +09:00
George Hotz
53f9587099
add canary 2026-05-10 09:38:18 -07:00
George Hotz
28cb7f1bcc
update readme with contributing guidelines 2026-05-10 09:35:48 -07:00
George Hotz
daed602569
rename BUFFERIZE to STAGE (#16125) 2026-05-10 09:26:46 -07:00
qazal
39ce780907
viz/cli: emit all runs of selected kernel, json fixes (#16124)
* keep print

* --json in tests, sqtt --json err

* work

* import

* less

* line
2026-05-10 21:45:51 +09:00
qazal
51c7dafb0d
split viz cli test helpers (#16123) 2026-05-10 19:42:24 +09:00
chenyu
b2a682ec60
remove _shape check in pm_mops [pr] (#16120)
seems fine now
2026-05-09 17:54:22 -04:00
wozeparrot
026688f03f
llama: move to correct dir (#16118) 2026-05-08 19:42:16 -07:00
Christopher Milan
a7512e0d12
PYTHON: images have no alignment constraints (by default) (#16115) 2026-05-08 20:35:03 -04:00
Christopher Milan
105b037c3c
cl: image alignment in arch (#16106) 2026-05-08 19:33:33 -04:00
Charlie Kerfoot
71a8c0da09
fix: trailing space format string (#16005) 2026-05-08 16:31:10 -07:00
Pawan
4dd6ad3514
gradient: add TRUNC backward (#15925)
* gradient: add TRUNC backward

* test: move round quantization gradient to test_ops
2026-05-08 16:27:55 -07:00