Commit graph

4,629 commits

Author SHA1 Message Date
George Hotz
db8c6d9a04 work 2025-11-10 15:26:16 -08:00
George Hotz
0647f87bf8 outer range runs in the scheduler 2025-11-10 14:49:42 -08:00
chenyu
829cdafccc
update openpilot slow conv uop ast (#13197)
the two remaining slow ones
2025-11-10 17:03:20 -05:00
wozeparrot
6252831ceb
feat: initial tk library (#13160) 2025-11-09 22:54:29 -08:00
chenyu
2ba8b4946f
external_benchmark_op_cat.py (#13168)
* external_benchmark_op_cat.py

cat kernel that's 1ms on master and 50us with no GROUP and with NOLOCALS

* fix
2025-11-08 01:54:10 -05:00
George Hotz
ffb9e8396f
fix indexing bug with convs
* minimal difference for ONE_POOL=1

* fix indexing bug

* improve indexing debugger

* more debugger improvements

* always for reshape
2025-11-07 16:45:19 -08:00
Ahmed Harmouche
3ecff3a8da
Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu

* Fix split dim bug

* Remove webgpu_bug from examples

* Add test for shape correctness

* Fix 3D indexing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb
device: no compilers message with reasons (#13146)
* device: no compilers message with reasons

* typings

* mypy
2025-11-07 23:01:45 +08:00
chenyu
bb8cf948f2
variation of (x%c)+(x//c)*c = x (#13135)
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
George Hotz
42b34cf83d
bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
chenyu
bfb0c0391f
test custom eye function (#13134)
this version is also faster with NOOPT
2025-11-06 14:51:55 -05:00
nimlgen
dafdb4bfb1
test hcq open with pytest (#13124)
* test hcq open with pytest

* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87
system: fix flock on pcidevs (#13123)
* system: fix locking of hcq devices

* rename and fullrun

* force ok

* fix

* fix
2025-11-06 19:02:13 +08:00
chenyu
f33c182393
test custom qkv kernel (#13118)
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
9b2b535fa4
fix issue with multi flip (#13115) 2025-11-05 15:28:50 -08:00
George Hotz
4027eef264
fix test warnings (#13114)
* fix test warnings

* precommit passes

* ignore std_mean warning
2025-11-05 15:06:29 -08:00
George Hotz
bcfe42937f
move permute/flip/shrink to mixins (#13113)
* move permute to mixins

* move more stuff

* two more

* fix local mypy

* fix tests

* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0
move mixins to mixin dir (#13105)
* move mixins to mixin dir

* math
2025-11-05 10:18:33 -08:00
chenyu
18d4ecc1f3
lower nv test_gemm_4096 target (#13107) 2025-11-05 11:05:16 -05:00
chenyu
54141e9cb9
DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
chenyu
ca17718b6d
remove symbolic_flat (#13083)
* remove symbolic_flat

some kernels are different but sometimes it's better so not clear, will merge as long as benchmark passes

* test_location
2025-11-03 17:25:21 -05:00
George Hotz
1e3d6e49a6
index slicing + allclose (#13071)
* continue work on slicing+allclose

* Revert "Revert "slicing + allclose""

This reverts commit 6c7a12f21c.

* fix tests + better syntax

* forgot an after

* slot is an integer
2025-11-03 13:01:48 +08:00
George Hotz
6c7a12f21c Revert "slicing + allclose"
This reverts commit c9a1e35b1e.
2025-11-03 12:05:44 +08:00
George Hotz
c9a1e35b1e slicing + allclose 2025-11-03 12:00:45 +08:00
George Hotz
962d980919
fuse hasn't worked since rangeify, remove it (#13057) 2025-11-02 14:01:52 +08:00
Sieds Lykles
885b6dea9e
multiple reduce range arange folding (#13047)
* multi reduce arange folding

* add test

* cvar to var

* add circular_pad_bw test
2025-11-01 22:11:26 +01:00
Sieds Lykles
ecb8565f67
Revert "Better cleanup of arange bufferize (#13046)" (#13048)
This reverts commit c99b7dfd4a.
2025-11-01 18:09:37 +01:00
Sieds Lykles
c99b7dfd4a
Better cleanup of arange bufferize (#13046)
* check for reduce and index instead of cast

* add test
2025-11-01 16:16:31 +01:00
chenyu
bebec73471
write custom_sum with set and after (#13045) 2025-11-01 10:45:30 -04:00
George Hotz
e98506735b
add CONTRACT support to UOp programs (#13043)
* add contract support

* use contract

* 342 tflops
2025-11-01 19:11:32 +08:00
chenyu
f396df26ea
test custom sum (#13039)
* test custom sum

this is higher level than set and after?

* only float
2025-10-31 19:25:56 -04:00
Sieds Lykles
3dc593c536
add strip_params to pyrender (#13021)
* add strip_params to pyrender

* update that one too

* strip_parens fix

* cleaner

* add test

* add some more tests

* cleaner strip_parens
2025-10-31 14:15:56 +01:00
George Hotz
bc178d14a9
matmul example on metal showing off tensor core (#13033)
* matmul example on metal showing off tensor core

* flip the args of placeholder

* mat_idx

* imp
2025-10-31 19:40:36 +08:00
George Hotz
e066b3176b hotfix: types and names for custom kernel test 2025-10-31 17:34:55 +08:00
George Hotz
54f48f93c6
working backward pass in custom kernel (#13032)
* working backward pass in custom kernel

* custom_kernel tensor method

* no SPEC=2
2025-10-31 17:26:18 +08:00
George Hotz
b791d70725
support custom UOp kernels (#13028)
* support custom UOp kernels

* no number

* multioutput works

* backward kernel runs

* move kernel class

* grad later

* work

* no tags in kernel graph

* test arange

* arange + contig

* delete comment
2025-10-31 15:51:39 +08:00
chenyu
f6430a0559
add script for one slow openpilot conv (#12953)
* add script for one slow openpilot conv

* fix ruff
2025-10-30 18:08:41 -04:00
Sieds Lykles
4c8362128b
New symbolic renderer + strip parens (#13017)
* new uop renderer

* better tester

* strip parens

* update tests

* split method check_uop_against_string

* use ctx.update instead of add_rendered method

* strip parens based on precedence

* update test

* new symbolic renderer

* add comment
2025-10-30 16:41:32 +01:00
George Hotz
e456f2cb1e
more uop programs (#13007)
* more uop program

* test_matmul_relu

* tests fix
2025-10-30 14:57:59 +08:00
George Hotz
e64d4b3b44
uops programs (#13005)
* uops programs

* work

* work

* more syntax

* more syntax

* comments
2025-10-30 12:28:10 +08:00
George Hotz
2da02f1ae1
add loads at the end (#12988)
* add loads at the end

* simpler

* late load

* tests passing

* fix matvec

* spec test passes

* fix where on load

* fix abs2

* fix more tests
2025-10-30 10:42:19 +08:00
nimlgen
4b001ec723
amd: pmc in mockgpu (#13000)
* amd: pmc in mockgpu

* fix

* do not open in ci
2025-10-30 01:52:02 +08:00
Sieds Lykles
70bce62c67
dont collapse possibly empty symbolic range (#12994)
* dont collapse a symbolic range based on min/max

* refactor z3 renderer

* include sink explicitely instead of dtypes.void

* use dtype.scalar()
2025-10-29 12:17:09 +01:00
George Hotz
819592ee67 hotfix: disable DoubleMatmul for PTX 2025-10-29 16:37:17 +08:00
George Hotz
30ca3f2af8
all double matmul (#12993)
* fix more double matmuls

* a few more

* all double matmul passes

* opts for flash attention

* fix spec

* comment
2025-10-29 16:25:27 +08:00
Sieds Lykles
9f39f6391c
shared_codegen_spec and fix index spec (#12967)
* split shared_codegen_spec and fix index

* add VCONST to program_spec and move index to shared_codegen_spec

* working ignore_oob=0

* cleanup

* fix spec

* undo that

* move barrier and special earlier

* fix more spec issues

* more updates

* remove special from program_spec

* cleanup and fixes

* move more to shared

* special is not in shared_spec

* some comments

* dont do bounds check there
2025-10-29 09:14:11 +01:00
George Hotz
1c362736aa
fix more double matmuls (#12991)
* fix more double matmuls

* a few more
2025-10-29 16:09:48 +08:00
George Hotz
8c47cf4323
pcontig double matmul works (#12899)
* pcontig double matmul works

* tests

* contract

* closer

* works-ish

* add that broadcast

* 2 more work

* something

* disable broken ones

* llvm

* align 16
2025-10-29 13:06:43 +08:00
George Hotz
b147e7e8e6
flatten bufferize (#12984)
* flatten bufferize

* simpler

* tests pass

* flat

* not flat
2025-10-29 11:23:43 +08:00
chenyu
ef16e6c68c
unwrap instead of cast [pr] (#12982) 2025-10-28 21:29:23 -04:00