Commit graph

10,955 commits

Author SHA1 Message Date
George Hotz
7c4971f345 ONE_POOL 2025-11-07 15:56:11 -08:00
George Hotz
a5fd297df5 Revert "try this now"
This reverts commit 607cdc2164.
2025-11-07 15:52:33 -08:00
George Hotz
607cdc2164 try this now 2025-11-07 15:50:08 -08:00
George Hotz
d62c733b3f more correct and smaller 2025-11-07 15:44:19 -08:00
George Hotz
a6e1cc3c65 just syntax 2025-11-07 15:40:38 -08:00
George Hotz
6f2dd96df9 make _pool simpler 2025-11-07 15:33:55 -08:00
George Hotz
70054cdb14
move backward cast to broadcasted, expand to mixins (#13156)
* shrink_to mixin

* move backward cast into _broadcasted

* expand to movement mixin

* move a few more

* fix spec issue
2025-11-07 15:07:47 -08:00
George Hotz
f2519ea0ba
shrink_to mixin (#13155) 2025-11-07 11:46:24 -08:00
C T
0f9d7f650d
whisper: fix oob, explicit dtype (#13144)
* fix dtype depending on numpy version

numpy v2 np.array returns int64 which Tensor passed through for the
first decode call, swallowing the <|notimestamps|> token and corrupting
the sequence

* fix whisper OOB

global limit on whisper's context length

* enforce whisper max_tokens_to_sample (match openai)

local limit on max tokens decoded
2025-11-07 12:55:01 -05:00
Ahmed Harmouche
3ecff3a8da
Fix dim splitting bug for len(dim) == len(limited) case (#13142)
* Fix gpudims bug on webgpu

* Fix split dim bug

* Remove webgpu_bug from examples

* Add test for shape correctness

* Fix 3D indexing

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-11-07 12:31:06 -05:00
nimlgen
b8e48effcb
device: no compilers message with reasons (#13146)
* device: no compilers message with reasons

* typings

* mypy
2025-11-07 23:01:45 +08:00
nimlgen
35e461ef69
hcq: use exception group (#12616)
* hcq: use exception group

* fix
2025-11-07 21:23:12 +08:00
nimlgen
10dc8335d2
tinygpu: fix teardown crash (#13143)
* tinygpu: fix crash

* um?

* double release

* restore
2025-11-07 19:52:54 +08:00
qazal
d4a216d7d9
viz: display compiler errors (#13141) 2025-11-07 18:09:50 +08:00
qazal
7e94369464
add helper for test_timing custom ops (#13140) 2025-11-07 17:13:55 +08:00
nimlgen
95620426d5
tinygpu: unmap dma when client closed (#13129)
* tinygpu: unmap dma when client closed

* syn

* tiny fixes
2025-11-07 16:08:43 +08:00
wozeparrot
500d7661fa
feat: show range len on index in viz (#13139) 2025-11-06 23:21:27 -08:00
George Hotz
bb6364d7c7
tuplize from linearizer behind flag (#13136)
* remove tuplize from linearizer

* optional tuplize
2025-11-06 20:15:03 -08:00
chenyu
bb8cf948f2
variation of (x%c)+(x//c)*c = x (#13135)
when x is in the form of y//b, the idiv term might have combined
2025-11-06 18:53:28 -05:00
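
The commit message above names the identity (x%c)+(x//c)*c = x and notes that when x has the form y//b, the integer-division term may already have been combined. The rewrite rule itself is not shown in this log, so the following is only a sketch of the underlying arithmetic. It assumes Python floor-division semantics and positive constants b and c, and the function names are hypothetical, not taken from the repository.

```python
def base_identity(x: int, c: int) -> bool:
    # (x % c) + (x // c) * c reconstructs x for any integer x and c > 0.
    return (x % c) + (x // c) * c == x


def nested_idiv_combines(y: int, b: int, c: int) -> bool:
    # With floor division and positive b, c: (y // b) // c == y // (b * c).
    return (y // b) // c == y // (b * c)


def combined_variation(y: int, b: int, c: int) -> bool:
    # Variation of the identity with x = y // b, where the idiv term
    # (x // c) has been written in its combined form y // (b * c).
    x = y // b
    return (x % c) + (y // (b * c)) * c == x


if __name__ == "__main__":
    # Exhaustive check over a small range, as a sanity test of the sketch.
    ok = all(
        base_identity(y, c)
        and nested_idiv_combines(y, b, c)
        and combined_variation(y, b, c)
        for y in range(-50, 51)
        for b in range(1, 8)
        for c in range(1, 8)
    )
    print("identities hold on the tested range:", ok)
```

A simplifier that only matches the literal pattern (x%c)+(x//c)*c would miss the combined form checked by `combined_variation`, which is presumably why a separate variation is needed.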
George Hotz
42b34cf83d
bottom up linearizer (#13133)
* bottom up linearizer

* late stores

* more complete

* remove broken heuristic

* upcast size

* opt

* more conservative

* it needs that

* disable opencl half on QCOM

* fix

* make that a real test

* cpu test okay

* ptx skip

* end is after the range
2025-11-06 15:30:32 -08:00
George Hotz
e0d828dba8 little cleanups 2025-11-06 13:58:19 -08:00
chenyu
bfb0c0391f
test custom eye function (#13134)
this version is also faster with NOOPT
2025-11-06 14:51:55 -05:00
George Hotz
290441dd44
do loads early (#13131)
* do loads early

* local and reg
2025-11-06 09:57:09 -08:00
George Hotz
097264853d
very simple priority (#13130)
* very simple priority

* still simple
2025-11-06 09:25:28 -08:00
George Hotz
07b415e831
fixup op order (#13128)
* fixup op order

* more order

* move a few more

* more

* DEBUG_LINEARIZE
2025-11-06 08:50:04 -08:00
nimlgen
b9b68bf437
amd: add kern to sqtt event (#13126)
* amd: add kern to sqtt event

* fix
2025-11-06 22:02:02 +08:00
qazal
88245d6579
qol improvements to sqtt decoder and timing tests (#13125) 2025-11-06 20:51:30 +08:00
nimlgen
dafdb4bfb1
test hcq open with pytest (#13124)
* test hcq open with pytest

* fi
2025-11-06 20:09:51 +08:00
nimlgen
05e2ff4d87
system: fix flock on pcidevs (#13123)
* system: fix locking of hcq devices

* rename and fullrun

* force ok

* fix

* fix
2025-11-06 19:02:13 +08:00
qazal
3126c89b84
viz: visible horizontal scrollbar in long texts (#13122) 2025-11-06 17:23:02 +08:00
George Hotz
91cc773397
add run count to toposort (#13119) 2025-11-05 22:29:34 -08:00
Adeeb Shihadeh
dca7fb0a49
qcom: make priority configurable (#13120) 2025-11-05 22:27:54 -08:00
qazal
b2bb3af12a
make range_color work in VIZ (#13121) 2025-11-06 14:26:48 +08:00
chenyu
f33c182393
test custom qkv kernel (#13118)
adding the online softmax hits infinite loop so starting with this
2025-11-05 23:32:13 -05:00
George Hotz
c65e6d8887
add ranges to print_uops (#13116)
* remove tuplize from linearizer

* try this

* simple priority

* add colored ranges to print_uops

* improve comments

* fix no const in src

* fix mypy

* fix define global

* fix var placement

* no prefer early load

* revert linearizer for now
2025-11-05 20:26:56 -08:00
George Hotz
9b2b535fa4
fix issue with multi flip (#13115) 2025-11-05 15:28:50 -08:00
George Hotz
4027eef264
fix test warnings (#13114)
* fix test warnings

* precommit passes

* ignore std_mean warning
2025-11-05 15:06:29 -08:00
George Hotz
bcfe42937f
move permute/flip/shrink to mixins (#13113)
* move permute to mixins

* move more stuff

* two more

* fix local mypy

* fix tests

* fix shrink
2025-11-05 14:14:15 -08:00
George Hotz
2d4f01fda0
move mixins to mixin dir (#13105)
* move mixins to mixin dir

* math
2025-11-05 10:18:33 -08:00
chenyu
52f0081e77
use where instead of mul in Embedding (#13112) 2025-11-05 12:49:01 -05:00
b1tg
edc4e1aede
ignore trailing nops in llvm-objdump output (#13110) 2025-11-06 01:10:51 +08:00
chenyu
03ee0cfe45
minor fast_idiv cleanup [pr] (#13109) 2025-11-05 11:44:36 -05:00
chenyu
18d4ecc1f3
lower nv test_gemm_4096 target (#13107) 2025-11-05 11:05:16 -05:00
nimlgen
eff80beeed
amd: props in device not sqtt (#13106)
* amd: props in device not sqtt

* fix

* f

* fix

* fix
2025-11-05 23:43:20 +08:00
nimlgen
757ceab2a2
system: allow using vidmem for uc mem (#13104) 2025-11-05 19:12:59 +08:00
qazal
8119d9f082
sqtt: decode each instruction exec (#13093)
* sqtt: decode each instruction exec

* start tests

* run_asm

* capture sqtt per kernel

* chaining vgprs

* test things

* inst_execs in viz

* can also configure l and g

* 1l + cleanup

* test_sleep

* test_wmma

* work

* test sleep with llvm builtin
2025-11-05 17:30:27 +08:00
chenyu
54141e9cb9
DISABLE_COMPILER_CACHE=1 in speed_v_theoretical (#13096) 2025-11-04 11:28:18 -05:00
chenyu
1c9f720654
remove unused type ignore [pr] (#13095) 2025-11-04 10:08:07 -05:00
nimlgen
c857dc5af0
autogen: try/except in try_dlopen (#13094)
* autogen: try/except in try_dlopen

* ugh
2025-11-04 22:51:53 +08:00
nimlgen
eaf7cbc178
amd: flush sqtt after each kernel (#13092)
* amd: flush sqtt after each kernel

* merge for rgp
2025-11-04 22:12:48 +08:00