Commit graph

13,427 commits

Author SHA1 Message Date
George Hotz
2c8867c3e9 all cpu tests pass local 2026-05-26 19:28:46 -07:00
George Hotz
c88941a957 bugfix 2026-05-26 19:19:44 -07:00
George Hotz
760acdf554 really fix llvm 2026-05-26 19:16:48 -07:00
George Hotz
f54a1d70d6 unneeded 2026-05-26 19:05:04 -07:00
George Hotz
5d3713a6be fix llvm 2026-05-26 18:47:34 -07:00
George Hotz
d0f956341c replace INDEX with SLICE 2026-05-26 18:37:03 -07:00
George Hotz
156a4438d9
rename BUFFER_VIEW to SLICE (#16391)
* rename BUFFER_VIEW to SLICE

* fix comments
2026-05-26 18:15:00 -07:00
Christopher Milan
3adf7f5d95
disable flaky cl test (#16388) 2026-05-26 19:56:57 -04:00
Christopher Milan
d23659d38b
cleanup some old test skips (#16384) 2026-05-26 19:07:22 -04:00
George Hotz
fd963038a0
remove allow_any_len from store (#16385)
* remove allow_any_len from store

* a few more

* no bv there

* more fixes

* fixes

* oh that
2026-05-26 15:26:53 -07:00
chenyu
0b88827482
remove CONST(UNIQUE) (#16383) 2026-05-26 14:45:22 -04:00
chenyu
d861c50dce
remove unique_const (#16382) 2026-05-26 13:53:31 -04:00
George Hotz
bac82d4949
fix emu bug in gfx950 (#16381)
* fix emu bug in gfx950

* fix renderer
2026-05-26 10:32:03 -07:00
chenyu
9b00defc8c
Revert "remove unique_const (#16372)" (#16380)
This reverts commit 09019d6761.
2026-05-26 12:30:07 -04:00
chenyu
09019d6761
remove unique_const (#16372)
* remove unique_const

* fix SDWA thing

* that?
2026-05-26 12:18:03 -04:00
George Hotz
7f1b02854e
bufferview offset is units of input dtype (#16378) 2026-05-26 08:49:31 -07:00
qazal
846a809af7
viz: add +- toggle for hidden UOps (#16368)
* first

* remove

* move src toggles to client side

* line

* update viz server tests

* remove those

* logic

* cleanup

* call matches

* fix const arg

* add labels

* keep changes

* the stack on movement ops hiding change

* structure

* rename to expandedNodes

* work

* test intention
2026-05-26 22:31:54 +09:00
nimlgen
032905dec9
hcq2: simpler (#16361) 2026-05-26 14:28:48 +03:00
George Hotz
322693dcd3 hotfix: bump Mac pytest timeout to 4 minutes (try 2) 2026-05-25 18:23:21 -07:00
George Hotz
41ee7dab1c
script to generate testsig for DSP (#16371)
* script to generate testsig for DSP

* cleanups
2026-05-25 17:54:58 -07:00
wozeparrot
76fc39ccc0
gather to single device (#16354) 2026-05-25 17:27:08 -07:00
George Hotz
942cb42b97 Revert "hotfix: bump Mac pytest timeout to 4 minutes"
This reverts commit 695a0069ed.
2026-05-25 17:25:11 -07:00
Christopher Milan
8ddd1328df
remove getenv(CI) (#16365)
gone everywhere except test_interop, because torch MPS does not work in actions
2026-05-25 20:23:33 -04:00
George Hotz
695a0069ed hotfix: bump Mac pytest timeout to 4 minutes 2026-05-25 17:20:19 -07:00
George Hotz
689ab6a49f
move buffer view offset to src (#16364)
* this work?

* failed
2026-05-25 17:07:55 -07:00
Christopher Milan
d8f86be613
webgpu: shader-f16 support in arch (#16370) 2026-05-25 19:20:59 -04:00
qazal
4bcc53eb26
viz: stable node position for +- toggle (#16367) 2026-05-26 06:30:47 +09:00
qazal
3506eb08ec
viz: sidebar toggles always recenter (#16366)
* viz: sidebar toggles always recenters

* python brain
2026-05-26 06:14:32 +09:00
chenyu
cdeb861828
invalids is empty [pr] (#16353) 2026-05-25 16:11:38 -04:00
qazal
b73d2d17b9
viz/cli: add --interval (#16363)
* interval support

* add test_interval

* llama uses interval
2026-05-26 03:35:06 +09:00
C T
2ab90f31b1
use windows-specific alias nvcuda when loading cuda on windows (#16260)
This also makes it possible to use cuda on windows by specifying 3 env
vars with direct dll paths: NVCUDA_PATH, NVRTC_PATH and NVJITLINK_PATH
without name collision with CUDA_PATH which is used for cuda headers
include path in NVRTCCompiler.
2026-05-25 08:50:50 -07:00
wozeparrot
68d2102fd2
llama: offload master weights (#16355) 2026-05-25 08:48:13 -07:00
qazal
eecd4706ff
fix mailbox comment, add types (#16360) 2026-05-25 22:24:00 +09:00
nimlgen
64095cf2e2
use get_buf in exec_kernel (#16356) 2026-05-25 15:13:40 +03:00
chenyu
5d5e02871f
remove Tensor.from_uop (#16344)
and no device for const in Tensor init
2026-05-24 18:53:09 -04:00
nimlgen
a891727c9f
hcq2: multi (#16347)
* hcq2: multi

* cleaner a bit
2026-05-24 19:28:33 +03:00
chenyu
926d125a63
update test_stack (#16345)
also skip COMPILE_ONLY, it was comparing 0==0
2026-05-23 10:42:35 -04:00
chenyu
149a87dac2
deviceless const cleanups (#16341) 2026-05-22 20:11:01 -04:00
Christopher Milan
35461d4d8f
ci: cleanup some deps [pr] (#16340) 2026-05-22 19:16:08 -04:00
Christopher Milan
451f38155c
start cleanup of the slowest tests (#16339) 2026-05-22 18:39:36 -04:00
nimlgen
26b3b3f6a2
hcq2: move submit lowering to schedule (#16330)
* hcq: move submit lowering to schedule

* Dx
2026-05-22 23:15:19 +03:00
wozeparrot
2d48fe8b7b
feat: bump version to 0.13.0 (#16337) (tag: v0.13.0) 2026-05-22 13:12:45 -07:00
chenyu
acc519720b
add missing init files, add chat.html to package-data (#16334) 2026-05-22 13:53:34 -04:00
googlefan256
eeadf26dad
Fix no module named error (#16305)
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-05-22 12:51:29 -04:00
nimlgen
90dbb45563
nv: fix boot mem (#16332)
* nv: fix boot mem

* linter
2026-05-22 19:28:38 +03:00
nimlgen
5d77a94923
am: mec_pipe0_reset on gfx12 only (#16331) 2026-05-22 19:02:18 +03:00
qazal
bbfe4f80ec
quantize_fp8 kernels in uops (#16288)
* add tests

* simple UOp kernel is n^2

* fast kernel matching c++, opts_to_apply=()

* remove cpp

* simple o(n) kernel, two passes

* fuse the loops

* works on DEV=CPU

* multi regression test

* fix multi, this can possibly be its own bugfix

* test cleanups

* minimal diff

* match C in UOps

* Revert "match C in UOps"

This reverts commit 0bef740c30.

* edit test

* match speed with C try 2

* needs_second_gpu

* cleanup
2026-05-22 20:54:06 +09:00
chenyu
3115952266
more unique const removal prerequisite (#16328) 2026-05-21 23:51:40 -04:00
Christopher Milan
c2d06570a5
remove getenv(CI) from core tinygrad (#16326) 2026-05-21 22:20:33 -04:00
chenyu
9744d512d9
use more non-buffered const (#16327) 2026-05-21 21:37:52 -04:00