13,471 commits

Author SHA1 Message Date
qazal
bfb2d1f89a
Revert "fp8 gemm speedup (#16236)" (#16245)
This reverts commit d95bf394e1.
2026-05-19 02:01:44 +09:00
chenyu
5ae4dbd599
make slow tests faster (#16244)
2026-05-18 11:42:02 -04:00
chenyu
981c12182f
remove requires_grad= in tinygrad/ (#16241)
2026-05-17 16:55:37 -04:00
chenyu
fcdd1af880
remove Tensor.detach override [pr] (#16239)
2026-05-16 23:58:12 -04:00
chenyu
dcee90aa3f
remove requires_grad use in extra/examples (#16238)
except the ones fed into optimizer
2026-05-16 18:40:26 -04:00
chenyu
8631b6f17d
remove use of requires_grad in test/ (#16237)
2026-05-16 17:21:07 -04:00
qazal
d95bf394e1
fp8 gemm speedup (#16236)
* add asm_gemm option

* milestone

* work

* edit

* only the fast kernel

* diff
2026-05-17 04:58:28 +09:00
chenyu
0ddc50d050
do not gate backward on requires_grad (#16230)
DETACH is filtered in _deepwalk. instead of None, it gets 0 grad now
2026-05-16 12:29:49 -04:00
nimlgen
bef5f717bc
fix nolocals and beam (#16232)
2026-05-16 18:09:19 +03:00
qazal
ebcb7b7cc0
fp8 gemm tests with scale args (#16231)
* update atol

* update fp8 path

* more work

* update profile.sh
2026-05-16 20:47:58 +09:00
nimlgen
e575f778f9
move debug prints (#16218)
* move debug prints

* x
2026-05-16 13:57:34 +03:00
wozeparrot
2d48d7ab09
remove more invalid (#16227)
2026-05-16 02:52:27 -07:00
wozeparrot
159694347e
llama: fix running flat_llama (#16224)
2026-05-15 20:16:48 -07:00
Christopher Milan
79c0ae5b89
metal: arch is GPU family (#16223)
2026-05-15 21:22:48 -04:00
Christopher Milan
2c61f65211
cl: device extensions in arch (#16220)
2026-05-15 18:59:20 -04:00
George Hotz
2549b14ec2
fix caformer onnx run (#16222)
2026-05-15 15:08:36 -07:00
George Hotz
2570bded8b
update spec for LOAD (#16221)
* add load to the spec

* can
2026-05-15 14:46:00 -07:00
chenyu
d62c1d83c0
remove Tensor.eye override (#16219)
* remove Tensor.eye override

was only needed for requires_grad arg

* README
2026-05-15 15:40:34 -04:00
chenyu
07a172dbbb
remove noop requires_grad_ calls (#16213)
2026-05-15 13:31:10 -04:00
chenyu
c6cf9e8f0c
remove test_svd_nonfull_5_5 (#16217)
flaky, kinda overlap with test_svd_general
2026-05-15 13:10:02 -04:00
qazal
d54fa86b71
viz/cli: select all calls in graph by default (#16214)
2026-05-15 21:01:44 +09:00
nimlgen
28b98e529d
nv: move structs to vram (#16184)
* nv: vram

* x

* 4090

* x

* move and sysmem on macos

* x

* remove hp
2026-05-15 13:41:42 +03:00
chenyu
409bb0c9ad
requires_grad cannot be None (#16212)
final goal is to remove requires_grad, first change the default to True, and don't allow None
2026-05-15 02:01:04 -04:00
Christopher Milan
c7870f11ff
mesa: suggest curl install tip (#16211)
2026-05-15 00:29:06 -04:00
chenyu
a612b88abb
better assert when setitem a refed tensor (#16210)
also decouple from requires_grad
2026-05-14 23:40:29 -04:00
chenyu
a75c14f010
some setitem tests (#16209)
2026-05-14 22:36:25 -04:00
Christopher Milan
891a1ae7c2
onnx: remove dtype_fallback (#15717)
2026-05-14 22:06:57 -04:00
wozeparrot
b4d267dfd4
llama: only save when small (#16208)
2026-05-14 17:46:29 -07:00
chenyu
ffa1aac7b1
gradient for STORE/AFTER ala clone (#16205)
2026-05-14 20:17:27 -04:00
chenyu
09096ea565
test_gradient_through_clone (#16203)
backward through clone crashes now
2026-05-14 19:26:47 -04:00
George Hotz
d4dcd8487b
aggressive shape check to prepare for broadcasting (#16202)
* add implicit broadcasting to shape

* NOOP/ALLREDUCE fixes
2026-05-14 16:15:44 -07:00
George Hotz
83ec66da34
fix a fastdiv edge case (#16199)
2026-05-14 13:12:18 -07:00
nimlgen
62ea73719d
hcq2: share more with graph (#16196)
* share more with graph

* comment
2026-05-14 22:28:11 +03:00
George Hotz
3b8cc31759
disable fast idiv by default, it's broken (#16197)
* disable fast idiv by default, it's broken

* fix fast idiv tests
2026-05-14 11:48:27 -07:00
Christopher Milan
8f811649ff
better compiler_cpu invalid arch errors (#16194)
2026-05-14 14:36:14 -04:00
qazal
f03a7fd6d1
viz/cli: readable uop json (#16195)
* viz/cli: readable uop json repr

* work

* better
2026-05-14 21:33:10 +09:00
C T
1b779a9058
add gelu approximate="none" (match pytorch) (#16162)
* add gelu approximate="none" (match pytorch)

* lint

* pass through onnx Gelu approximate

* type annotate

* explicit math.sqrt

* keep tinygrad's gelu approximate="tanh" default
2026-05-13 18:53:24 -07:00
chenyu
dd9187d9ee
minor hash cleanups (#16190)
same kernels
2026-05-13 20:59:24 -04:00
wozeparrot
88ac2ac1fd
llama: cleanups (#16189)
2026-05-13 17:08:06 -07:00
Christopher Milan
9a365d9978
ci: fix null image tests (#16188)
2026-05-13 18:00:05 -04:00
nimlgen
ad1fb7c981
hcq2: graph (#16186)
* keep this for now

* early graph
2026-05-13 22:49:43 +03:00
chenyu
3f9f6a51b2
minor image_conv2d cleanup (#16187)
remove some no-op slices
2026-05-13 15:47:40 -04:00
b1tg
59c34b9fe0
llm: precise device (#16159)
* llm: precise device

* llm: pass device to precompute_freqs_cis
2026-05-12 21:16:42 -07:00
b1tg
3c806ff406
clean up gguf (#16160)
2026-05-12 21:16:10 -07:00
wozeparrot
e97f2c1114
llama: only gemm + fa custom kernel (#16180)
* llama: tie store to grad directly

* llama: set mp flags

* llama: non fused grad fp8 quantize path
2026-05-12 21:03:49 -07:00
chenyu
38d407fd58
simplify svd more (#16181)
all the slowness is scheduling
2026-05-12 23:48:22 -04:00
Christopher Milan
f1fdd2ccec
ci: add IMAGE=1 compile-only tests (#16182)
* ci: add IMAGE=1 compile-only tests

* fix
2026-05-12 23:40:32 -04:00
George Hotz
faf7fb7513
update nir renderer for new image style (#16179)
* update nir renderer for new image style

* don't cast image indexes
2026-05-12 20:25:01 -07:00
Christopher Milan
7d0c5ab689
ci: ocelot needs nvcc on linux (#16178)
* ci: ocelot needs nvcc on linux

* cudart
2026-05-12 23:13:48 -04:00
chenyu
32138c2418
svd to mixin (#16175)
2026-05-12 22:29:01 -04:00