13,471 commits

Author SHA1 Message Date
wozeparrot
dd8302a6d0
fix: optim device is never none here (#14963) 2026-02-22 23:34:57 -08:00
wozeparrot
25565b2410
fa: test for mp (#14907) 2026-02-22 21:47:36 -08:00
qazal
d6145736c7
sqtt: examples generator changes from inst_discovery (#14961)
* sqtt examples generator changes from inst_discovery

* rdna4

* rdna3

* cdna

* sad reality for mi300x
2026-02-23 14:42:48 +09:00
George Hotz
3acd763684
simple call in allocate (#14962)
* allocate generates a call

* symbolic works too

* add min/max to PARAM

* revert viz
2026-02-23 13:34:20 +08:00
George Hotz
f45199269b
hotfix: regress NV cifar_10steps_half to 120 ms 2026-02-23 12:29:25 +08:00
George Hotz
677145b393
all consts have shapes (#14959)
* all consts have shapes

* vconst has shape too

* use normal schedule

* cast ptrdtype

* image

* bitcast issue + hack
2026-02-23 10:26:50 +08:00
qazal
1538960002
viz: smaller view for repeated asm instructions in cfg (#14954)
* simple test

* todo

* feature
2026-02-23 10:41:43 +09:00
George Hotz
226d4a2440
hotfix: code DEBUG=1 defensively 2026-02-23 08:44:54 +08:00
chenyu
4424757b9a
update test_sharded_memory (#14956)
cleaned up and moved to test/null
2026-02-22 16:56:08 -05:00
b1tg
f9b7493e7a
cleanup fp8 conversion helpers and fp8 edge-case tests (#14953)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-22 09:16:42 -05:00
qazal
60f90dd97c
sqtt: fix jitted program deduping, failing test for graphed kernels (#14951)
* work

* hcq_profile fix, test with JIT=2 passes

* ci, -n=auto

* rm duplicate test

* less
2026-02-22 15:22:31 +09:00
chenyu
ccfd878e0f
minor fix_assign_hazard improvement [pr] (#14949)
target.base cannot be s if s.op is a movement
2026-02-21 21:21:28 -05:00
chenyu
24e8919438
raise explicitly for test_crossunder_assign (#14948) 2026-02-21 21:21:13 -05:00
chenyu
acf8f6b287
faster fix_assign_hazard [pr] (#14947)
one toposort. `time NULL_ALLOW_COPYOUT=1 MNISTMOCK=1 PYTHONPATH="." NULL=1 DEFAULT_FLOAT=HALF BENCHMARK=10 BS=256 GPUS=1 MODEL=resnet python3 examples/mlperf/model_train.py` 150s -> 40s
2026-02-21 19:42:13 -05:00
chenyu
9764e2561c
more assign into unrealize silent fail cases (#14944) 2026-02-21 18:12:57 -05:00
nimlgen
6de15dc480
mockam usb (#14916)
* mockam usb

* f

* win

* x

* x
2026-02-21 23:05:54 +03:00
chenyu
0dbcd764ad
a few assign into unrealized failed test case (#14940) 2026-02-21 13:18:45 -05:00
wozeparrot
3cda781876
llama optim offload (#14901) 2026-02-21 08:53:45 -08:00
chenyu
0255a64a27
update test_jit_init_empty (#14938)
* update test_jit_init_empty

now it fails silently

* that
2026-02-21 09:01:50 -05:00
George Hotz
8ef5544e4a
realized PYTHON copies (#14934)
* realized PYTHON copies

* comment that out

* fix that test

* append afters

* contig

* disk copies

* should be 124

* 332
2026-02-21 20:29:31 +08:00
qazal
cf23c2eee7
viz: merge readelfs, clean up toggles UI code (#14936)
* no extra readelf function

* that node can never be null, display block is wrong fix the css
2026-02-21 19:58:35 +09:00
George Hotz
639224e6e1
no call hack needed anymore (#14935) 2026-02-21 18:06:00 +08:00
George Hotz
d3b829a189
print schedule caller with DEBUG=1 (#14933) 2026-02-21 16:22:45 +08:00
qazal
8278886cf9
test_profiler cleanup, non flaky cpu_profile test (#14932)
* test_profiler cleanup, non flaky cpu_profile test

* existing device is okay
2026-02-21 16:58:10 +09:00
George Hotz
06fb35a1e5
don't graph_rewrite into calls (#14931)
* don't graph_rewrite into calls

* optional

* pm_gate_kernel_sink removed
2026-02-21 15:39:59 +08:00
qazal
c5029fa460
jit case with Tensor.empty input, realized means allocated (#14930)
* simple failing jit test case with Tensor.empty

* this used to exist in ops.py...

* Revert "removed if self.buffer.is_allocated() in realized (#14836)"

This reverts commit 72cf603805.
2026-02-21 16:33:55 +09:00
George Hotz
6533250246
remove more tags stuff (#14927)
* remove more tags stuff

* remove more

* unique consts aren't needed post tensor
2026-02-21 12:51:53 +08:00
chenyu
0c0d07d330
delete forced_reshape [pr] (#14926) 2026-02-20 22:35:31 -05:00
qazal
5b6fcd1cda
gemm/asm: smallest cdna4 asm gemm test (#14925) 2026-02-21 11:56:05 +09:00
George Hotz
ad3d821d63
move size 0 logic to allocations (#14924) 2026-02-21 09:57:40 +08:00
George Hotz
df7774661a
remove late numbering of UOps (#14923)
* remove late numbering of UOps

* stupid fix

* dead code
2026-02-21 09:18:48 +08:00
chenyu
c9b706125d
break Tensor.pad into methods (#14922) 2026-02-20 20:10:09 -05:00
Christopher Milan
5ee654b0d9
test IMAGE=1 driving_vision in mac pytest (#14921)
* test IMAGE=1 driving_vision in mac pytest

* don't multiply array
2026-02-20 18:28:10 -05:00
Christopher Milan
815780f72f
cl: fix multi-image arg kernels (#14920) 2026-02-20 17:34:17 -05:00
chenyu
24286c5593
fix clone for multi (#14919)
also update empty_like to make sure it's backed by buffers
2026-02-20 17:21:09 -05:00
chenyu
1fc1508f67
add assign to test_realize_is_realize.py (#14918) 2026-02-20 16:48:01 -05:00
chenyu
a4634b253a
fix empty_like for sharded tensor (#14915) 2026-02-20 16:30:04 -05:00
chenyu
86e7804d60
correct llm.py mem bw benchmark for moe (#14626)
only count active experts. verified on olmoe
2026-02-20 16:11:22 -05:00
Nicolas Pinto
aa905db7f7
ptx: use setp.neu for float CMPNE (#14805)
* ptx: use setp.neu for float CMPNE

* test ptx float CMPNE renders setp.neu

* check NaN behavior, not grep ptx strings...

* skip WEBGPU for test_cmpne_nan (Vulkan NaN behavior)

---------

Co-authored-by: Nicolas Pinto <41171+npinto@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2026-02-20 16:11:04 -05:00
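The NaN behavior checked by this commit follows from IEEE 754: a not-equal comparison should be true when either operand is NaN. PTX's unordered `setp.neu` gives that result, while the ordered `setp.ne` yields false for NaN operands. The following is a minimal sketch in plain Python, not tinygrad or PTX code; the helper names `ordered_ne` and `unordered_ne` are hypothetical and only model the two predicates.

```python
import math

def ordered_ne(a: float, b: float) -> bool:
    # Models an ordered not-equal: false whenever either operand is NaN.
    if math.isnan(a) or math.isnan(b):
        return False
    return a != b

def unordered_ne(a: float, b: float) -> bool:
    # Models an unordered not-equal: true whenever either operand is NaN.
    if math.isnan(a) or math.isnan(b):
        return True
    return a != b

nan = float("nan")
# Python's own != follows IEEE 754, so it agrees with the unordered form.
for a, b in [(1.0, 1.0), (1.0, 2.0), (nan, 1.0), (nan, nan)]:
    assert unordered_ne(a, b) == (a != b)
```

The two helpers only disagree when a NaN is involved, which is presumably why the test checks runtime NaN behavior instead of matching the rendered PTX string.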
chenyu
f9536f3cd4
wrap UOp.__float__ with float [pr] (#14913)
fix warning
tinygrad/test/null/test_uop_resolve.py:56: DeprecationWarning: UOp.__float__ returned non-float (type ConstFloat).  The ability to return an instance of a strict subclass of float is deprecated, and may be removed in a future version of Python.
    self.assertEqual(float(u), 11.5)
2026-02-20 14:03:53 -05:00
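The warning quoted in this commit can be reproduced without tinygrad. The sketch below assumes a stand-in `ConstFloat` that subclasses `float` (the real class may differ) and two hypothetical holder classes, `Unwrapped` and `Wrapped`, to contrast returning the subclass instance from `__float__` with wrapping it in `float(...)` first.

```python
import warnings

class ConstFloat(float):
    """Stand-in float subclass; the real ConstFloat in tinygrad may differ."""

class Unwrapped:
    def __init__(self, value):
        self.value = ConstFloat(value)
    def __float__(self):
        # Returns a strict subclass of float, which Python deprecates.
        return self.value

class Wrapped(Unwrapped):
    def __float__(self):
        # Wrapping with float() converts the subclass instance to a plain float.
        return float(self.value)

def count_deprecations(obj) -> int:
    # Record every warning raised while converting obj with float().
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = float(obj)
    assert result == 11.5
    return sum(issubclass(w.category, DeprecationWarning) for w in caught)
```

Under these assumptions, `count_deprecations(Unwrapped(11.5))` reports one warning and `count_deprecations(Wrapped(11.5))` reports none, while both conversions still produce 11.5.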
chenyu
697d0b06c2
update env for testmacpytest (#14912)
CI: ""
CAPTURE_PROCESS_REPLAY: "0"
2026-02-20 13:42:50 -05:00
chenyu
07d145debd
compile3 0.10.1 driving_vision in mac pytest (#14911)
* compile3 0.10.1 driving_vision in mac pytest

* sync before re-executing onetime kernels
2026-02-20 12:23:52 -05:00
chenyu
d895713116
remove temp onnx migration CI job (#14910) 2026-02-20 11:38:44 -05:00
George Hotz
2611907afb
start ripping out old scheduler -- no maps (#14909)
* start ripping out old scheduler -- no maps

* no more metadata
2026-02-20 21:05:04 +08:00
nimlgen
1b3b94a72a
fix mockam mypy (#14908) 2026-02-20 15:15:05 +03:00
George Hotz
55d3a5def9
preallocate all realized buffers (#14823)
* preallocate all realized buffers

* contiguous

* work

* comment that out

* move to schedule

* better

* correct fix

* just buffer

* disk bufs

* fixes disk tensor stuff

* fix symbolic stuff

* fix multi

* 162 failures

* bugfixes

* don't check that anymore

* fix schedule tests

* mnist should be contiguous

* type and buffer

* fix tests

* shrink axis correction

* mypy fixes

* tests skips

* same 37 failures

* dedup

* no shrink in the graph

* 29 failures

* skips

* fix custom kernel

* fix training

* those optimizations aren't supported currently

* simpler

* more correct

* tests

* 14 failures

* works

* fix that test

* broken

* 11 failures

* only kernel counts left

* fixes

* all tests pass

* remove tensor_map

* op test

* 200 -> 230

* test fixes

* fixes

* revert test_tiny thing

* guard

* revert that

* test tiny passes

* no contigs there

* base realize back

* Revert "no contigs there"

This reverts commit c45bb9fcfd.

* revert that

* chop many assigns

* 12 failures

* fix tests

* tests

* apply after

* pre-commit

* remove old code

* delete that

* fix types

* remove extra contig

* fix dataloader

* torch fix

* disk fix

* update kernel fusion numbers

* runs on amd

* restore kernel count

* add that rule back

* that

* disable that

* wrong

* add the correct rule for that folding

* more tests

* guard c1.arg

* no newlines

* realize those

* split into a different file

* remove detach/contig back

* skip 2

* update that
2026-02-20 20:05:54 +08:00
nimlgen
dbf894215a
init mockam (#14889)
* mockam

* more tests

* linter

* x
2026-02-20 14:09:11 +03:00
wozeparrot
4b9825c829
make optim _step return update (#14906) 2026-02-20 02:43:56 -08:00
George Hotz
6610255654
add the correct rule for gcd div/mod folding (#14905)
* add the correct rule for that folding

* more tests

* guard c1.arg
2026-02-20 18:11:54 +08:00
George Hotz
a28fc2fba7
hotfix: remove wrong symbolic rule 2026-02-20 17:09:18 +08:00