Commit graph

12,131 commits

Author SHA1 Message Date
George Hotz
2fe45b0660
Merge branch 'master' into call_inline 2026-02-10 14:12:08 +08:00
Christopher Milan
cdb78954cb
better cl compiler name (#14660)
cl_compiler instead of compiler because overriding Compiled.compiler seems more confusing
2026-02-10 01:03:46 -05:00
George Hotz
1d88723aa0
Merge branch 'master' into call_inline 2026-02-10 14:02:43 +08:00
George Hotz
cc9bf8ccbc
move more to null/unit tests (#14658)
* move more to null tests

* move test_gc

* no test fusion op
2026-02-10 13:35:17 +08:00
George Hotz
b0dd3af093 inline all these calls 2026-02-10 13:25:29 +08:00
chenyu
83f6d28579
two less realize in setitem (#14655) 2026-02-09 23:45:24 -05:00
George Hotz
e89221e9aa add inline flag for call 2026-02-10 12:19:51 +08:00
wozeparrot
69574542ab
fix: use correct fa implementation in eval (#14651) 2026-02-09 18:20:44 -08:00
chenyu
0dedf4063c
minor test_setitem cleanup (#14654) 2026-02-09 20:40:29 -05:00
Christopher Milan
b36b62eb59
don't push docker cache for PRs (#14652) 2026-02-09 19:55:55 -05:00
Christopher Milan
e6562a5061
remove CompilerPair (#14638) 2026-02-09 19:51:18 -05:00
Christopher Milan
396e1320fb
bump cache version for z3 (#14650) 2026-02-09 19:32:07 -05:00
chenyu
9e3f24db9f
assign realize fix (#14649)
fix the need for explicit assign. track pending assigns for each buffer, and run those before the main realize in order
2026-02-09 17:46:46 -05:00
chenyu
0913c068ea
clean up setitem disk path (#14648) 2026-02-09 15:58:04 -05:00
chenyu
205a1212b7
delegate non Tensor src setitem to assign (#14647)
cannot do this for DISK in the unified path
2026-02-09 13:53:20 -05:00
chenyu
e9f40f49d4
explicitly check advanced setitem (#14644)
advanced setitem on DISK would fail in rangeify with a bad error; now it's checked directly in setitem. eventually DISK can use the regular setitem path
2026-02-09 13:36:46 -05:00
chenyu
20a132b1c4
relax atol for test_uop_scan_matmul (#14646)
flaky, also log max diff
2026-02-09 13:25:19 -05:00
qazal
50d3f6cea5
EVAL_BS=0 in llama profile (#14643) 2026-02-10 00:49:43 +09:00
chenyu
8a2c23d3dc
raise RuntimeError for setitem dtype mismatch (#14642) 2026-02-09 10:37:08 -05:00
qazal
80b0119cef
llama: add new asm gemm shape (#14611)
* llama: add new asm gemm shape

* work

* cleanup

* half dtype

* more comment
2026-02-10 00:34:29 +09:00
chenyu
a49e038c0c
don't manually broadcast in setitem (#14641)
handled by assign
2026-02-09 09:34:09 -05:00
chenyu
2c3e3559eb
remove a contiguous in basic setitem (#14640)
handled in rangeify
2026-02-09 09:19:46 -05:00
chenyu
6c0c8e2ac3
setitem push a realize to basic setitem (#14637)
advanced setitem does not need it
2026-02-09 08:54:07 -05:00
nimlgen
e087c58ae0
print tables in llama/profile.sh (#14639) 2026-02-09 12:32:54 +03:00
Christopher Milan
27f7ea478b
new style DSP renderer (#14636)
* new style DSP renderer

* cleanup
2026-02-09 00:39:03 -05:00
Christopher Milan
efac5b9ef6
new style NV/CUDA renderers, try 2 (#14634)
* new style NV/CUDA renderers, try 2

* fix diskcache
2026-02-08 22:58:48 -05:00
Christopher Milan
0ebb508b85
new style metal compiler (#14632) 2026-02-08 21:58:25 -05:00
Christopher Milan
9eef9f38ad
new style python renderer (#14631) 2026-02-08 21:45:07 -05:00
Christopher Milan
5f2f2cc956
Revert "new style NV/CUDA renderers (#14627)" (#14633)
This reverts commit 0e505951b0.
2026-02-08 21:16:03 -05:00
Christopher Milan
4ad787ece2
new style CPULLVMRenderer (#14629) 2026-02-08 21:05:01 -05:00
Christopher Milan
0e505951b0
new style NV/CUDA renderers (#14627)
* new style NV/CUDA renderers

* fix pickle

* oops

* fix CUDA_CC=NVCC

* mockgpu uses PTXCompiler

* oops

* ruff

* dont discard stderr

* ugh
2026-02-08 21:04:51 -05:00
Filip Brzek
1667669c46
fix: python3 -m tinygrad.device reporting on AMD/CPU (#14622)
* test: device module expects PASS in -m tinygrad.device for CPU

* fix: use device._compiler_name instead of unwrap_class_type(compiler).__name__ in enumerate_devices_str
2026-02-08 20:22:35 +03:00
nimlgen
01a4ee4d66
do not hive_reset when amdgpu (#14624) 2026-02-08 19:14:13 +03:00
nimlgen
a615b9d781
am: f8_mode for gfx94x only (#14620) 2026-02-08 17:38:48 +03:00
chenyu
c28f7d0167
remove realize in Tensor.svd (#14623) 2026-02-08 09:36:31 -05:00
qazal
087dab4c3b
gemm/asm: split out cdna tests from CI (#14619)
* gemm/asm: split out cdna tests from CI

* reorder

* work
2026-02-08 21:33:42 +09:00
George Hotz
183d38b128
remove CUSTOM_KERNEL / directly construct it (#14604)
* remove CUSTOM_KERNEL / directly construct it

* clean that up

* simpler multi

* custom kernel spec

* remove Kernel

* fix multi

* use sharded shape

* explicit regression test
2026-02-08 18:43:33 +08:00
nimlgen
e29a88ca09
hive_reset respects lock (#14618) 2026-02-08 10:47:25 +03:00
qazal
b10802eb53
use existing VIZ ContextVar instead of getenv (#14610) 2026-02-08 15:37:55 +09:00
chenyu
510b65489e
style change rangeify assign [pr] (#14616)
consistent naming, also a standalone function to replace complicated lambda
2026-02-07 15:47:32 -05:00
chenyu
b7afd4471c
use arg instead of 3rd op for ASSIGN [pr] (#14613) 2026-02-07 14:17:10 -05:00
nimlgen
88c3022223
amd: kfd iface early exit (#14612)
* amd: kfd iface early exit

* l

* revert
2026-02-07 18:57:10 +03:00
nimlgen
ce7bfc6ce8
nv: use nv_flags for all fields (#14607) 2026-02-07 15:01:38 +03:00
qazal
c2544e2252
viz: remove outdated comment (#14608) 2026-02-07 20:05:24 +09:00
nimlgen
6838b35cff
mockgpu: hevc (#14606)
* mockgpu: hevc

* eng
2026-02-07 12:27:55 +03:00
chenyu
884592f6c8
pin z3-solver version (#14605)
found exact input that crashes z3 4.15.4
2026-02-06 22:49:31 -05:00
George Hotz
7a2a3b5c71
Remove Ops.KERNEL, it's all Ops.CALL now (#14603) 2026-02-07 10:21:54 +08:00
George Hotz
ca6604eae2
kernel is call (#14577)
* call is kernel

* closer

* fix bugs

* dedup

* pm_gate_kernel_sink

* better

* Revert "better"

This reverts commit b4c799b810.

* Reapply "better"

This reverts commit e53f094ce7.

* cleanups

* work

* remove junk

* subtle fix

* index

* viz cleanups

* disable assert for now
2026-02-07 10:10:14 +08:00
wozeparrot
d87ae1c84c
feat: tinyfs load test in benchmark (#14602) 2026-02-06 18:00:00 -08:00
ttomsa
462b455562
cleanup linearize (#14523) 2026-02-07 08:54:02 +08:00