Commit graph

5,694 commits

Author SHA1 Message Date
b1tg
24d328e313
onnx parser (#10435)
* onnx parser

* fix compile, lint

* onnx.load -> onnx_load

* compatible with ModelProto

* fix test external_test_onnx_ops.py

* fix tests

* fix signed int

* reduce to 261 lines

* fix TypeProto.Optional

* debug for _parse_message, add TypeProto.Sequence, cleanup

* onnx_load from Tensor

* remove BufferedReader

* 174 lines and reduce tensor copy

* cleanup

* use onnx_load in external_model_benchmark.py

* fix qcom test

* [onnx] parser support external data

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-09 12:44:28 -04:00
George Hotz
81b9c04574
move high level stuff to unit tests [pr] (#10708)
* move high level stuff to unit tests [pr]

* process replay on unit tests

* fix pr, less compute

* set omp num threads

* set 200MB buffer size limit

* delete junk

* fix tests

* faster

* move test_indexing to unit

* faster
2025-06-08 14:05:56 -07:00
George Hotz
4e2c3560b4
smaller tests are faster tests [pr] (#10704)
* remove del spam from CI

* more

* preconstruct default buffer spec

* ignore those errors

* check exception

* more exception check

* skip stuff

* smaller tests mean faster tests

* a few more
2025-06-08 10:54:19 -07:00
George Hotz
32e9949052
rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
uuuvn
8e3f337075
Skip flaky test in ci (#10696)
`test_data_parallel_resnet_train_step` is already skipped on LLVM/CPU:

```python
@unittest.skipIf(CI and REAL_DEV in ("CUDA", "NV", "LLVM", "CPU"), "slow, and flaky on LLVM/CPU")
@unittest.skipIf(REAL_DEV == "WEBGPU" and not OSX, "WEBGPU Vulkan can only run kernels with up to 10 buffers")
def test_data_parallel_resnet_train_step(self):
```

It looks like `test_data_parallel_resnet` (no `_train_step`) is flaky in a similar way:
https://github.com/tinygrad/tinygrad/actions/runs/15472667248/job/43560773882?pr=10642#step:9:64
2025-06-08 08:24:09 -07:00
George Hotz
8c76250d31
speed up a few tests (#10692) 2025-06-07 20:39:25 -07:00
ihar
40c1479267
added unit tests for 'argfix' (#10678) 2025-06-07 22:17:10 -04:00
ihar
74b849b5e1
remove unnecessary 'argfix' because 'view' is an alias to 'reshape'. all functionality must be inside 'reshape' (#10677)
* remove unnecessary 'argfix' because 'view' is an alias to 'reshape'. all functionality must be inside 'reshape'

* added the same set of unit tests for 'view' as for 'reshape' since 'view' is just an alias for 'reshape'

* improved tests for 'view' op
2025-06-07 22:15:31 -04:00
Sieds Lykles
c29a56dd51
Fix whisper OOB (#10685)
* fix whisper and test

* remove import
2025-06-07 20:23:50 -04:00
George Hotz
53ed64e133
ci speed work 1 (#10676)
* skip a few slow tests

* use a venv for python packages

* create venv

* no user, it's in venv

* ignore venv

* venv

* new cache key

* try that

* this

* version the python cache
2025-06-07 16:33:11 -07:00
qazal
cb61774ab6
move shared viz fields out of serve.py [pr] (#10684)
* move shared viz fields out [pr]

* update javascript

* update test_viz
2025-06-07 17:18:18 +03:00
qazal
b515d796fb
inline viz get_name [pr] (#10682)
* inline viz get_name [pr]

* changing name_fxn makes this simpler

* waitUntil dom
2025-06-07 11:16:16 +03:00
wozeparrot
e3805171e2
feat: variable bs bitcast (#10674) 2025-06-06 17:21:53 -07:00
George Hotz
54db1f8ee8
prevent huge waste of multi ram (#10669)
* prevent huge waste of multi ram

* fix ram usage

* only define var

* add resolve

* fix tests

* fix cifar training

* remove that logic

* fix test without long
2025-06-06 17:17:21 -07:00
George Hotz
b68b7dbc2a
test winograd is close to normal conv [pr] (#10557)
Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-06 19:11:49 -04:00
leopf
eb7305e6a4
Tensor.keccak("sha3_256") (#7186)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: George Hotz <geohot@gmail.com>
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-06-06 15:24:05 -07:00
chenyu
bdede4924e
fix odd number in get_test_global_size (#10671)
factor might not be an integer if input global_size has an odd number in it
2025-06-06 17:31:35 -04:00
George Hotz
7f0f97aa76
new test_multitensor tests (#10667)
* new test_multitensor tests

* cleanup scheduler
2025-06-06 10:26:28 -07:00
chenyu
4a6d84c4c3
hotfix llama start_pos vmax is max_context-1 (#10659)
* hotfix llama start_pos vmax is max_context-1

fixed `IGNORE_OOB=0 python3 examples/llama3.py --size 1B --benchmark --temperature 0`

* hotfix: multitensor transformer test tests kv cache

---------

Co-authored-by: George Hotz <geohot@gmail.com>
2025-06-06 00:41:25 -04:00
George Hotz
5eb6e1e65a
Revert "hotfix: multitensor transformer test tests kv cache"
This reverts commit ad9f88419a.
2025-06-05 21:15:34 -07:00
George Hotz
ad9f88419a
hotfix: multitensor transformer test tests kv cache 2025-06-05 21:08:57 -07:00
George Hotz
8325c4f192
tests for multi assign (#10658)
* tests for multi assign

* transformer tests

* add that assert
2025-06-05 20:56:40 -07:00
wozeparrot
0d86f8d375
fix failed threefry (#10646) 2025-06-05 17:17:42 -07:00
chenyu
ff1aad7b69
fix const float pow to int tensor (#10655)
was incorrectly cast to int
2025-06-05 19:15:12 -04:00
George Hotz
baba274a76
minimal mstack pr to fix allreduce (#10649)
* minimal mstack pr to fix allreduce

* fix webgpu
2025-06-05 15:14:53 -07:00
George Hotz
4c315f8e17
MSTACK little non-functional changes (#10648) 2025-06-05 13:20:22 -07:00
chenyu
46811d0d3c
minor external_model_benchmark cleanup (#10644) 2025-06-05 14:13:28 -04:00
qazal
26afbc954f
delete redundant tests from test_schedule [pr] (#10643) 2025-06-05 20:08:39 +03:00
chenyu
80ebce421d
remove metal buffer limit in external_model_benchmark [pr] (#10642)
not needed anymore
2025-06-05 13:00:51 -04:00
qazal
28c4997236
check for matching shape order in fused reduce (#10641)
* failing test

* shapes match with ones removed
2025-06-05 19:37:22 +03:00
qazal
1190062812
prevent grouper can_chase while fusing arange [pr] (#10623) 2025-06-05 18:50:21 +03:00
qazal
8c5ea00522
push permutes through fused reduces (#10628)
* fix pushing reshapes through reduceops

* reduceop_view_right should assert on ndims mismatch

* update that, view.reshape asserts it
2025-06-05 16:14:04 +03:00
chenyu
d0969f5a1f
cleanup multi tests (#10635) 2025-06-05 00:28:44 -04:00
qazal
571c0296a9
linearizer failure from FUSE_ARANGE default diff (#10629)
* start with test_arange_sum

* test_arange_avgpool2d

* device.renderer.supports_float4
2025-06-04 19:11:52 +03:00
qazal
5056d21b29
add failing TestSchedule.test_arange_sum [pr] (#10627) 2025-06-04 17:23:59 +03:00
qazal
7114b6ab31
viz browser tests (#10626)
* viz browser tests

* expect failure if js/ isn't included

* back green
2025-06-04 14:58:24 +03:00
wozeparrot
4d1686f767
clean: becnhmark -> benchmark (#10620) 2025-06-03 19:28:18 -07:00
qazal
ce9f12dc13
reorder cast before masking constants (#10609)
* failing test from fuzzer

* .numpy() handles bfloat16 better

* const->view->cast becomes const->cast->view

* update TestMovedConstFolding.test_cast_padded
2025-06-03 15:44:03 +03:00
qazal
910cabb081
add kernel count to grouper process replay differ [pr] (#10611) 2025-06-03 15:21:27 +03:00
Ahmed Harmouche
650404a143
[webgpu] Proper shared mem size for packed types (#10585)
* Proper shared mem size in webgpu

* Add test

* Refactor test
2025-06-01 20:18:33 -04:00
qazal
3cc73a0172
simpler process replay main loop [pr] (#10588)
* simpler process replay main loop [pr]

* use logging

* default to 1
2025-06-01 15:03:21 +03:00
qazal
dc882d3d7d
merge process replay and viz captures [pr] (#10581)
* refactoring

* test script

* work

* more work

* diff

* repr splits lines correctly

* that

* add location

* add location

* also don't need name_override

* k.copy

* [pr]

* name_override 2

* err
2025-06-01 12:30:10 +03:00
qazal
1f8a8721e9
remove test_unaligns_idxs, UOps don't have order like this [pr] (#10587) 2025-06-01 12:16:14 +03:00
Ahmed Harmouche
35eb4d357a
[webgpu] Fix atomic shared mem load inside loop (#10530)
* Disable shared mem atomics on webgpu

* allow_any_len in load pattern matcher to fix temp load inside loop
2025-05-31 09:29:02 -04:00
qazal
5b59728c75
refactor LOAD(DEFINE_GLOBAL, VIEW) in kernels to LOAD(VIEW(DEFINE_GLOBAL)) (#10541)
* changes to core tinygrad

* fixups pt1

TC=3
docs/abstractions2.py
IMAGE=2
test_quantize_dsp
test_schedule

* more tests

* green now

* images stay images
2025-05-30 14:27:58 +03:00
chenyu
116ffc4e92
cstyle strips paren for AND and OR (#10560) 2025-05-30 07:09:05 -04:00
qazal
bbf05110a2
use kernelize in TestLinearizer.test_indexing_multireduce [pr] (#10571) 2025-05-30 11:27:09 +03:00
qazal
7051bf3fd5
fixup hardcoded asts ptr dtype and constants [pr] (#10570)
* fixup hardcoded asts ptr dtype and constants [pr]

* use kernelize for test_kernel_count
2025-05-30 09:38:32 +03:00
qazal
066196415f
UOp.valid and const_like work with just shapes [pr] (#10569)
* UOp.valid and const_like work with just shapes [pr]

* pm_quant left

* pm_quant
2025-05-30 08:55:06 +03:00
George Hotz
b3b43a82c4
remove Tensor.no_grad, it's meaningless now [pr] (#10556) 2025-05-28 22:20:02 -07:00