Commit graph

5,694 commits

Author SHA1 Message Date
George Hotz
1253819151
make beautiful indexing use a Variable (#10063)
* make beautiful indexing use a Variable

* stunning test

* better color

* training is broken

* fix tests

* fix variable indexing

* fix test

* no contiguous

* revert that

* revert that too

* indexing two bind

* skip for webgpu

* make not slow
2025-04-27 08:22:38 -04:00
chenyu
4c1ce1a299
don't simplify if div folding resulted in negative numerator (#10064)
* don't simplify if div folding resulted in negative numerator

* test
2025-04-26 17:01:18 -04:00
George Hotz
1805403821
fix rand arange folding (#10060)
* test rand range

* --amend

* fix rand arange folding

* reduce_rangeless fix
2025-04-26 12:24:05 -04:00
qazal
d13c100981
don't sort dims in verify_sink_dims [pr] (#10059)
* don't sort dims in verify_sink_dims [pr]

* 1 can exist with n

* put process_replay warn last

* assert shape is the same

* bring that back
2025-04-26 23:24:30 +08:00
George Hotz
11113c9d07
reduce_unparented (#10056) 2025-04-26 09:48:16 -04:00
George Hotz
ea5dddc537
reduce collapse generic (#10045)
* reduce collapse generic

* new arange folder

* new range folding

* correct with sym

* all tests pass

* indexing ops passes

* failing tests

* fix tests, remove unused

* revert that

* torch indexing is fast

* skip on webgpu

* touchups

* comments
2025-04-26 09:13:24 -04:00
quortus
5cdc96409e
Update outdated renderer.render calls (#10044) 2025-04-26 07:35:19 -04:00
nimlgen
0fc85a2b0a
hcqfuzz: init (#10049)
* hcqfuzz: init

* fix fuzz

* linter

* graph

* that test

* update readme
2025-04-25 23:19:21 +03:00
Ignacio Sica
76a86735c0
hotfix amd bf16 is supported case (#10039)
* hotfix amd and amd_llvm

* bf16 not supported in ci

* hotfix amd_llvm is not a device

* remove default

* dont gate on ci and amd_llvm

* minor cleanup

* skip bf16 tc test for amd_llvm
2025-04-24 21:29:27 -03:00
Ignacio Sica
b4f823acbe
fix helper_tc_allclose (#9606)
* fix helper_tc_allclose

* cleanup

* hotfix

* cleanup

* cleanup

* check real buffer and add cast for bf16

* cleanup

* fix padded for ops_python

* avoid assert on amd emulated tc

* swap dimensions

* revert, should have nothing to do with padded

* revert fix, should not go in this pr

* remove skip
2025-04-24 18:36:40 -03:00
Rory Clear
3a189fa561
More yolo processing in tinygrad (#9928)
* more tg less np

* update webgpu html for new compile

* resize boxes

* remove text

* add back note

* fix indentation

* fix indentation

* remove magic num

* remove now unused funcs

* back to numpy nms

* no loop

* fix iou suppression

* update test

* dont suppress other classes

* add working scale

* fix expected value, rounded up 0.24 was being counted

* add postprocess bool for onnx test

* fix indents

* clean

* clean

* fix indent

* remove print

* fix indent

* remove unused import

* remove hardcoded 0.25

* space

* spacing

* clean label_predictions func

* remove single item lists

* space

* use postprocess output in test

* space

* clean

* clean

* remove redundant threshold

* remove redundant threshold

* clean

* rename var

* move loop into func

* unhardcode iou_threshold

* remove unused values

* clean

* add note

* clean

* keep const

* move back funcs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 16:21:46 -04:00
Ignacio Sica
51ca19d061
set test_tensor_cores_padded_amd to expectedFailure (#10036)
* init

* add expected failure to correctly track progress

* hotfix

* skip for amd_llvm as well

* add skip

* add pr number

* move comment to amd test

* change reason
2025-04-24 17:11:40 -03:00
uuuvn
779aa1e2e9
Enable image tests on cloud if clouddev supports image (#9903)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 14:30:12 -04:00
Ignacio Sica
373ca59b7f
use is_dtype_supported to check dtype support in tc tests (#10035) 2025-04-24 14:59:14 -03:00
uuuvn
754d789f51
Fix and enable jit tests on CLOUD (#10031) 2025-04-24 18:39:31 +03:00
George Hotz
aec75f51ef
fixup some slow CI tests [pr] (#10027)
* fixup some slow CI tests [pr]

* shrink test index
2025-04-24 09:20:49 -04:00
qazal
c990aac2b1
skip flaky test_transcribe_file1_OOB (#10026) 2025-04-24 21:09:43 +08:00
Sieds Lykles
e75be6eafc
[bounty] [pr] index validation with z3 (#9981)
* index validation with z3

* Change comment

* toposort -> toposort()

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 08:06:08 -04:00
quortus
9e49721c47
CPUGraph support for clang (#10014)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-04-24 07:52:35 -04:00
Park Jun
c3ad7b2a84
create randperm and support pytorch backend (#10019) 2025-04-24 07:29:02 -04:00
nimlgen
1c5e353249
am: use mmio iface (#10012)
* am: use mmio iface

* linters

* fixes

* fixes + cleanups

* mute

* mypy

* style
2025-04-24 00:27:04 +03:00
George Hotz
2ed3acd767
toposort is a function [pr] (#10004) 2025-04-23 16:25:03 +01:00
uuuvn
0730ff0e50
Skip test that requires lru if device's allocator isn't lru (#10003) 2025-04-23 16:12:56 +01:00
uuuvn
9de73ccc22
Skip test that requires python 3.12 on older versions (#10001)
`out.cast(it.dtype.fmt).tolist()` fails with `ValueError: memoryview: destination format must be a native single character format prefixed with an optional '@'`
2025-04-23 10:09:26 -04:00
George Hotz
71ecc7fa1a
use a pattern matcher for upcast [pr] (#10000) 2025-04-23 14:24:23 +01:00
George Hotz
cc1087d2ec
move simplify into views_to_indexed_uops (#9999)
* move simplify into views_to_indexed_uops

* cache that
2025-04-23 13:50:27 +01:00
pkotzbach
dbbd755cba
FP8s truncate (#9937)
* truncate fp8

* fix

* maybe like that?

* fix linters

* ruff

* move from extra and add ml_types to tests

* minor changes

* str to dtypes and nan support

---------

Co-authored-by: pkotzbach <pawkotz@gmail.com>
2025-04-22 19:12:49 -04:00
qazal
f4ec57baff
new schedule linearizer enqueues KERNEL UOps [pr] (#9993)
* new schedule linearizer enqueues kernels [pr]

* no defaultdict

* diff

* minor
2025-04-23 05:17:58 +08:00
George Hotz
d1f6701eb7
hotfix: lower amd threshold + improve block reorder test 2025-04-22 20:44:29 +01:00
nimlgen
db51133537
rename HWInterface -> FileIOInterface (#9989)
* rename HWInterface -> FileIOInterface

* ugh
2025-04-22 22:18:57 +03:00
George Hotz
c1539b0319
putting add first orders loads as expected (#9991) 2025-04-22 20:12:05 +01:00
nimlgen
bd580d8ea4
hcq: use mmio interface in nv (#9986)
* hcq: start mmio interface

* allow double cast

* revert

* faster?

* simpler, not needed more now

* dd

* types

* fix
2025-04-22 21:58:12 +03:00
George Hotz
feee6986c9
faster block reorder (#9990)
* faster block reorder [pr]

* that shouldn't change order

* key just in sorted

* ind
2025-04-22 19:18:57 +01:00
qazal
6cb2d18c03
refactor schedule linearize to defaultdict [pr] (#9984)
* refactor schedule linearize to defaultdict [pr]

* skip that

* don't need .get
2025-04-23 00:00:23 +08:00
chenyu
9e5e371999
make DISABLE_COMPILER_CACHE a ContextVar [pr] (#9983) 2025-04-22 10:32:54 -04:00
qazal
bbc324f5dc
remove CAST_AFTER_EXPAND (#9980) 2025-04-22 21:06:11 +08:00
George Hotz
c519b553db
non recursive toposort is 2x+ faster (#9979)
* non recursive toposort is 2x+ faster

* don't change the order
2025-04-22 13:59:38 +01:00
qazal
7b55846e08
prep STORE UOp creation for multi output [pr] (#9975)
* prep STORE UOp creation for multi output [pr]

* test_multioutput_ast
2025-04-22 19:34:52 +08:00
George Hotz
e358e0a0c6
move metadata set to tensor [pr] (#9976)
* move metadata set to tensor [pr]

* only track that in tensor.py
2025-04-22 12:30:35 +01:00
George Hotz
f5dc70c624
microbenchmarks + micro speed ups (#9972)
* microbenchmarks

* forgot the ubenchs

* clean up type verify
2025-04-22 11:30:46 +01:00
qazal
1cf4e24ca5
fix kernelize usage with pm_gradient (#9953)
* fix kernelize usage with pm_gradient

* remove that
2025-04-22 17:26:05 +08:00
qazal
36ed3c3253
fix kernelize with VIEW children (#9961) 2025-04-21 23:38:46 +08:00
qazal
e8910540f6
Kernelize can be called multiple times on a Tensor (#9949)
* Kernelize can be called multiple times on a Tensor

* add (failing) test_kernelize_bw
2025-04-21 06:28:47 +08:00
qazal
1d90be2cff
match kernelize API in process replay (#9948) 2025-04-21 05:23:41 +08:00
qazal
e20ef7196a
Tensor.kernelize (#9845)
* add kernelize

* remove that

* kernelize returns self

* update abstractions2.py

* kernelize in test_schedule

* temp: assert BUFFER_VIEW's existence

* ASSIGN must have a buffer or subbuffer target

* assert and shrink

* fix

* padded setitem

* var

* toposort once

* extra

* base_buffer

* end with BUFFER_VIEW

* setitem for disk

* test_setitem_becomes_subbuffer

* mul slice test

* torch backend fix 1

* non-deterministic

* keep subbuffer
2025-04-20 20:53:49 +08:00
qazal
dd16087f62
fold double ASSIGN to same target (#9941) 2025-04-20 19:06:38 +08:00
qazal
9a9aba4cd5
setitem tests (some failing) from kernelize (#9940) 2025-04-20 18:47:55 +08:00
chenyu
6c30948df6
hand_coded_optimizations returns list[Opt] [pr] (#9938)
new api looks like `k.apply_opts(hand_coded_optimizations(k))`
2025-04-19 20:26:59 -04:00
chenyu
720f20865b
remove required_optimizations (#9848) 2025-04-19 16:51:16 -04:00
Ignacio Sica
023b1c28a2
test_tensor_cores_padded refactor (#9724)
* set pad to 3 for amd padded tc test

* change pad for amd regardless CI

* test tc padded uops and correctness separately

* add test_tensor_cores_padded_uops test to ci

* remove redundant check for amd device

* cleanup
2025-04-18 17:05:54 -03:00