Commit graph

5,694 commits

Author SHA1 Message Date
chenyu
36b4a492a1
explicitly check getitem indices can have at most one ellipsis (#5087)
* explicitly check getitem indices can have at most one ellipsis

previous error with multiple `...`:
```
if index_type not in [None, int, slice, Tensor]: raise IndexError(f"{index_type=} not supported")
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: index_type=<class 'ellipsis'> not supported
```

this pr:
```
if len(ellipsis_idx) > 1: raise IndexError("an index can only have a single ellipsis ('...')")
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: an index can only have a single ellipsis ('...')
```

* oh we have that already

* test that

* test these
2024-06-21 12:33:18 -04:00
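The check introduced in #5087 can be sketched in isolation. This is a minimal illustration rather than the repository's code: `find_ellipsis` is a hypothetical helper, and only the `ellipsis_idx` name and the error message are taken from the commit message above.

```python
def find_ellipsis(indices):
    """Return the position of the single `...` in `indices`, or None if absent."""
    # Compare with `is` so that array-like index objects are never asked for `==`.
    ellipsis_idx = [i for i, idx in enumerate(indices) if idx is Ellipsis]
    if len(ellipsis_idx) > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    return ellipsis_idx[0] if ellipsis_idx else None
```

Checking the count before dispatching on index types is what lets the user see the specific message instead of the generic "not supported" error.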
nimlgen
f1e758bacb
graph fuzzer (#5082)
* graph fuzzer

* more options

* mypy

* no underscores for funcs
2024-06-21 18:47:23 +03:00
qazal
5717a54b28
don't use Tensor.empty in kernel opts tests (#5086) 2024-06-21 18:41:03 +03:00
qazal
8aa786232d
docs for running process replay locally (#5083) 2024-06-21 09:55:08 -04:00
nimlgen
fb1bf48cfe
io_uring for copies from disk (#5035)
* exp uring

* fixes and old version

* nv

* cleaner

* cmp vs aio

* fix

* no lib

* fix nv

* linter

* disk_speed_test now runs default

* fixes

* uring -> io_uring

* linter happy

* get_temp_buf comment added

* tiny nits

* put wait back

* test runs everywhere

* remove consts

* remove mmap consts

* do not require iouring to run test, they are generic
2024-06-21 11:36:51 +03:00
chenyu
f6d6760f71
don't cast tuple to list before creating Tensor (#5071)
Tensor constructor supports creating from tuple now
2024-06-20 13:32:56 -04:00
George Hotz
6f6b3b10c9
import from uops, not linearizer (#5064) 2024-06-20 08:08:44 -07:00
chenyu
50700171ef
minor cleanup to reshape arg handling (#5070)
moved None handle to be with argfix, and only resolve -1 if there's a -1
2024-06-20 10:27:27 -04:00
chenyu
f4355d0f1b
check Tensor.permute input arg is a valid permutation (#5069)
also added support of negative axes
2024-06-20 10:01:28 -04:00
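A permutation check with negative-axis support, as described in #5069, might look like the sketch below. The function name, the exception type, and the message are assumptions made for illustration; the commit message only states that the argument is validated and that negative axes are accepted.

```python
def normalize_permutation(order, ndim):
    """Resolve negative axes and verify that `order` is a permutation of range(ndim)."""
    # Negative axes count from the end, as in ordinary Python indexing.
    resolved = tuple(ax + ndim if ax < 0 else ax for ax in order)
    # A valid permutation contains every axis in 0..ndim-1 exactly once.
    if sorted(resolved) != list(range(ndim)):
        raise RuntimeError(f"order is not a valid permutation, getting {resolved}")
    return resolved
```

Resolving negative axes first means duplicates such as `(0, -3)` on a 3-d input are caught by the same comparison.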
qazal
24c89a2a33
move assert_equiv_uops to helpers + use == for dtypes (#5067)
* dtypes should use ==

* use TestUOps

* should use assertIs
2024-06-20 16:39:34 +03:00
chenyu
e8f39fcaaa
check arg to Tensor.flip can appear only once (#5068)
* check arg to Tensor.flip can appear only once

raise RuntimeError if there are multiple

* fix test
2024-06-20 09:33:42 -04:00
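The duplicate-axis check from #5068 can be illustrated with a small standalone function. `resolve_flip_axes` is a hypothetical name, and the handling of negative axes and the message text are assumptions; the `RuntimeError` type is the one named in the commit message.

```python
def resolve_flip_axes(axes, ndim):
    """Resolve negative axes and reject any axis that is listed more than once."""
    resolved = [ax + ndim if ax < 0 else ax for ax in axes]
    # Compare against the de-duplicated set to detect repeats such as (1, -2) on 3 dims.
    if len(resolved) != len(set(resolved)):
        raise RuntimeError(f"dim can appear at most once, getting {resolved}")
    return tuple(resolved)
```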
qazal
55e02cdd84
generic gate folding (#5061)
* add assert

* fold truthy gates [run_process_replay]

* fold falsy gates [run_process_replay] [no_assert]

* redo asserts

* check both barriers

* spec start

* spec end

* assert srcs

* make test_fold_gated_load_local better

* [run_process_replay] [no_assert]
2024-06-20 16:10:08 +03:00
qazal
ee01e464e3
use process replay as a diff creator (#4903)
* add no_assert option [run_process_replay] [no_assert]

* test [run_process_replay] [no_assert]

* [run_process_replay]

* back to normal [run_process_replay]

* remove the log
2024-06-19 18:17:31 +03:00
chenyu
cc2be9064f
fix out of bound python list into numpy array (#5043)
numpy 2.0 does not allow oob python const and recommends writing as `np.array(value).astype(dtype)`
2024-06-18 18:05:21 -04:00
chenyu
4e5add4d01
move test_tqdm to test/unit/ (#5042) 2024-06-18 17:41:39 -04:00
chenyu
2b2488f2e2
revert creating Tensor from a list without numpy (#5041)
the change was incomplete and broke creating Tensor from a list of np array
2024-06-18 17:31:22 -04:00
chenyu
e2c5054bdd
update resnet.load_from_pretrained (#5040) 2024-06-18 16:29:22 -04:00
chenyu
a3ed4176c8
use tinytqdm in active tests and examples (#5038)
* use tinytqdm in active tests and examples

stress test this before 0.9.1

* no set_description
2024-06-18 16:01:19 -04:00
kormann
fe332464d2
src->vin [run_process_replay] (#5036) 2024-06-18 22:23:49 +03:00
reddyn12
f171006ded
Should this symbolic test fail? (#4501)
* add test

* skip test

* use expected failure decorator

---------

Co-authored-by: schlimeszn <schlimeszn@gmail.com>
Co-authored-by: reddyn <nikidsniper@gmail.com>
2024-06-18 15:21:26 -04:00
kormann
7c3b877216
rename uop [run_process_replay] (#5031)
* rename

* fix unittests

* rename vin

* fix test

* fix type [run_process_replay]

* rm pre commit hook change
2024-06-18 21:34:05 +03:00
chenyu
dc942bf1f6
jit sampling function in test_randomness.test_multinomial (#5034)
* jit sampling function in test_randomness.test_multinomial

`THREEFRY=1 python3 -m pytest test/test_randomness.py::TestRandomness::test_multinomial --durations 1` 7 sec -> 1.2 sec

* skip that
2024-06-18 14:21:05 -04:00
Francis Lam
8d33998e0d
[run_process_replay] linearizer: fix get_grouping_dims to respect global/local max (#4855)
* linearizer: fix get_grouping_dims to respect global/local max

* fix lidx variable index offset and unrestrict clang/llvm global len

* test reverse variable indexing when reverse_dims is true

* change the collapse axis to be the right most if reversed
2024-06-18 16:51:27 +03:00
Junjun Dong
c8cd6e725c
Remove BinaryOps.SUB. Replace SUB by ADD and NEG in all tests. Regenerate dataset (#4977)
* feat: remove BinaryOps.SUB

* remove SUB in test_early_end_local

* regenerate dataset. remove SUB in test_linearizer_*

* reenable overflow tests

* simplify tensor.sub function by returning a+(-b)

* remove whitespaces

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-06-18 09:06:13 -04:00
chenyu
620fa6e5a2
check Tensor.reshape can have at most one -1 (#5026)
raise RuntimeError to match torch. on master it throws weird errors from shapetracker
2024-06-18 08:17:12 -04:00
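The rule enforced by #5026, together with the "only resolve -1 if there's a -1" cleanup in #5070, can be sketched as follows. `resolve_reshape` and its message are hypothetical; only the `RuntimeError` type comes from the commit message, and the sketch assumes the other dimensions are non-zero and divide the element count.

```python
import math

def resolve_reshape(new_shape, numel):
    """Replace a single -1 in `new_shape` with the size implied by `numel`."""
    if new_shape.count(-1) > 1:
        raise RuntimeError(f"only one dimension can be inferred using -1, getting {new_shape}")
    if -1 not in new_shape:
        return tuple(new_shape)  # nothing to infer
    # Product of the explicitly given dimensions (1 when -1 is the only entry).
    known = math.prod(s for s in new_shape if s != -1)
    return tuple(numel // known if s == -1 else s for s in new_shape)
```

Rejecting a second `-1` up front avoids passing an ambiguous shape further down, which is where the commit message says the confusing errors came from.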
chenyu
acaf9a490d
RECIP(-0.0) should be -inf (#5024)
* RECIP(-0.0) should be -inf

added test_dtype_alu for PYTHON backend

* catch that

* fix those two
2024-06-17 22:26:58 -04:00
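The sign-preserving behaviour from #5024 is easy to show in plain Python, where `1 / -0.0` raises `ZeroDivisionError` instead of returning `-inf`. The `recip` function below is a hypothetical sketch of the idea, not the PYTHON backend's implementation.

```python
import math

def recip(x: float) -> float:
    """Reciprocal that follows IEEE 754 for signed zeros."""
    # Both 0.0 and -0.0 compare equal to 0, so copysign is used to keep the sign.
    if x == 0:
        return math.copysign(math.inf, x)
    return 1 / x
```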
chenyu
03b367c014
handle float16 overflow in PYTHON (#5022)
* handle float16 overflow in PYTHON

use `truncate` when constructing tensor from list to make sure all values are packable (might be slow, but should be correct). add truncate_fp16 to cast overflowed values to inf/-inf.

* all valid fmt supports truncate
2024-06-17 21:12:52 -04:00
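One way to get the behaviour described in #5022 is to round-trip the value through the `struct` half-precision format and map the overflow to a signed infinity. The name `truncate_fp16` appears in the commit message, but the body below is an assumed sketch, not a copy of the repository's code.

```python
import math
import struct

def truncate_fp16(x: float) -> float:
    """Round `x` to the nearest float16 value, overflowing to +/-inf."""
    try:
        # "e" is the struct format code for IEEE 754 binary16.
        return struct.unpack("e", struct.pack("e", float(x)))[0]
    except OverflowError:
        # struct refuses to pack values too large for float16; keep the sign.
        return math.copysign(math.inf, x)
```

Applying a function like this to every element when building a tensor from a list is the "might be slow, but should be correct" trade-off the commit message mentions.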
chenyu
c0139b05d8
python_alu sin(inf) is nan (#5020)
* python_alu sin(inf) is nan

without special handling, it throws ValueError: math domain error

* skip CUDACPU
2024-06-17 19:47:30 -04:00
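The special case from #5020 exists because `math.sin` raises `ValueError: math domain error` for infinite inputs. A minimal sketch, with `safe_sin` as a hypothetical name:

```python
import math

def safe_sin(x: float) -> float:
    """sin(x) that returns nan for +/-inf instead of raising ValueError."""
    # math.sin already propagates nan, so only infinities need special handling.
    return math.nan if math.isinf(x) else math.sin(x)
```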
chenyu
4296507021
Tensor.sum returns in acc_dtype if specified (#5012)
* Tensor.sum returns in acc_dtype if specified

* skip PYTHON for now

* revert that

* relax that
2024-06-17 16:35:52 -04:00
Ray
1ad3b25461
fix einsum output str (#4998)
* fix einsum output str

* new line to satisfy linter

* removed redundant cast (satisfy linter)
2024-06-17 12:18:14 -04:00
nimlgen
794acefbf3
hcq update waits and signals in place (#4984)
* hcq update waits and signals in place

* start amd

* amd works

* prettier

* test

* normal messages

* linter

* linter 2
2024-06-17 17:19:07 +03:00
qazal
04feeb37e6
look for unsafe pad ops in multiview ShapeTrackers (#5002) 2024-06-17 00:28:12 +03:00
chenyu
72c9b22833
sort vars in jit when building expected input args (#4990)
* sort vars in jit when building expected input args

fixed symbolic jit bugs with two variables.

* sort in clanggraph

* space

* one more
2024-06-16 15:55:51 -04:00
qazal
71aad183fd
check Program from HEAD [run_process_replay] (#4996)
* use the same prg [run_process_replay]

* put var back
2024-06-16 20:12:30 +03:00
chenyu
2b07847f2b
matmul returns in acc_dtype if specified (#4994)
more flexible to not automatically downcast, can fix bert mixed precision training with this
2024-06-16 12:56:15 -04:00
George Hotz
1d6f1a15e1
add lt and ge uop methods [run_process_replay] (#4995)
* add lt and ge uop methods [run_process_replay]

* more correct (should still run process replay)
2024-06-16 09:33:53 -07:00
George Hotz
dac96f177e
ignore indexing in the flopcounter (#4993) 2024-06-16 08:59:55 -07:00
Timmy
01b26756d6
Multireduce Scheduler Tests (#4972)
* scheduler tests

* linters

* cleaning up tests

* fixing tests

* syntax

* fixing metal
2024-06-16 16:30:22 +03:00
chenyu
50bc14d186
re-enable test that loads torch pkl format (#4986) 2024-06-15 14:11:30 -04:00
uuuvn
033fb53f9e
Incomplete/buggy rule breaks process replay on #4976 (#4978)
* Incomplete/buggy rule breaks process replay on #4976

* test passes

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-15 15:18:35 +03:00
qazal
d91f0ee85b
add regression test for the neg folding pattern (#4979) 2024-06-15 15:08:28 +03:00
wozeparrot
8209cd3c55
easier llama3 + fetch subdir (#4938) 2024-06-14 13:47:27 -07:00
chenyu
64cda3c481
raise TypeError calling len() on a 0-d tensor (#4970)
matched numpy and torch
2024-06-14 16:34:27 -04:00
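The behaviour adopted in #4970 can be illustrated with a toy class that only tracks a shape. `ToyTensor` and the message text are hypothetical; the commit message only states that `len()` on a 0-d tensor raises `TypeError`, matching numpy and torch.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor that stores nothing but its shape."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __len__(self):
        # A 0-d tensor has an empty shape, so there is no first dimension to report.
        if not self.shape:
            raise TypeError("len() of a 0-d tensor")
        return self.shape[0]
```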
chenyu
67e8df4969
remove numpy from dtype (#4969)
replaced all dtype.np with _to_np_dtype defined in tensor.py.

after this, the only numpy usages are (1) Tensor(np.ndarray), (2) construct .numpy() output, (3) numpy random buffer
2024-06-14 15:38:45 -04:00
chenyu
dae1c8abe2
create Tensor from bytes without numpy (#4964) 2024-06-14 13:37:27 -04:00
chenyu
5eee974b2a
construct Tensor from python list/tuple directly (#4947)
* construct Tensor from python list/tuple directly

no numpy. annoying that half memoryview is 3.12 feature...

* simpler, and test

* flat already

* simpler

* cute

* 10% faster

* 5%
2024-06-14 11:36:05 -04:00
geohotstan
90332eb529
Getitem pin None dimension (#4960)
* fix

* remove torch out of bounds test

* 1 more test case
2024-06-14 10:48:59 -04:00
George Hotz
14189bca68
graph_dedup function [run_process_replay] (#4955) 2024-06-14 04:24:37 -07:00
George Hotz
63a8add2c2
move uops add logic to linearize (#4952)
* move logic to linearize

* idk how this should work

* empty
2024-06-14 03:52:37 -07:00
George Hotz
9823752397
make uops.add private (#4950)
* make uops.add private

* modernize all tests
2024-06-14 03:23:25 -07:00