Commit graph

4,777 commits

Author SHA1 Message Date
chenyu
798ea61377
widen test_ops [low, high] and more strict atol (#4906)
default [low, high] changed from [-1.5, 1.5] to [-2, 2] (except tan).
dropped several explicit atol values that were unnecessarily larger than the default 1e-6.
tested on mac, tinybox red / green
2024-06-10 20:47:09 -04:00
chenyu
97b05f567e
revert the .detach() in layernorm (#4904)
* revert the .detach() in layernorm

it's only correct in LayerNorm where input is the data, and not correct in GroupNorm and InstanceNorm that reused layernorm.
Added backward tests for weights, bias and input for these norms.

* bigger atol for llvm

* relax backward more
2024-06-10 18:02:05 -04:00
qazal
8b5bcf309a
process replay in all of CI (#4884) 2024-06-10 14:49:29 -04:00
chenyu
c8cd637236
test case for Tensor.var reducing over size = 1 axis (#4902)
backward failed when correction >= reducing n
2024-06-10 12:11:39 -04:00
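
To illustrate why the size = 1 case above is delicate, here is a minimal pure-Python sketch of variance with a correction term. It is not tinygrad code: the helper name `var` and the choice to return NaN when `n - correction <= 0` are assumptions made for illustration, following the common 0/0 convention.

```python
import math

def var(xs, correction=1):
    """Variance of a list with a correction term (hypothetical helper)."""
    n = len(xs)
    mean = sum(xs) / n
    sum_sq = sum((x - mean) ** 2 for x in xs)
    denom = n - correction
    # When correction >= n the denominator is not positive; this sketch
    # returns NaN instead of dividing (an assumption, not the library's rule).
    if denom <= 0:
        return math.nan
    return sum_sq / denom

two_elems = var([1.0, 3.0])                  # n = 2, correction = 1
one_elem = var([3.0])                        # n = 1, correction = 1
one_elem_biased = var([3.0], correction=0)   # n = 1, correction = 0
```

With a single element and the default correction the denominator is zero, which is the situation the test case in the commit exercises.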
chenyu
b56ae5606c
cosmetic changes to uop _match (#4897)
minor cleanup before fixing two level match
[run_process_replay]
2024-06-09 18:29:42 -04:00
SnakeOnex
b1db2d0094
tqdm replacement (#4846)
* tqdm replacement almost

* formatting

* formatting

* imports

* line len

* fix

* removed set description :(

* removed set description :(

* fix

* fix

* green check?

* rewrote as class, fixed several bugs

* types spacing

* removed imports

* fix

* iterable

* typing

* mypy disagreement

* imports

* more e2e tests vs tqdm

* removed seed setting

* robustness against time.sleep() flakiness

* flaky fix

* automatic bar closing when count==total

* cleanup

* clang error with tqdm

* tqdm back

* use os lib, print to stderr (fixes the clang bug, where the bar was leaking into the generated c program)

* back to shutil

* unit_scale + unit_scale test

* custom unit to tests

* pretty

* clean

* removed flaky test

* less test iters

* empty line

* remove disable
2024-06-09 23:46:03 +02:00
qazal
1dde829e34
UOps.IF* to graph spec (#4894) 2024-06-09 07:00:12 -04:00
George Hotz
b9afb0d577
test uop as symbolic (#4870)
* start work

* more tests passing

* more tests passing

* more

* 34 failures

* expect the failures

* remove broken rule

* render is fine in just the test

* simplify and put in test
2024-06-09 12:15:11 +02:00
nimlgen
654a8b9ef7
retire hsa (#4885)
* retire hsa

* EMULATE_AMD
2024-06-09 11:33:03 +03:00
chenyu
e33efd6a3d
test cases for multitensor adds const (#4892)
Tested const remained const in ast. Removed the TODO in _to_const_val too
2024-06-08 22:57:48 -04:00
nimlgen
d24e57c615
amd support kernel with bf16 (#4863)
* amd support kernels with dispatch_ptr

* fixes

* line savings

* one line

* try

* Revert "try"

This reverts commit 5f340dfdd4.

* not used will be back when hsa is gone

* gone will be back

* add this as well
2024-06-08 22:52:32 +03:00
qazal
1e3325f369
raise assert [run_process_replay] (#4879) 2024-06-08 08:31:44 -04:00
qazal
66dfd5e7bf
faster codegen process replay (#4858)
* faster codegen process replay

* use self.copy

* regenerate

* delete copy

* test a real error [run_process_replay]

* revert the error change
2024-06-07 16:20:57 +03:00
nimlgen
47bfd7c2b7
fix sync of offset buffers in graphs (#4850)
* correctly sync offset buffers

* test

* style

* run less

* just use base
2024-06-06 16:09:45 +03:00
chenyu
99e7a1d5e9
support symbolic reshape with non-contiguous (#4844)
* support symbolic reshape with non-contiguous

pre-requisite for symbolic arange (make symbolic ones that can be folded).

* test cases

* typo

* shorter
2024-06-05 16:01:19 -04:00
chenyu
a352b6d9ce
symbolic Tensor.var (#4843)
taken from #4446 and add more tests
2024-06-05 12:55:54 -04:00
Timmy
887643cf34
Multireduce atomic local load/store test (#4786)
* atomic load/store test

* tests for nested & unrolled

* check barriers

* linters

* cleaning up diff

* fix assert in _temp_create_multireduce_ast changes

* cleaning up the check for redundant barriers

* minor cleanups for the assert

* always seed randn, helps with debuggability

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-05 14:41:19 +03:00
Szymon Ożóg
273945df67
Regression tests for bitshift (#4829)
* Regression tests for bitshift

* Add test for bitshift not triggered

* Enable tests
2024-06-05 11:42:34 +02:00
Alec Chen
5ac30c29d8
Construct UOps patterns using UPat (#4821)
* Allow UPat pattern definitions

* Convert pattern matcher tests to UPat constructions

* Convert constant_folder patterns to upat constructions

* Convert assembly patterns to upat constructions

* [run_process_replay] Drop UPat.from_dict
2024-06-05 10:29:37 +02:00
Szymon Ożóg
e47277d18a
Disable for PTX as well (#4838)
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-06-05 10:37:59 +03:00
Francis Lam
890e7c12bb
test/external/verify_kernel: add support for single pickled kernel (#4836) 2024-06-04 18:59:21 -04:00
Elias Wahl
04e237328b
Refactor to class style (#4804) 2024-06-04 14:08:31 -07:00
David Hou
cddce0e168
don't cast before view on shape changing bitcast (#4833)
* don't cast before view on shape changing bitcast

* make sure cast before view triggers
2024-06-04 16:04:52 -04:00
Alec Chen
4909a0d16f
Fix arg set in pattern matcher (#4830) 2024-06-04 15:10:09 -04:00
Alec Chen
c96026ac65
Add arg set regression test for pattern matcher (#4827)
* Add arg set regression test for pattern matcher

* real regression

---------

Co-authored-by: qazalin <qazal.software@gmail.com>
2024-06-04 13:35:09 -04:00
chenyu
a70e8a80d7
test_ops test cmp with special floats (#4826)
prepare to fix nan, it did not work with ge and le before either
2024-06-04 12:10:21 -04:00
chenyu
3afc914617
CMPEQ -> CMPNE and make it safe to pad (#4818)
* CMPNE

* new dataset
2024-06-03 18:02:15 -04:00
Szymon Ożóg
bb7b031c5c
Bitshift (#4728)
* WIP

* Cleanup

* Cleanup

* Fix variable, refactor to use set

* right shift should be signed/unsigned

* Test for bitshifts

* Allow a neg
2024-06-03 21:16:01 +02:00
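
The "right shift should be signed/unsigned" item above refers to the difference between arithmetic and logical right shifts. The following is a minimal pure-Python sketch of that difference on a 32-bit value; the helper names and the 32-bit width are assumptions for illustration and are not taken from the commit.

```python
MASK32 = 0xFFFFFFFF

def arithmetic_shift_right(x, k):
    # Signed shift: Python's >> on ints keeps the sign (floors toward -inf).
    return x >> k

def logical_shift_right_32(x, k):
    # Unsigned shift: reinterpret x as a 32-bit unsigned value first,
    # so zeros are shifted in from the left.
    return (x & MASK32) >> k

signed_result = arithmetic_shift_right(-8, 1)
unsigned_result = logical_shift_right_32(-8, 1)
```

The same bit pattern gives different results depending on whether it is treated as signed or unsigned, so a shift has to be chosen to match the dtype.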
nimlgen
e78a9bf3f2
support view in nv/amd (#4812)
* support view in nv/amd

* fix amd

* fix

* run test on nv/amd
2024-06-03 22:11:52 +03:00
chenyu
45083ccb43
canonicalize 0 in shape in View.create (#4815)
set strides to 0, offset to 0, mask to None, and contiguous to True with size 0 view.
2024-06-03 13:37:37 -04:00
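
The canonicalization described in the commit above can be sketched as follows. This is a simplified stand-in, not the real `View` class: `SimpleView` and `create_view` are hypothetical names, and only the zero-size rule from the commit message is modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class SimpleView:
    # Hypothetical, reduced version of a view record.
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    offset: int = 0
    mask: Optional[Tuple[Tuple[int, int], ...]] = None
    contiguous: bool = True

def create_view(shape, strides, offset=0, mask=None, contiguous=False):
    shape = tuple(shape)
    if 0 in shape:
        # A view with a 0 in its shape has no elements, so strides, offset
        # and mask carry no information: collapse them to one canonical form.
        return SimpleView(shape, (0,) * len(shape), 0, None, True)
    return SimpleView(shape, tuple(strides), offset, mask, contiguous)

a = create_view((2, 0, 3), (3, 3, 1), offset=5, mask=((0, 2), (0, 0), (0, 3)))
b = create_view((2, 0, 3), (0, 1, 7))
```

Because both calls collapse to the same record, two empty views of the same shape compare equal regardless of how they were built.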
qazal
f64fa51a64
process replay for test/* (#4799)
* add input to unit tests [run_process_replay]

* add setup [run_process_replay]

* run tests [run_process_replay]

* add cuda and amd [run_process_replay]

* run everything but BEAM=2 [run_process_replay]

* skip export_model [run_process_replay]

* fix amd CI

* add concurrency back
2024-06-03 12:01:58 +03:00
Timmy
ca32921f84
Multireduce PADTO Test (#4785)
* padto test

* expanded multireduce padto tests

* cuda doesnt run on ci

* moving padto_where_multireduce test to SUM so that we can check the reduce axis

* cleaning up tests some more

* add wanna_outputs

* refactor test_padto_sum_multireduce

* fix max and refactor where

* fix axis

---------

Co-authored-by: qazal <qazal.software@gmail.com>
2024-06-02 13:46:53 +03:00
chenyu
1ffa5ec492
unit test ShapeTracker.consecutive (#4800) 2024-06-01 10:10:51 -04:00
chenyu
8942230b1f
minor cleanups of test_tensor and extend some cases (#4794) 2024-05-31 10:43:22 -04:00
qazal
637f482588
configure derandomizing CI tests (#4793) 2024-05-31 17:06:58 +03:00
chenyu
7cc883ecee
CMPLT is safe to pad (#4790)
0 < 0 evals to False
2024-05-30 22:50:48 -04:00
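
A small pure-Python sketch of the "0 < 0 evals to False" argument above, under the assumption that "safe to pad" means zero-padding both inputs adds only zeros to the op's output, so a following sum is unchanged. The helper names are hypothetical; the equality case is included for contrast with the CMPEQ -> CMPNE commit (#4818) listed earlier.

```python
def pad_zeros(xs, size):
    # Extend a list with zeros up to the requested size.
    return xs + [0] * (size - len(xs))

def count_true(op, a, b):
    # Apply a comparison elementwise and sum the boolean results.
    return sum(int(op(x, y)) for x, y in zip(a, b))

a, b = [1, 5, 2], [3, 4, 2]
pa, pb = pad_zeros(a, 8), pad_zeros(b, 8)

lt = lambda x, y: x < y
ne = lambda x, y: x != y
eq = lambda x, y: x == y

lt_counts = (count_true(lt, a, b), count_true(lt, pa, pb))
ne_counts = (count_true(ne, a, b), count_true(ne, pa, pb))
eq_counts = (count_true(eq, a, b), count_true(eq, pa, pb))
```

The less-than and not-equal counts are unchanged by padding, while the equality count grows by the number of padded positions because 0 == 0 is True.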
chenyu
236390aafb
fix lazy r const folding with variable shape (#4783)
currently not supporting const fold symbolic shape. I think it's possible with a refactor to Tensor.from_node.
also added some failed required tests for symbolic arange.
2024-05-30 15:19:28 -04:00
chenyu
4921de1945
fix cumsum of 0-d tensor (#4781)
* fix cumsum of 0-d tensor

* _resolve_dim for all
2024-05-30 12:41:09 -04:00
chenyu
4cf0eadf8f
failed test case for ellipsis in einsum (#4779)
from #4156
2024-05-30 11:14:42 -04:00
Alec Chen
e89bc42cc7
Add UOps pattern matcher regression tests (#4725)
* add pattern matcher regression tests

* Remove test for dtype str after rebasing

* Make test uops match type spec

* leave const const, add const alu vin test

* correct uops

* actually correct uops
2024-05-30 17:12:20 +03:00
qazal
c2945be0a3
add fused tensor core opts tests (#4775)
* add fused tc opts tests

* n=64
2024-05-30 13:50:00 +03:00
chenyu
f1bf916b8a
apply NOOPT in test_arange complexity (#4774)
with hcopt, arange(2560) uses less ops than arange(256)
2024-05-29 23:12:35 -04:00
chenyu
cde7a7cda7
isolate the 134ms kernel in train_gpt2.py (#4773)
133ms on tinybox red with BEAM=2
2024-05-29 17:26:24 -04:00
chenyu
59c6472b9f
check contiguous in View.create after canonicalizing mask and offset (#4770)
mask / offset / strides can change during canonicalization, and contiguous can be True at the end
2024-05-29 11:31:13 -04:00
nimlgen
019f4680e5
check dims before execution on nv (#4756)
* check dims before execution on nv

* fix linter
2024-05-28 16:57:28 +03:00
qazal
0e824741c4
pre multi reduce codegen/* cleanup (#4755)
* refactor self.reduceop

* free lines

* fix test
2024-05-28 08:15:48 -04:00
chenyu
53b9081aab
check arg types of Tensor.randint (#4751)
raise TypeError if low, high, dtype are not ints
2024-05-27 20:24:10 -04:00
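
A minimal sketch of the kind of argument check the commit above describes. It is not the library's implementation: `check_randint_args`, the string dtype names, and the error messages are assumptions for illustration.

```python
# Hypothetical set of integer dtype names used only by this sketch.
INT_DTYPE_NAMES = {"int8", "int16", "int32", "int64",
                   "uint8", "uint16", "uint32", "uint64"}

def check_randint_args(low, high, dtype="int32"):
    # Reject non-integer bounds such as 0.5 or 10.0.
    if not isinstance(low, int) or not isinstance(high, int):
        raise TypeError(
            f"low and high must be ints, got "
            f"{type(low).__name__} and {type(high).__name__}")
    # Reject non-integer dtypes such as "float32".
    if dtype not in INT_DTYPE_NAMES:
        raise TypeError(f"dtype must be an integer dtype, got {dtype!r}")

def raises_type_error(*args):
    # Helper: report whether the check rejects the given arguments.
    try:
        check_randint_args(*args)
    except TypeError:
        return True
    return False
```

Note that in Python `bool` is a subclass of `int`, so this sketch would accept `True` as a bound; a stricter check would need to exclude it explicitly.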
qazal
0e69b22629
multireduce OptOps tests (start) (#4733)
* start

* full tests

* add skips

* unrelated

* notes
2024-05-27 12:21:33 +03:00
qazal
c7b1d802f1
delete duplicate tests in test_linearizer (#4723)
* delete duplicate test

test_simplify_uop isnt needed

max works

* ci

* remove skip

* add skip back
2024-05-26 08:11:42 +03:00
Szymon Ożóg
de5c69c4c9
Unify test_dtype naming conventions (#4730) 2024-05-25 10:12:40 -04:00