Commit graph

3,355 commits

Author SHA1 Message Date
chenyu
f502c9b08f
minor cleanup of View.reshape (#3088)
* minor cleanup of View.reshape

removed some redundant logic

* new_strides

* revert that
2024-01-11 13:05:54 -05:00
chenyu
f40299c3fe
remove the third merging state in view._merge_dims (#3085)
no logic depends on state == 0 or state == 2
2024-01-11 12:07:43 -05:00
chenyu
7f9590d357
hotfix disable flaky mac runner wino cifar (#3087) 2024-01-11 11:57:05 -05:00
Yixiang Gao
adcc844755
cat works (#3086) 2024-01-11 08:25:20 -08:00
chenyu
cdeab9ad97
mem_estimate is always int, not symbolic (#3083)
* mem_estimate is always int, not symbolic

op_estimate can be symbolic, but mem_estimate is always int, thus we don't need to sym_infer it.
fixed some long lines too. update_stats is a very big function

* operator does not need underscores
2024-01-10 23:39:51 -05:00
Francis Lam
162fa61a32
wmma: clean up device specific tensor core code (#3081) 2024-01-10 21:03:09 -05:00
chenyu
d218d13885
minor cleanups of lazy.py (#3080) 2024-01-10 20:17:56 -05:00
chenyu
56dda33fc6
Tensor.expand resolves the new_shape before shortcut return (#3078)
similar to how reshape is done. also updated shrink shortcut criteria to read similar to pad
2024-01-10 14:29:15 -05:00
Yixiang Gao
6842476ca6
better test demonstration (#3077)
* a better test demonstration

* fix white space
2024-01-10 10:50:52 -08:00
chenyu
507e0afba0
fix onehot and jit in examples/transformer (#3073)
trained to 0.999 in < 6 seconds on M1 Max consistently
2024-01-10 02:22:41 -05:00
chenyu
4342fccc83
filter_strides -> canonicalize_strides (#3072) 2024-01-10 01:06:48 -05:00
chenyu
023f5df0e9
simpler idxs_to_idx (#3071) 2024-01-10 00:30:10 -05:00
George Hotz
2495ca95c7
early gate the graph (#3070) 2024-01-09 20:17:13 -08:00
George Hotz
ff0d6e4551
jit autorealizes output (#3069) 2024-01-09 20:10:22 -08:00
George Hotz
ae83733431 hotfix: examples/transformer.py 2024-01-09 19:28:09 -08:00
chenyu
145718a90f
unbind view or shapetracker also returns var_val (#3067)
* unbind view or shapetracker also returns var_val

4% faster for llama compile time

* one line less

* unbound_views
2024-01-09 21:45:05 -05:00
jxdv
ef3aa6d7fb
update gh actions (#3033)
* update checkout actions

* update upload artifact

* update setup python

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-01-09 17:52:22 -08:00
George Hotz
3f80c1a098
speedtweaks3: apply shouldn't use the tensor constructor (#3065)
* speedtweaks3: apply shouldn't use the tensor constructor

* replace 0 size with CONST, not 0 in shape
2024-01-09 17:42:33 -08:00
George Hotz
0abe72b677 hotfix: use is for enum compare, a few more 2024-01-09 16:53:13 -08:00
George Hotz
b2b5849f74 hotfix: use is for enum compare 2024-01-09 16:47:27 -08:00
George Hotz
ac3f246c11
cached size (#3060)
* cached size

* simplify simplify

* 0 doesn't have base

* fix test

* cleaner cache

* hmm, metal is flaky on this...might be real(ish) but useless as test

* short circuit reshape/expand properly

* better reshape bypass
2024-01-09 16:37:37 -08:00
Yixiang Gao
73b72b8de2
test scaled dot product attention (#3063)
* add test

* add initial test for scaled dot product attention

* test pass for scaled dot product attention
2024-01-09 14:30:57 -08:00
chenyu
55ac2a2cf7
Tensor.cat with 0 shape tensors (#3062)
* Tensor.cat with 0 shape tensors

supported both 0 in cat axis (for a subset of input), or 0 in non-cat axis (all needs to be 0)

* no shp
2024-01-09 16:54:06 -05:00
chenyu
f0d7ad8aaa
fix gpt2 attention with start_pos = 0 (#3061)
* fix gpt2 attention with start_pos size 1

test cases taken from ll_transformer branch

* fix interpreted
2024-01-09 16:14:55 -05:00
George Hotz
39b91131bc
Speed tweaks (#3059)
* base doesn't have to be a function

* no double fetch

* pop, don't check

* make the gc happy

* avoid hasattr

* cache canonicalize

* remove assert, faster base

* don't redefine that every time
2024-01-09 11:34:17 -08:00
George Hotz
bf6281f316 hotfix: remove useless slow assert from ShapeTracker 2024-01-09 10:56:36 -08:00
George Hotz
4b687af98f
explicit lazybuffer caching (#3058) 2024-01-09 10:52:37 -08:00
George Hotz
2c6f2e899d
No extra vars call (#3054) v0.8.0
* remove unused reciprocal

* comment

* remove unneeded call to vars

* free speedup
2024-01-09 09:52:58 -08:00
Yixiang Gao
259bf9bffc
add multigpu test for RMSNorm (#3056)
* need all gather

* add two multigpu test scenarios for RMSNorm
2024-01-09 09:52:51 -08:00
chenyu
dab8214103
unit tests for Device.canonicalize (#3055) 2024-01-09 12:47:20 -05:00
George Hotz
374f7659a7
remove unused reciprocal (#3053)
* remove unused reciprocal

* comment
2024-01-09 08:59:04 -08:00
Yixiang Gao
a686663657
make Embedding device aware for multigpu (#3051)
* make Embedding device aware for multigpu

* split line instead of igore because that's cheating

* add test incomplete

* add test complete

* remove comment

* fix white space

* remove nn.Embedding
2024-01-08 20:09:26 -08:00
chenyu
19298e7a3f
Device._buffers -> Device._devices (#3052)
backend devices used to be called buffers
2024-01-08 21:30:38 -05:00
chenyu
4f4e8634b8
use in_features directly in nn.Linear.__init__ bound check (#3050)
* use in_features directly in nn.Linear.__init__ bound check

get rid of the unnecessary check of isinstance int

* that is always int

* long lines
2024-01-08 19:32:35 -05:00
chenyu
ee6a73826b
clean up test_nn.py (#3049)
used Tensor.train decorator, reordered to always tinygrad instances first, and removed redundant idx cast
2024-01-08 18:45:03 -05:00
chenyu
3eb3664074
fix nn.Embedding with empty length input (#3048) 2024-01-08 18:08:36 -05:00
George Hotz
7ea2e0035b
move children for speed (#3047)
* move children for speed

* no children anymore
2024-01-08 15:02:32 -08:00
George Hotz
655c6f61d3
St real size (#3046)
* track the size in the lazybuffer

* shapetracker real size

* lint
2024-01-08 14:44:53 -08:00
chenyu
1d730b8853
remove ACCUM_FP32 in simple_matmul.py (#3045)
* remove ACCUM_FP32 in simple_matmul.py

accumate for half inputs is always in float

* move test llama compile speed to metal
2024-01-08 17:37:57 -05:00
George Hotz
47d67da830
track the size in the lazybuffer (#3044) 2024-01-08 13:44:55 -08:00
George Hotz
c003be7309
Revert "track size in shapetracker" (#3043)
* Revert "track size in shapetracker (#3026)"

This reverts commit a8ba1ac08f.

* st.size
2024-01-08 13:13:39 -08:00
George Hotz
50754f1494
add caches there (#3042)
* add caches there

* no curl
2024-01-08 13:02:16 -08:00
George Hotz
c5a941d466
webgl backend in extra (#3041)
* WebGL WIP

* 84% of ops passing test

* tests passing 100%

* Cleanup, refactor

* Shave off some lines

* Work on dtypes

* TestOps at 100% again

* Efficient net shaders compile in browser webgl2

* Compile all efficientnet shaders in browser

* Create empty textures for tensor buffers

* Run program. Up next weight loading

* Exported WebGL model working

* Add tests, refactor

* Explicit cast alu for GLSL

* Fix CI tests

* WebGL efficientnet demo

* Compile and run yolov8 in browser

* Fix imports

* Simplify yolo compile

* Fix bool*bool and cast cmplt to float

* More tests

* Do std tests pass on CI?

* Skip std tests on CI

* Remove explicit_cast_alu hack, and solve it in code_for_op

* Move to new dtype-less alloc api

* Remove local size hack: optimize local_size only if device has local

* Remove glsl.py, and move content to cstyle

* dont_use_locals in opts

* Fix dtype tests

* type_map in CStyleLanguage

* Make core changes smaller, cleaner, refactor export_model and demo

* Skip pad_slice

* Simplify: render_const, render_conditional

* solve bool alu for other binops, cleaner ops_webgl

* Fix noopt hack

* Remove some skipIfs

* WebGL image hack

* type_names is a better name

* global_max

* Fix dtype import

* Fix type_names -> type_map

* Fix lint

* Remove webgpu, back to 5k lines (#3040)

* remove webgpu

* max 5000 lines

* revert those to master

* retain that cstyle

---------

Co-authored-by: Ahmed Harmouche <ahmedharmouche92@gmail.com>
2024-01-08 09:29:13 -08:00
George Hotz
8cbcd1b342
Remove webgpu, back to 5k lines (#3040)
* remove webgpu

* max 5000 lines
2024-01-08 09:10:07 -08:00
George Hotz
cf2eea961c more beautiful_cartpole with exposed hparams 2024-01-07 17:41:09 -08:00
Yixiang Gao
44618427f1
add bf16 type_map for both cuda and hip (#3036)
* add typemap bfloat16 for cuda and hip

* add render_dtype

* add def in CStyleLanguage

* fix def

* save one line

* add header file for cuda bf16
2024-01-07 14:26:55 -08:00
chenyu
ef5f545fd8
add more Tensor.clip test cases (#3034)
* add more Tensor.clip test cases

add cases for same low/high and both negative etc

* case min > max
2024-01-07 13:08:59 -05:00
chenyu
c9371f0d31
hotfix llama conversation mode (#3031)
without contiguous on keys and values, it runs but the update is incorrect
2024-01-06 16:57:07 -05:00
chenyu
fa707c81e5
move beautiful cartpole action sampling inside jit (#3028)
tested by getting 3 full scores in a row
2024-01-06 00:39:55 -05:00
George Hotz
ebb81e8f11 hotfix: st.size() -> st.size in llama 2024-01-05 20:18:52 -08:00