tinygrad/extra
Ahmed Harmouche 10618aba98
Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger than u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnist now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
accel move things, clean up extra (#2292) 2023-11-13 20:18:40 -08:00
assembly s/UOps/Ops (#7500) 2024-11-03 11:26:10 +08:00
backends Bring back WebGPU (#7063) 2024-11-26 12:26:40 +08:00
datasets set PAGE_SIZE=1 and generate new dataset (#7559) 2024-11-05 11:25:01 -05:00
disassemblers/adreno qcom fix disasm (#6703) 2024-09-24 15:23:43 +08:00
dsp add qcom dsp runtime (#6112) 2024-09-13 21:01:33 +03:00
gemm BufferSpec and ProgramSpec [pr] (#7814) 2024-11-21 12:18:05 +08:00
hip_gpu_driver feat: autogen from kernel register offset headers (#6056) 2024-08-12 14:08:35 -07:00
hiprtc use comgr to compile (#3248) 2024-01-26 18:27:49 -08:00
junk coder.py can write and run code (#2439) 2023-11-25 12:27:54 -08:00
mockgpu Hook memoryview via class instead of a function (#7627) 2024-11-11 09:07:06 +08:00
models Tensor.cummax (#7854) 2024-11-22 15:55:02 -05:00
nv_gpu_driver nv fix shared_memory_size (#7239) 2024-10-23 21:59:47 +03:00
optimization Remove UnaryOps, BinaryOps, TernaryOps, MetaOps [pr] (#7725) 2024-11-16 20:56:56 +08:00
qcom_gpu_driver qcom match texture/sampler descriptors to OpenCL (#7622) 2024-11-11 21:56:51 +03:00
resnet18 beat mlx at resnet 18 (#6611) 2024-09-20 11:28:01 +08:00
archprobe.py move dtypes to dtype.py (#2964) 2024-01-01 14:58:48 -08:00
augment.py [ready] Replacing os with pathlib (#1708) 2023-08-30 10:41:08 -07:00
disk_read_speed.py io_uring for copies from disk (#5035) 2024-06-21 11:36:51 +03:00
dump_cache.py wow how did i think that was okay (#2339) 2023-11-16 21:21:11 -08:00
export_model.py Bring back WebGPU (#7063) 2024-11-26 12:26:40 +08:00
f16_w_uint32.py fix various examples (#4691) 2024-05-22 20:43:21 -04:00
gradcheck.py Fix: Jacobian tests [WIP] (#1126) 2023-07-05 15:36:22 -07:00
hip_events.py move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
introspection.py remove graph [pr] (#7085) 2024-10-16 11:40:07 +08:00
lr_scheduler.py use at least float32 for optim.lr (#4297) 2024-04-25 14:42:28 -04:00
mcts_search.py safe softmax trick in MCTS ucb_explored_children (#7515) 2024-11-03 15:59:31 -05:00
multitensor.py multitensor start (#2676) 2023-12-07 17:07:05 -08:00
onnx.py remove copied is_dtype_supported from onnx [pr] (#7646) 2024-11-11 19:20:32 -05:00
onnx_ops.py add inverse trig functions to Tensor (#7805) 2024-11-21 09:13:36 -05:00
ring_copy.py ring copy example (#3185) 2024-01-19 23:34:30 -05:00
thneed.py new style device (#2530) 2023-11-30 17:07:16 -08:00
threefry.py feat: make buffer (#6745) 2024-09-25 18:31:03 +08:00
to_movement_ops.py s/UOps/Ops (#7500) 2024-11-03 11:26:10 +08:00
training.py tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
transfer_speed.py hotfix: copy size is in bytes 2024-01-17 16:44:15 +00:00