Bring back WebGPU (#7063)

* Start from andredaprato:webgpu-clean
* Fix infs
* inf wgsl function is not needed
* Emulated ulong for threefry, more tests passing
* Randomness tests passing
* Update model export to support new changes in webgpu, efficientnet export works again
* Simplify shift emulation in wgsl
* Delete test file
* Fix bigger than u32 u32 literal
* Why was skip copies added here?
* Python3.12 for webgpu tests
* Fix model export syntax error
* Get test ops passing with some skips
* Fix lint
* Much simpler shift
* Run more tests
* Timestamp queries are not supported in CI, so skip search tests
* All fancy indexing passing
* r is ctx
* Run more dtype tests by using is_dtype_supported
* Cleanup ulong shift rendering
* UPat -> Pat, UOps -> Ops
* Pat -> UPat
* Refactor render_ushift if-else
* Pattern to avoid ulong mul
* Remove vals_dtype
* is_nan trick + rewrite, test_isnan passing
* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)
* No arg, just op
* Support char, uchar, short, ushort
* Run test_index_mnis now that we have uint8
* Fix pyling
* Save 3 lines by using base Compiler
* No more long emulation
* Remove fixup_binops
* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm
* Simpler, faster copyin/out
* Skip some new tests that use long
* Fix typo
* copyout touchup
* Save lines by using render_cast
* WebGL is not supported in core, delete it from is_dtype_supported
* More narrow test skips for some unary tests
* TernaryOps, UnaryOps -> Ops
* TinyGrad supports WebGPU
* StableDiffusion demo: f16tof32 gpu is a lib, update UI
* Packed load/store, no more scale_size, no core tinygrad changes
* Rename copyin, copyout
* Device -> dev
* Fix lint
* Pattern matcher rule for packed load/store
* Refactor
* Shorter packed load/store
* this should fix lint
* Fix mypy
* SD compile script working
* New SD webgpu UI
* New default prompt
* New SD weights
* Fix title when webgpu not available
* Run symbolic tests, simplify is_nan, use round_up
* Show step time on UI
* Bump minimum wgpu version to v0.19
* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-06-24 02:14:17 +00:00 · 2024-11-26 05:26:40 +01:00 · 2024-11-26 05:26:40 +01:00 · 10618aba98
commit 10618aba98
parent ff3f2a9c1a
18 changed files with 654 additions and 397 deletions
--- a/examples/compile_efficientnet.py
+++ b/examples/compile_efficientnet.py
@@ -1,7 +1,7 @@
 from pathlib import Path
 from extra.models.efficientnet import EfficientNet
 from tinygrad.tensor import Tensor
-from tinygrad.nn.state import safe_save
+from tinygrad.nn.state import get_state_dict, safe_save, safe_load, load_state_dict
 from extra.export_model import export_model
 from tinygrad.helpers import getenv, fetch
 import ast
@@ -9,11 +9,15 @@ import ast
 if __name__ == "__main__":
  model = EfficientNet(0)
  model.load_from_pretrained()
+  dirname = Path(__file__).parent
+  # exporting a model that's loaded from safetensors doesn't work without loading in from safetensors first
+  # loading the state dict from a safetensor file changes the generated kernels
+  if getenv("WEBGPU") or getenv("WEBGL"):
+    safe_save(get_state_dict(model), (dirname / "net.safetensors").as_posix())
+    load_state_dict(model, safe_load(str(dirname / "net.safetensors")))
  mode = "clang" if getenv("CLANG", "") != "" else "webgpu" if getenv("WEBGPU", "") != "" else "webgl" if getenv("WEBGL", "") != "" else ""
  prg, inp_sizes, out_sizes, state = export_model(model, mode, Tensor.randn(1,3,224,224))
-  dirname = Path(__file__).parent
  if getenv("CLANG", "") == "":
-    safe_save(state, (dirname / "net.safetensors").as_posix())
    ext = "js" if getenv("WEBGPU", "") != "" or getenv("WEBGL", "") != "" else "json"
    with open(dirname / f"net.{ext}", "w") as text_file:
      text_file.write(prg)
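
The backend and output-extension selection in the hunk above is a chained conditional with a fixed precedence: CLANG first, then WEBGPU, then WEBGL. A minimal standalone sketch of that precedence follows. `select_mode` is a hypothetical helper, not part of tinygrad, and it takes a plain dict in place of the real environment so the logic can be checked in isolation. The extension used on the CLANG path is not visible in this hunk, so the sketch returns None for it.

```python
def select_mode(env):
    """Return (mode, ext) using the precedence CLANG > WEBGPU > WEBGL.

    Hypothetical helper for illustration only; `env` is a dict standing in
    for the process environment.
    """
    def is_set(key):
        # Mirrors the `getenv(KEY, "") != ""` checks: any non-empty value counts.
        return env.get(key, "") != ""

    if is_set("CLANG"):
        mode = "clang"
    elif is_set("WEBGPU"):
        mode = "webgpu"
    elif is_set("WEBGL"):
        mode = "webgl"
    else:
        mode = ""

    if is_set("CLANG"):
        ext = None  # assumption: the CLANG output path is outside this hunk
    else:
        ext = "js" if is_set("WEBGPU") or is_set("WEBGL") else "json"
    return mode, ext


if __name__ == "__main__":
    for env in ({"WEBGPU": "1"}, {"CLANG": "1", "WEBGPU": "1"}, {}):
        print(env, select_mode(env))
```

Because CLANG is tested first, setting it together with WEBGPU still selects the clang path, and with no variable set the mode is the empty string and the program is written as net.json.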