Commit graph

889 commits

Author SHA1 Message Date
nimlgen
777d2aec05
metal profiler + cpu_profile (#8291)
* metal + cpu_profile

* gpt example

* linter + revert gpt2 for now

* a bit of readme

* linter

* unrelated

* tests

* linter

* b
2024-12-18 00:06:56 +03:00
nimlgen
af87e4b53c
viz profiler (#8287)
* only hcq

* fix get_metadata

* linter

* oops

* tiny

* linter

* time

* print pm

* hmm

* nits
2024-12-17 20:00:53 +03:00
George Hotz
cda34ccadf
hotfix: time.time -> time.perf_counter 2024-12-16 11:32:49 -08:00
nimlgen
a2a4ff30dc
hcq better timeout handling (#8269) 2024-12-16 13:44:55 +03:00
chenyu
f05fd118a2
few minor code cleanups [pr] (#8267) 2024-12-15 23:44:51 -05:00
chenyu
2e4c7d4cfb
add "tinygrad" to be part of cache_dir [pr] (#8188)
instead of having sqlite / http download / metal compile to add "tinygrad" separately. also make it non-private since it's used in metal
2024-12-12 12:09:44 -05:00
nimlgen
bf7d1fcd2c
tiny import fixes in hcq graph (#8184) 2024-12-12 16:30:06 +03:00
Ahmed Harmouche
2f2b1e792c
wgsl and ops_webgpu simplifications [pr] (#8182)
Simplify wgsl and ops_webgpu
2024-12-12 14:21:58 +01:00
Ahmed Harmouche
1b94cc095a
Bump back wgpu to latest (#8179) 2024-12-12 09:40:52 +01:00
chenyu
aaa3cc235d
unused from __future__ import annotations (#8171) 2024-12-11 19:05:04 -05:00
George Hotz
8f4299fcc8
hotfix: suppress shutdown errors in CLProgram 2024-12-11 08:08:32 -08:00
nimlgen
3a7d64b96c
hcq remove update from args state (#8104)
* hcq remove update from args state

fix amd

ugh

qcom?

qcom ops

ops

qcom fix

qcom texture info

fx

qcom fix

qcom

qcom, sry

minor

works

* remove old code

* unrelated+sint

* qcom

* typing

* rm comments
2024-12-08 15:22:05 +03:00
nimlgen
d6e66095fd
hcq buffer is a class (#8106)
* hcq buffer is a class

* qcom

* no from_mv in qcom

* remove qcombuffer

* useless cast

* mypy

* qcom fix

* _md -> meta
2024-12-08 13:29:43 +03:00
nimlgen
8b1fa9cb7d
nv hcq queue touchups (#8102) 2024-12-07 14:09:38 +03:00
nimlgen
e180a31c5e
tiny metal cleanup (#8089)
* tiny metal cleanup

* cast

* sry
2024-12-06 21:44:32 +03:00
nimlgen
d1282da7e8
hcq bump alloc (#8078)
* hcq bump alloc

* hm

* nv

* typo
2024-12-06 19:19:04 +03:00
nimlgen
c0240855b9
qcom has not transfer (#8075)
* qcom alloc is not hcq alloc

* maybe base?

* test
2024-12-06 14:45:01 +03:00
JaSpa99
3c5d5f9414
mypy==1.13.0 (#7990)
* explicit instantiation and narrowing asserts

* explicit cast

* bump

* one line assert

* handle case for no copy_queue_t

* Revert "handle case for no copy_queue_t"

This reverts commit 38347806ca.

* more readable control flow

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-06 12:09:14 +08:00
nimlgen
78c01a5c2b
amd general _gpu_alloc (#8056)
* amd general _gpu_alloc

* hmm

* ops
2024-12-05 15:50:23 +03:00
nimlgen
8071600897
nv one _gpu_alloc (#8055) 2024-12-05 15:22:03 +03:00
uuuvn
e9c5b23ba1
Use MTLCompiler directly (v2) (#7920)
* Use MTLCompiler directly (v2)

* to_block_literal and REQUEST_TYPE_COMPILE

* Rewrite command encoding

* Revert to_block_literal

* Maybe that's more readable to some people?

* Typo and comment about stdlib caching

* Update ops_metal.py

* Update ops_metal.py

* Update ops_metal.py

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-04 16:36:48 +08:00
nimlgen
7fda464b08
hcq c-like args state (#8020)
* hcq c-like args state

* ugh

* Dfix

* rename

* i
2024-12-03 23:53:35 +03:00
George Hotz
32675a8a77
sacrifice ClangGraph on the altar of lines [pr] (#8009) 2024-12-03 21:11:15 +08:00
Ahmed Harmouche
146e1caea3
Downgrade wgpu to prevent sd segfault (#7969) 2024-12-02 15:48:44 +01:00
wozeparrot
077e7e8ed2
fix: private segment sgpr on gfx103x (#7987)
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-02 20:54:50 +08:00
nimlgen
10f431b96d
hcq replace update with sint (#7899)
* try sym hcq

* start with amd

* move to nv

* nv works

* cache and qcom

* fixes

* signals

* fix nv

* qcom fixes

* linter

* linter

* cache + typings

* fixes

* tiny fixes

* linter

* linter

* lntr

* ugh

* comments
2024-11-29 20:08:13 +03:00
nimlgen
d3660ccc51
prereqs for hcq updates removal (#7959)
* hcq signals touch ups

* hcq compiled has device id

* helpers

* prreq hcq api

* oops
2024-11-29 18:20:07 +03:00
nimlgen
309dcb1044
hcq signal add sleep (#7955)
* hcqsignal sleep

* fixes

* typing

* time ms is int
2024-11-29 14:04:45 +03:00
nimlgen
81d415be03
amd pkt3 refactor (#7923)
* amd pkt3 refactor

* replace this

* linter

* fix

* cmt

* fast

* simpler

* linter

* smth

* missing
2024-11-28 11:06:37 +03:00
JaSpa99
38f34ca0cb
prepare mypy==1.13.0: legacy cast (#7866)
* use helper to narrow literal type

* narrow with asserts instead of cast

* remove parentheses

* tensor.item() calls tensor.data()

* no copy

* proper indexing
2024-11-27 10:33:35 -05:00
nimlgen
84f96e48a1
hcq signal tiny refactor (#7913)
* hcq signal tiny refactor

* no mv

* fix

* fix2

* fix3
2024-11-26 21:48:38 +03:00
Ahmed Harmouche
10618aba98
Bring back WebGPU (#7063)
* Start from andredaprato:webgpu-clean

* Fix infs

* inf wgsl function is not needed

* Emulated ulong for threefry, more tests passing

* Randomness tests passing

* Update model export to support new changes in webgpu, efficientnet export works again

* Simplify shift emulation in wgsl

* Delete test file

* Fix bigger than u32 u32 literal

* Why was skip copies added here?

* Python3.12 for webgpu tests

* Fix model export syntax error

* Get test ops passing with some skips

* Fix lint

* Much simpler shift

* Run more tests

* Timestamp queries are not supported in CI, so skip search tests

* All fancy indexing passing

* r is ctx

* Run more dtype tests by using is_dtype_supported

* Cleanup ulong shift rendering

* UPat -> Pat, UOps -> Ops

* Pat -> UPat

* Refactor render_ushift if-else

* Pattern to avoid ulong mul

* Remove vals_dtype

* is_nan trick + rewrite, test_isnan passing

* Rewrite a * select(1, nan, gate) -> select(a, nan, gate)

* No arg, just op

* Support char, uchar, short, ushort

* Run test_index_mnis now that we have uint8

* Fix pylint

* Save 3 lines by using base Compiler

* No more long emulation

* Remove fixup_binops

* No more external_local_bufx wgsl specific cstyle modif, use base extra_pm

* Simpler, faster copyin/out

* Skip some new tests that use long

* Fix typo

* copyout touchup

* Save lines by using render_cast

* WebGL is not supported in core, delete it from is_dtype_supported

* More narrow test skips for some unary tests

* TernaryOps, UnaryOps -> Ops

* TinyGrad supports WebGPU

* StableDiffusion demo: f16tof32 gpu is a lib, update UI

* Packed load/store, no more scale_size, no core tinygrad changes

* Rename copyin, copyout

* Device -> dev

* Fix lint

* Pattern matcher rule for packed load/store

* Refactor

* Shorter packed load/store

* this should fix lint

* Fix mypy

* SD compile script working

* New SD webgpu UI

* New default prompt

* New SD weights

* Fix title when webgpu not available

* Run symbolic tests, simplify is_nan, use round_up

* Show step time on UI

* Bump minimum wgpu version to v0.19

* Fix latent

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-11-26 12:26:40 +08:00
chenyu
04bee97d2a
hotfix ctypes.c_ulong(size) for metal _alloc (#7902)
fix `Tensor.ones(1000, 1000, 1000).contiguous().realize()` on METAL
2024-11-25 18:25:33 -05:00
George Hotz
1d6d842887
move DSP to extra (room for webgpu) [pr] (#7836) 2024-11-22 11:32:57 +08:00
George Hotz
6fc7013463
put all DSP in dsp file [pr] (#7833) 2024-11-22 11:22:59 +08:00
George Hotz
e39af63156
no loop assert in ops_python [pr] (#7834) 2024-11-22 11:17:36 +08:00
George Hotz
d18b948f48
ptxcompiler isn't a cudacompiler [pr] (#7832)
* ptxcompiler isn't a cudacompiler [pr]

* hcq types
2024-11-22 10:57:22 +08:00
mesozoic-egg
855f9a767a
add restype for msg method for type annotation and consistency (#7828)
* no need to explicitly set objc_id as restype

* add restype for type annotation

---------

Co-authored-by: Mesozoic Egg <mesozoic.egg@proton.me>
2024-11-22 09:17:58 +08:00
George Hotz
df6f1815ad
remove jit_cache from self in GraphRunner [pr] (#7817)
* remove jit_cache from self in GraphRunner [pr]

* add back unused
2024-11-21 13:26:37 +08:00
George Hotz
e9ae2ccd09
_prg to match _buf [pr] (#7816) 2024-11-21 12:44:48 +08:00
George Hotz
439911b2e6
disable disable_abstract_method [pr] (#7815) 2024-11-21 12:28:57 +08:00
George Hotz
c5d458ce02
BufferSpec and ProgramSpec [pr] (#7814)
* BufferSpec and ProgramSpec [pr]

* delete preallocate, it's unused

* Revert "delete preallocate, it's unused"

This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
490a6130af
more hcq typing [pr] (#7813)
* more hcq typing [pr]

* minor

* less generic
2024-11-21 11:23:07 +08:00
George Hotz
9df5a62c5e
unify to HWQueue [pr] (#7812)
* unify to HWCommandQueue [pr]

* all is HWQueue
2024-11-21 10:33:08 +08:00
George Hotz
eb0bb7dc0b
final dname to device [pr] (#7806)
* final dname to device [pr]

* oops, fix nv
2024-11-20 20:20:28 +08:00
George Hotz
bc977fec53
dname -> device [pr] (#7804)
* dname -> device [pr]

* a few more

* only one left
2024-11-20 17:57:14 +08:00
George Hotz
0a74acd90e
add proper typing to HCQ [pr] (#7803)
* add proper typing to HCQ [pr]

* more types

* and qcom

* HCQProgram has device type

* typed allocator
2024-11-20 17:20:39 +08:00
George Hotz
6688539bc9
rename device to dev so Buffer can be Allocator [pr] (#7799)
* rename device to dev to Buffer can be Allocator [pr]

* missed those

* update the Program classes also

* more renames

* oops
2024-11-20 15:47:26 +08:00
George Hotz
913a27ee27
from_buffer on metal was never called [pr] (#7791) 2024-11-20 00:35:17 +08:00
George Hotz
d71fe7faa5
rename allocator methods to not conflict [pr] (#7788)
* rename allocator methods to not conflict [pr]

* forgot those

* transfer + offset
2024-11-20 00:10:29 +08:00