tinygrad/extra
nimlgen c18307e749
AM driver (#6923)
* connect to gpu

* rlc init?

* gfx comp start init

* early init is hardcoded, some progress with fw

* gart

* progress, next mqd

* ring setup, still does not execute anything

* ugh write correct reg

* pci2: vm

* pci2: start psp

* vm seems to work

* pci2: gfx start

* pci2: fix psp ring resp

* pci2: try ring

* pci2: mes and some fixes

* pci2: some progress

* pci2: progress

* pci2: mm

* pci2: discovery

* pci2: correct apertures

* pci2: b

* pci2: i

* pci2: l

* pci2: o

* pci2: cmu

* pci2: mes_kiq works

* pci2: mes

* pci2: kcq does not work(

* pci2: unhalt gfx

* ops_am

* minor

* check if amdgpu is there, or we will crash

* bring back graph, it just works

* less prints

* do not init mes (not used)

* remove unused files

* ops_am: start move into core

* ops_am: works

* clcks, but still slower

* faster + no mes_kiq

* vm frags + remove mes

* cleanup fw

* gmc tiny cleanup

* move to ops_amd

* comment out what we dont really need

* driverless

* close in speed

* am clean most of ips

* gmc to ips

* cleaner

* new vm walker

* comment old one

* remove unused autogens

* last write ups

* remove psp hardcoded values

* more

* add logs

* ih

* p2p and sdma

* vfio hal and interrupts

* smth

* amd dev iface

* minor after rebase

* bind for sdma

* Revert "bind for sdma"

This reverts commit a90766514d.

* tmp

* debug new mm

* ugh, allreduce hangs fixed

* p1

* works

* no pci.py

* cleaner a bit

* smth

* tiny cleanups

* cleaner a bit

* pciiface

* linter

* linter 2

* linter 3

* linter

* pylint

* reverted unrelated changes

* unrelated

* cmp tool

* ugh wrong fw

* clockgating

* unrelated

* alloc smaller chunks

* this

* opt sigs

* collect stat

* ops

* upd

* proclogs

* proclogs2

* vfio

* ruff

* linter pylint

* oops

* mypy p1

* mem fix

* mypy p2

* mypy p3

* mypy p4

* correct

* minor

* more tests

* linter in tests

* pci_regs header

* minor write up

* setup

* do not require libs

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2024-12-31 23:06:17 +03:00
accel move things, clean up extra (#2292) 2023-11-13 20:18:40 -08:00
amdpci AM driver (#6923) 2024-12-31 23:06:17 +03:00
assembly s/UOps/Ops (#7500) 2024-11-03 11:26:10 +08:00
backends bring back the DSP runtime 2024-12-31 12:01:42 -05:00
datasets assign early folding (#8093) 2024-12-07 17:02:55 +08:00
disassemblers/adreno qcom fix disasm (#6703) 2024-09-24 15:23:43 +08:00
dsp add qcom dsp runtime (#6112) 2024-09-13 21:01:33 +03:00
gemm create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
hip_gpu_driver create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
hiprtc use comgr to compile (#3248) 2024-01-26 18:27:49 -08:00
junk coder.py can write and run code (#2439) 2023-11-25 12:27:54 -08:00
models Fix FC layer ResNet load_from_pretrained error (#8387) 2024-12-26 18:11:27 -05:00
nv_gpu_driver nv fix shared_memory_size (#7239) 2024-10-23 21:59:47 +03:00
optimization assign early folding (#8093) 2024-12-07 17:02:55 +08:00
qcom_gpu_driver qcom match texture/sampler descriptors to OpenCL (#7622) 2024-11-11 21:56:51 +03:00
resnet18 beat mlx at resnet 18 (#6611) 2024-09-20 11:28:01 +08:00
archprobe.py move dtypes to dtype.py (#2964) 2024-01-01 14:58:48 -08:00
augment.py [ready] Replacing os with pathlib (#1708) 2023-08-30 10:41:08 -07:00
disk_read_speed.py io_uring for copies from disk (#5035) 2024-06-21 11:36:51 +03:00
dump_cache.py wow how did i think that was okay (#2339) 2023-11-16 21:21:11 -08:00
export_model.py encapsulate the exported webgpu model (#8203) 2024-12-13 10:55:37 +01:00
f16_decompress.py u32 to f16 in tinygrad (#8074) 2024-12-06 12:00:13 +01:00
gradcheck.py tests from grad uop path [pr] (#8313) 2024-12-18 09:25:05 -08:00
hip_events.py move autogen to runtime/autogen (#3254) 2024-01-26 12:44:19 -08:00
introspection.py rename LazyBuffer -> UOp [pr] (#8169) 2024-12-11 16:15:52 -08:00
lr_scheduler.py use at least float32 for optim.lr (#4297) 2024-04-25 14:42:28 -04:00
mcts_search.py safe softmax trick in MCTS ucb_explored_children (#7515) 2024-11-03 15:59:31 -05:00
multitensor.py multitensor start (#2676) 2023-12-07 17:07:05 -08:00
onnx.py Tensor.mod (#8458) 2024-12-31 11:31:42 -05:00
onnx_ops.py example to benchmark onnx [pr] (#8459) 2024-12-31 11:38:33 -05:00
ring_copy.py ring copy example (#3185) 2024-01-19 23:34:30 -05:00
thneed.py new style device (#2530) 2023-11-30 17:07:16 -08:00
threefry.py feat: make buffer (#6745) 2024-09-25 18:31:03 +08:00
to_movement_ops.py s/UOps/Ops (#7500) 2024-11-03 11:26:10 +08:00
training.py tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
transfer_speed.py hotfix: copy size is in bytes 2024-01-17 16:44:15 +00:00