tinygrad/extra
nimlgen 30bd6a619f
usb gpu (#8766)
* start gpu

* progress

* fixes

* read correct

* libusb

* libusb works

* support asm24

* hmm

* one access file

* fix extra

* start AMBar

* works on am

* back to usb

* patch fw

* full fast write into a bar

* ugh, minus one gpus, next please

* mute libusb for now

* usb for asm24

* 63

* hmm

* ops

* rescan

* and gpu should be there

* enumerate them?

* usbgpu bus 4, 100% reliable (draft)

* lil

* works

* comments

* add DEBUG

* cleaner

* simplest

* Revert "simplest"

This reverts commit 1d00354c16.

* Revert "cleaner"

This reverts commit c5662de956.

* assert we find gpu

* that's simpler

* this back

* simpler?

* correcT

* work

* nonsense

* works with more checks

* this works

* the 6s in the right place

* reliable now

* fix after reboot

* set config

* 1s timeouts

* close to fw loading

* streams

* usbhub works

* endpoints

* fix

* want to test tiny10

* move to tiny 10

* fix gpu

* ugly speed

* smth

* mostly broken, but signals and dmas

* do not reset gpu every time

* changes to run kernels

* ugh, not working

* t10

* pg and sc files

* some prog

* um?

* somehow it works

* patched for 24

* some tries

* minimal

* moving

* back to working

* so sloooooow

* move to controller

* usb.py rewrite

* rework

* cleaner 1

* cleaner 2

* cleaner 3

* new abstractions

* aft merge

* init controller

* cleaner 4

* cleaner 5

* patcher + tiny changes

* ignore that

* cleaner 6

* after rebase

* cleaner 7

* bring it back

* start linter war

* linter 2

* autogen was missing

* fix autogen

* typing

* better?

* mypy

* extra/legacy rename and cleaner

* shuffle

* better printing

* tiny changes and tests

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-05-01 18:03:47 +03:00
Name                   Last updated                Last commit
accel                  2023-11-13 20:18:40 -08:00  move things, clean up extra (#2292)
amdpci                 2025-04-30 19:12:19 +03:00  am: move boot memory to vram start (#10115)
assembly               2024-11-03 11:26:10 +08:00  s/UOps/Ops (#7500)
backends               2025-02-20 18:03:09 -05:00  CLANG -> CPU (#9189)
datasets               2025-04-24 16:19:44 -04:00  lint mlperf model_train (#10038)
disassemblers/adreno   2024-09-24 15:23:43 +08:00  qcom fix disasm (#6703)
dsp                    2025-03-20 10:38:23 +08:00  dsp stuff / sniff ioctls from snpe (#9490)
gemm                   2025-04-19 16:51:16 -04:00  remove required_optimizations (#9848)
hcqfuzz                2025-04-25 23:19:21 +03:00  hcqfuzz: init (#10049)
hip_gpu_driver         2025-03-29 19:46:42 +08:00  MI300X support (WIP) (#9585)
hiprtc                 2024-01-26 18:27:49 -08:00  use comgr to compile (#3248)
huggingface_onnx       2025-03-24 12:24:34 +08:00  add onnx frontend stub [pr] (#9558)
junk                   2023-11-25 12:27:54 -08:00  coder.py can write and run code (#2439)
models                 2025-04-30 14:36:35 -04:00  simple symbolic slice in llama [pr] (#10112)
nv_gpu_driver          2024-10-23 21:59:47 +03:00  nv fix shared_memory_size (#7239)
optimization           2025-04-24 08:06:08 -04:00  [bounty] [pr] index validation with z3 (#9981)
qcom_gpu_driver        2024-11-11 21:56:51 +03:00  qcom match texture/sampler descriptors to OpenCL (#7622)
remu                   2025-04-28 20:31:52 +08:00  remu: only write v_cmp result if exec is set (#10084)
resnet18               2024-09-20 11:28:01 +08:00  beat mlx at resnet 18 (#6611)
sqtt                   2025-03-26 19:36:48 +08:00  use tuple in isinstance for type checking (#9583)
torch_backend          2025-04-28 12:01:19 -04:00  don't modify the ranges on reduce rewrite (#10062)
torch_hook             2025-03-26 19:36:48 +08:00  use tuple in isinstance for type checking (#9583)
usbgpu                 2025-05-01 18:03:47 +03:00  usb gpu (#8766)
webgpu                 2025-02-07 15:16:59 +08:00  Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)
archprobe.py           2024-01-01 14:58:48 -08:00  move dtypes to dtype.py (#2964)
augment.py             2023-08-30 10:41:08 -07:00  [ready] Replacing os with pathlib (#1708)
disk_read_speed.py     2024-06-21 11:36:51 +03:00  io_uring for copies from disk (#5035)
dump_cache.py          2023-11-16 21:21:11 -08:00  wow how did i think that was okay (#2339)
export_model.py        2025-04-21 09:23:44 -04:00  fix assertion message for supported device in export_model (#9957)
f16_decompress.py      2024-12-06 12:00:13 +01:00  u32 to f16 in tinygrad (#8074)
gradcheck.py           2024-12-18 09:25:05 -08:00  tests from grad uop path [pr] (#8313)
hip_events.py          2024-01-26 12:44:19 -08:00  move autogen to runtime/autogen (#3254)
hip_large_kernel.py    2025-03-16 13:39:24 +08:00  minimum change for rdna4 [pr] (#9455)
hook_cuda.py           2025-02-20 19:20:01 +08:00  cuda hooking (#9180)
introspection.py       2024-12-11 16:15:52 -08:00  rename LazyBuffer -> UOp [pr] (#8169)
lr_scheduler.py        2024-04-25 14:42:28 -04:00  use at least float32 for optim.lr (#4297)
mcts_search.py         2025-01-27 14:19:04 -05:00  [TIP-9] rename Opt's amt to arg 2 (#8770)
multitensor.py         2023-12-07 17:07:05 -08:00  multitensor start (#2676)
onnx.py                2025-04-16 03:15:56 -04:00  Revert "ONNX add output shape validation (#9720)" (#9904)
onnx_helpers.py        2025-04-16 03:15:56 -04:00  Revert "ONNX add output shape validation (#9720)" (#9904)
reduce_speed.py        2025-03-18 15:15:04 +08:00  VALIDATE_WITH_CPU [pr] (#9488)
replay_pkl.py          2025-04-19 20:26:59 -04:00  hand_coded_optimizations returns list[Opt] [pr] (#9938)
ring_copy.py           2024-01-19 23:34:30 -05:00  ring copy example (#3185)
setup_mock_amd_osx.sh  2025-01-09 01:29:25 +03:00  add script to install amd mockgpu on macOS (#8536)
setup_mock_nv_osx.sh   2025-02-13 12:26:15 +08:00  hotfix: setup_mock_nv_osx
thneed.py              2023-11-30 17:07:16 -08:00  new style device (#2530)
threefry.py            2024-09-25 18:31:03 +08:00  feat: make buffer (#6745)
to_movement_ops.py     2025-02-26 22:34:05 +08:00  full fix for as_strided in torch backend (#9257)
training.py            2024-06-22 14:45:06 -04:00  tinytqdm.set_description and tinytrange (#5101)
transfer_speed.py      2024-01-17 16:44:15 +00:00  hotfix: copy size is in bytes
viz_sz.py              2025-04-01 19:52:02 +08:00  merge viz back into one file (#9672)