1,334 commits

Author SHA1 Message Date
nimlgen
1bb1f1aee8
hcq: fix race in _at_profile_finalize (#11368) 2025-07-25 14:14:02 +03:00
George Hotz
490a93902c
define reg doesn't have init anymore (#11365)
* define reg doesn't have init anymore

* remove that

* no special logic for dr

* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
9da3f72495
identity store for DEFINE_REG (#11363)
* identity store for DEFINE_REG

* identity store for DEFINE_REG

* noop continue
2025-07-24 16:41:29 -07:00
nimlgen
3b3de8df61
hcq: graphed copies (#11302)
* fast copies p2

* upd and fix

* graph supports

* fixes

* fixes

* fixes

* fix

* fix

* fix mockgpu

* fix alignment

* smaller in ci
2025-07-24 17:36:19 +03:00
nimlgen
bf12041910
hcq: mapping of cpu to all hcq devices (#11354)
* hcq: mapping of cpu to all hcq devices

* fix kfd

* nv

* simpler

* cleaner

* correct skip

* fix ifaces

* system fixes

* mypy
2025-07-24 12:52:38 +03:00
nimlgen
0f374e10d2
cpu: use mmap for allocations (#11349)
* cpu: use mmap for allocations

* ops

* fix mypy
2025-07-23 20:30:18 +03:00
nimlgen
ca09c180dc
cpu: remove del spam (#11343)
* cpu: remove del spam

* fix
2025-07-23 12:02:37 +03:00
George Hotz
c65b5aab62
small things from endrange (#11339)
* small things from endrange

* store
2025-07-22 19:45:37 -07:00
George Hotz
09431d4ad1
make DEFINE_REG behave like the others (#11273)
* simpler define reg

* cast

* PTRCAT define_acc

* cleanups

* fix uops stats

* fix linearizer tests

* llvm

* define reg sets const

* define reg sets const

* no assign

* collapse that

* fix test_max_pool2d_bigger_stride_dilation

* use index, fix webgpu

* devec

* fix tests

* fix webgpu

* fix llvm

* threads for python

* fix ops_python

* only for reg

* acc_half is real now in the emulator

* fix llvm

* fix webgpu init

* fix wgpu test

* fix some tests

* fix ptx

* fix ptx bool acc

* cleanups

* broken, meh. will fix with ENDRANGE

* line count
2025-07-22 13:53:56 -07:00
nimlgen
3faa352dcc
am: bump version after mm changes (#11328) 2025-07-22 21:54:10 +03:00
nimlgen
53b3d87456
am: use 4-lvl pdir (#11326) 2025-07-22 20:58:15 +03:00
nimlgen
de2df92551
hcq: use devices instead of ids in HCQGraph (#11303)
* hcq: use devices instead of ids in HCQGraph

* fiz
2025-07-21 20:03:12 +03:00
uuuvn
178dbf3f66
Remote scheduler changes (#11177) 2025-07-21 09:29:44 -07:00
nimlgen
cc3c1e4c14
hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
nimlgen
816c01c2d4
hcq: default copy_queue_t=None (#11297) 2025-07-21 14:45:20 +03:00
nimlgen
9c533e5c38
hcq: cpu prereq (#11296) 2025-07-21 13:35:18 +03:00
nimlgen
e87a42e243
hcq: prepare for windows (#11293)
* hcq: prepare for windows

* comments
2025-07-21 13:08:56 +03:00
nimlgen
df3ba0a7c0
autogen: fix imports in libusb (#11294) 2025-07-21 13:04:27 +03:00
nimlgen
dd6a2d432f
hcq: default timestamp metrics is ns (#11295) 2025-07-21 12:56:30 +03:00
wozeparrot
53345ef4e2
feat: make ops_disk work on block devices (#11291) 2025-07-20 14:39:50 -07:00
chenyu
54924f9969
type remove Union and Optional [pr] (#11283)
use `|` for consistency
2025-07-19 14:05:52 -04:00
nimlgen
188ed38315
replace from_mv with lightweight mv_address (#11280) 2025-07-19 13:50:51 +03:00
nimlgen
9a88bd841c
hcq: refactor into peer_groups (#11277)
* hcq: refactor into peer_groups

* fix fors

* fixes

* ooops

* mypy

* tiny fixes
2025-07-18 16:34:18 +03:00
nimlgen
f432eef708
hcq: rename CPU -> KICK in graph for kickoff signal (#11278) 2025-07-18 15:54:35 +03:00
nimlgen
cfb229473f
hcq: refactor buffer mapping (#11271)
* hcq: refactor buffer mapping

* fix

* fix mypy
2025-07-17 15:16:49 +03:00
uuuvn
6f0ddcc24c
Remote cross-host graph (#11229) 2025-07-16 13:27:54 -07:00
nimlgen
6aa20c607d
nv: graceful shutdown to cold state (#11265) 2025-07-16 19:49:35 +03:00
nimlgen
197d345804
nv: print rpc msg with DEBUG>=3 (#11247) 2025-07-15 16:39:58 +03:00
nimlgen
756ba1a5f9
nv: support ampere in nvpci (#11230) 2025-07-14 15:35:44 +03:00
nimlgen
c4a920d95c
nv: use last signature (#11227) 2025-07-14 13:00:39 +03:00
nimlgen
a830d37881
nv: check wpr2 is inited (#11226) 2025-07-14 11:46:14 +03:00
Alisher Zhubanyshev
4ef6b46b34
hcq: reduce launch overhead (#11193)
* nv: improve mmio creation speed

* add memoryview test

* fix indents

* move mv bench to `test_helpers`, remove comparison
2025-07-13 19:25:50 +03:00
nimlgen
1cc2b3f845
nv: use wait_cond (#11212) 2025-07-13 19:25:20 +03:00
nimlgen
6cce3a5d58
generic wait_cond (#11210)
* generic wait_cond

* fix linter

* fix linter
2025-07-13 16:59:21 +03:00
nimlgen
55c54d9745
nv: sync after gpfifo setup (#11209) 2025-07-13 14:40:11 +03:00
George Hotz
770a558585
lil cleanups from uop branch [pr] (#11197) 2025-07-12 09:46:28 -07:00
nimlgen
ea7f2f779c
hcq: p2p nv-amd (#11195)
* hcq: p2p between diff devices

* fix
2025-07-12 18:53:34 +03:00
nimlgen
6f5250d158
nv: fix typing in rpc_rm_control (#11189) 2025-07-12 16:09:42 +03:00
uuuvn
d11b20129d
DMARef infra (#10753)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-11 14:09:47 -07:00
nimlgen
f9e4c4e57a
nv: nvpci blackwell support (#11127)
* nv: start 5090

* gsp init 5090

* mmu

* works

* after merge

* clenaer

* rwk

* x

* fx

* finish?

* fix

* unrelated

* fix

* commenbt
2025-07-11 17:02:09 +03:00
nimlgen
c7f6b617b4
nv: do not hardcode lv0 pd size (#11180) 2025-07-11 16:26:18 +03:00
nimlgen
27922c986a
nv: generic mmu impl (#11179) 2025-07-11 16:26:09 +03:00
nimlgen
cc6ed30f4f
nv: relative lv addressing in NVPageTableEntry (#11164) 2025-07-10 22:35:50 +03:00
qazal
bde80c0cdf
record GraphEvents in metal graph (#11145)
* record GraphEvents in metal graph

* add TestProfiler.test_graph, revert old stuff

* move profile capture to MetalGraph

* comment

* don't double record graph command buffers

* wait_check

* explicit delete
2025-07-10 21:32:06 +03:00
nimlgen
581397110f
nv: use classes in GSP_IP (#11163) 2025-07-10 17:47:12 +03:00
nimlgen
705de6b8a6
nv: parse sizes of ctx buffers (#11161) 2025-07-10 17:46:48 +03:00
Pyry Kovanen
32117402dd
metal: fix incorrect _free on interpreter exit (#11158) 2025-07-10 14:01:30 +03:00
George Hotz
53ae153404
tc should be in opt (#11148)
* tc should be in opt [pr]

* fix import
2025-07-09 14:12:21 -07:00
wozeparrot
6697d0089d
initial gfx950 kfd support (#11151)
* feat: initial gfx950 support

* fix: lint
2025-07-09 13:45:16 -07:00
nimlgen
b6981404ed
memory: use page shifts in memory manager (#11149)
* memory: use page shifts in memory manager

* fix
2025-07-09 22:05:00 +03:00