Commit graph

51 commits

Author SHA1 Message Date
nimlgen
bf12041910
hcq: mapping of cpu to all hcq devices (#11354)
* hcq: mapping of cpu to all hcq devices

* fix kfd

* nv

* simpler

* cleaner

* correct skip

* fix ifaces

* system fixes

* mypy
2025-07-24 12:52:38 +03:00
nimlgen
cc3c1e4c14
hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
nimlgen
9a88bd841c
hcq: refactor into peer_groups (#11277)
* hcq: refactor into peer_groups

* fix fors

* fixes

* ooops

* mypy

* tiny fixes
2025-07-18 16:34:18 +03:00
chenyu
a0438012af
remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
nimlgen
ea7f2f779c
hcq: p2p nv-amd (#11195)
* hcq: p2p between diff devices

* fix
2025-07-12 18:53:34 +03:00
George Hotz
92678e59ee
move kernel to opt (#10899) 2025-06-20 15:22:28 -07:00
George Hotz
32e9949052
rename lazydata to uop (#10698) 2025-06-08 08:42:22 -07:00
nimlgen
0e1beaf44f
nv: align copies + better test (#10118) 2025-04-30 20:09:53 +03:00
nimlgen
2ec3b722e2
nv: fix copies larger than 4g (#10117) 2025-04-30 18:43:17 +03:00
nimlgen
5c7d004da5
hcq: refactor int ptrs to hcqbuffers (#10105)
* hcq: refactor int ptrs to hcqbuffers

* more refactors

* linter

* use in allocator

* test fiz

* fx

* ops

* final?

* simpler

* keep this for now
2025-04-30 00:12:18 +03:00
George Hotz
68c5f7ba80
load fast in sdxl (#10072)
* load fast in sdxl

* back to that with the ret

* no context
2025-04-27 11:58:51 -04:00
nimlgen
fa888ee077
minor test cleanups (#9770)
* fix test_graph on max

* pcie5
2025-04-07 15:29:12 +03:00
nimlgen
8cae00833c
flaky test in ci (#9321) 2025-03-02 16:27:22 +03:00
nimlgen
70db8c3003
hcq: dyn alloc signals (#9238)
* hcq: dyn alloc signals

* types and uniqueue devs

* typing

* mypy

* mypy one more time

* test

* make fds to not intersect in mockgpu between drivers
2025-02-25 17:22:24 +03:00
Ignacio Sica
b240f12593
[TIP-9] rename Opt's amt to arg 2 (#8770)
* rename Opt amt to arg

* ignore_beam_cache for test_tiny

* move ignore_beam_cache to test_tiny

* move to separate pr

* revert space change

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-01-27 14:19:04 -05:00
George Hotz
3ed146a5ff
Revert "rename Opt amt to arg (#8767)" (#8769)
This reverts commit bf041659a5.
2025-01-27 23:46:37 +09:00
Ignacio Sica
bf041659a5
rename Opt amt to arg (#8767) 2025-01-27 23:36:47 +09:00
nimlgen
d224d0ed7f
nv: fix fault info (#8587)
* nv: fix fault info

* and emu for amd

* skip if not mock
2025-01-13 14:38:43 +03:00
nimlgen
92b59c9b7a
test_hcq limits for mockgpu not (only) ci (#8555)
* test_hcq limits for mockgpu not (only) ci

* rm CI
2025-01-10 17:37:28 +03:00
nimlgen
31fcfe764d
adjust hcq test for ci macos (#8534) 2025-01-08 16:18:31 +03:00
qazal
866dfa1f23
create_schedule([x.lazydata]) -> x.schedule() in tests (#8449) 2024-12-31 03:15:52 +08:00
George Hotz
29c14f1cbf hotfix: update tests for no uop mut 2024-12-30 10:05:37 -05:00
nimlgen
0a139b1436
amd iface abstraction (#8413)
* start on amd iface

* t

* unused import

* fixes

* internal api
2024-12-27 15:53:53 +03:00
nimlgen
af87e4b53c
viz profiler (#8287)
* only hcq

* fix get_metadata

* linter

* oops

* tiny

* linter

* time

* print pm

* hmm

* nits
2024-12-17 20:00:53 +03:00
nimlgen
10f431b96d
hcq replace update with sint (#7899)
* try sym hcq

* start with amd

* move to nv

* nv works

* cache and qcom

* fixes

* signals

* fix nv

* qcom fixes

* linter

* linter

* cache + typings

* fixes

* tiny fixes

* linter

* linter

* lntr

* ugh

* comments
2024-11-29 20:08:13 +03:00
George Hotz
e9ae2ccd09
_prg to match _buf [pr] (#7816) 2024-11-21 12:44:48 +08:00
George Hotz
c5d458ce02
BufferSpec and ProgramSpec [pr] (#7814)
* BufferSpec and ProgramSpec [pr]

* delete preallocate, it's unused

* Revert "delete preallocate, it's unused"

This reverts commit dcfcfaccde.
2024-11-21 12:18:05 +08:00
George Hotz
eb0bb7dc0b
final dname to device [pr] (#7806)
* final dname to device [pr]

* oops, fix nv
2024-11-20 20:20:28 +08:00
George Hotz
27995a2a04
vcount + cleanups (#7393)
* Revert "Revert "Restore vcount [pr] (#7390)" (#7392)"

This reverts commit 4ca53db604.

* ugh bugfix [pr]

* uops_to_dtypes function

* fixups

* varnames

* fix mypy

* just 4,8

* tests
2024-10-30 12:50:15 +08:00
George Hotz
6b063450df
move hcq device to runtime [pr] (#6879)
* things that are only used in one place don't belong in helpers [pr]

* start moving hcq device [pr]

* fix paths
2024-10-04 22:26:50 +08:00
nimlgen
f0019ad29c
bump ci test timeout for test_speed_exec_time (#6715)
* bump ci test timeout for test_speed_exec_time

* more
2024-09-24 18:44:09 +08:00
George Hotz
a1a882b006
arange folding with new ge (#6604)
* arange folding with new ge

* bump allowed gated

* bump allowed speed
2024-09-19 18:01:28 +08:00
nimlgen
6c4ddd6260
hcq skip tests when no multidev (#6235)
* hcq skip tests when no multidev

* linter

* a bit higher tinout
2024-08-22 18:27:16 +03:00
chenyu
b36a7273c6
RUF018 assignment-in-assert [run_process_replay] (#6172)
assertion should not have side effect or `-O` breaks.

initially just wanted to fix the one in rearrange, but it also made some long lines less long
2024-08-19 00:34:52 -04:00
wozeparrot
0c5189de25
threefry half (#6154) 2024-08-18 15:23:12 -07:00
nimlgen
183c4c91a3
fix non-jitted transfers in profile (#5980)
* fix transfers in profile

* fix linter

* sync to be sure everythin is recorded
2024-08-08 17:58:08 +03:00
nimlgen
8d8704af2d
fix amd exec_update for locals (#5966) 2024-08-07 21:02:56 +03:00
nimlgen
590b9ebb34
hcq copy queue is optional (#5909)
* hcq copy queue is optional

* one more

* this
2024-08-05 14:03:25 +03:00
nimlgen
2777784b91
add dependency viewer to hcq profiler (#5874)
* hcq profiler support deps

* clean up

* cleaner

* cleanup

* revert this

* linter

* mypy

* add test

* sync is strange, need to take the end

* linter + test
2024-08-02 22:07:01 +03:00
George Hotz
53fcac9e80 hotfix: increase time on flaky NV test 2024-08-01 10:20:07 -07:00
nimlgen
ed1d784077
test profiler timer sync across devs (#5751)
* test profiler timer sync across devs

* more correct

* typo
2024-07-27 16:47:37 +03:00
nimlgen
1384f08cd4
hcq profile tests (#5654)
* profile tests

* fixes

* remove linter
2024-07-23 18:40:33 +03:00
nimlgen
26fc4610a0
amd more accurate cache managment (#5631)
* amd more accurate cache managment

* fix amd

* add memory_barrier + copies tests

* tranfer test as well

* linter
2024-07-22 19:07:01 +03:00
Vyacheslav Pachkov
edc58e6b6e
hcq: remove duplicate allocation of kernel args by abstracting (#5633) 2024-07-22 18:29:41 +03:00
nimlgen
b1782e3fef
hcq refactor signal into class (#5575)
* hcq refactor signal into class

* fix amd

* amd do not use amd_signal_t

* cleanup

* signal setter

* fix linter

* docs

* more docs + types

* fix types
2024-07-19 23:23:05 +03:00
nimlgen
9d7edc9269
hcq rename HCQCompat -> HCQ (#5577) 2024-07-19 11:34:17 +03:00
nimlgen
61822d1a14
nv fix timeline signal rollover on copy queue (#5473)
* hotfix: nv rollover to 32bits

* test both queues
2024-07-14 16:06:12 +03:00
nimlgen
8835d6c49a
cleanup nv/amd program (#5449)
* cleanup nv/amd program

* fix amd

* a bit cleaner

* ugh, typo

* linter

* fix nv

* tiny thing
2024-07-14 14:08:35 +03:00
nimlgen
1678199b15
add update_copy to hcq spec (#5348)
* add update_copy to hcq spec

* fix amd
2024-07-09 20:44:44 +03:00
nimlgen
7be776f9af
add _alloc_signal/_free_signal to hcq (#5264)
* add _alloc_signal/_free_signal api

* oops, revert this

* linter
2024-07-02 23:35:39 +03:00