9,615 commits

Author SHA1 Message Date
George Hotz
c89fb841f8 cleanups 2025-07-28 19:22:35 -07:00
George Hotz
29f7d03d43 k5 support 2025-07-28 19:11:27 -07:00
George Hotz
b41e49472c kernel4 written in uops 2025-07-28 17:02:24 -07:00
George Hotz
165d8e1263 k4 in python 2025-07-28 16:56:14 -07:00
George Hotz
4b57aa2655 Revert "move simplify views to merge views"
This reverts commit 1e07dff384.
2025-07-28 16:19:51 -07:00
George Hotz
ad1a2a68d5 add amd kernel 4 2025-07-28 16:14:11 -07:00
George Hotz
1e07dff384 move simplify views to merge views 2025-07-28 13:55:23 -07:00
George Hotz
fddc645668
HL=2 top matmul (#11406)
* HL=2 top matmul

* top colored
2025-07-28 12:32:38 -07:00
nimlgen
c7b4ab86e4
fix llvm tc on mi350 (#11404) 2025-07-28 21:37:43 +03:00
chenyu
9f7c72ff8f
remove UOp.valid method [pr] (#11402)
only used in add_buffer_ops
2025-07-28 11:29:08 -04:00
chenyu
b22a34331b
remove const valid in fixup_ast [pr] (#11401) 2025-07-28 11:07:59 -04:00
qazal
7737cbb2a0
viz: tabulate runtime stats (#11400) 2025-07-28 15:56:39 +03:00
chenyu
ab6a27f627
remove a branch in UOp.r [pr] (#11398) 2025-07-27 18:00:01 -04:00
uuuvn
052191eae4
Remote multihost (p2p with infiniband verbs) (#9746)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-27 14:44:32 -07:00
qazal
a22417cc75
viz: fix bug with wrong program links (#11396) 2025-07-28 02:52:06 +08:00
nimlgen
a5371f514b
cpu: copies in profile (#11392)
* cpu: copies in profile

* fix

* rename to tiny?
2025-07-27 20:56:27 +03:00
George Hotz
8c10085459
assert shape on lowerer store [pr] (#11395)
* assert shape on lowerer store [pr]

* fix ptx
2025-07-27 10:41:57 -07:00
qazal
6174cfa828
viz: only show match counts greater than 0 (#11394) 2025-07-28 00:25:00 +08:00
qazal
3466a220de
viz: disassembly viewer (#11393)
* test

* CPU=1 disasm works

* METAL=1 disasm works

* fix that

* work

* can unwrap

* work p2

* don't crash
2025-07-27 18:44:28 +03:00
qazal
3bb232eb29
viz: query path in rewrite steps (#11391) 2025-07-27 14:51:47 +03:00
b1tg
b7ef73babd
fix wmma ptx (#11389)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-07-26 23:28:35 -07:00
b1tg
8dfcdb123d
less wmma args (#11385)
* less wmma args

* scalar

* ops_python

* mypy

* lint

* dedup

* helper wmma_args

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-07-26 21:24:05 -07:00
George Hotz
dfeee63d30
uop matmul work (#11388)
* uop matmul work

* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
3923e78061
no_vectorized_acc keeps single DEFINE_REG (#11387)
* no_vectorized_acc keeps single DEFINE_REG

* fix ptx, skip flaky test
2025-07-26 11:44:09 -07:00
qazal
4866ad57da
viz: add runtime stats (#11383)
* viz: add runtime stats

* lint

* better

* flat
2025-07-26 20:40:46 +03:00
George Hotz
2c70eaf18c
fix load / barrier (#11386)
* fix load / barrier

* cleanups

* fix CI
2025-07-26 10:27:37 -07:00
nimlgen
65673e68ca
hcq: do not import during __del__ (#11384)
* hcq: do not import during __del__

* ignore
2025-07-26 13:58:55 +03:00
George Hotz
466ab5a3f2
store/load not pass through index (#11381)
* noop

* fix noop

* store cat is NOOP

* store dtype is void

* stores aren't passed through anymore

* meh, skip those for ptx

* correct ptx skip

* hl runs
2025-07-25 21:01:47 -07:00
George Hotz
0a5f37946b
unused permute arg on r (#11379) 2025-07-25 19:52:37 -07:00
George Hotz
48562cb2db
full shape simpler (#11376) 2025-07-25 18:27:48 -07:00
chenyu
3d68feb67d
minor onnx Gather cleanup (#11375)
removed a type ignore and one error code skip
2025-07-25 21:08:08 -04:00
chenyu
88c338bfcc
add kernelize to keccak for each data block (#11370)
* add kernelize to keccak for each data block

test_long works now. this prevents internal uops from growing proportional to data length and eventually too deep

* this?

* hash stuff

* gate test

* mv
2025-07-25 16:07:20 -04:00
chenyu
dab07bcad9
use next instead of full list in UOp._device [pr] (#11369)
prevents exponential fan out
2025-07-25 10:04:29 -04:00
nimlgen
1bb1f1aee8
hcq: fix race in _at_profile_finalize (#11368) 2025-07-25 14:14:02 +03:00
George Hotz
490a93902c
define reg doesn't have init anymore (#11365)
* define reg doesn't have init anymore

* remove that

* no special logic for dr

* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
9da3f72495
identity store for DEFINE_REG (#11363)
* identity store for DEFINE_REG

* identity store for DEFINE_REG

* noop continue
2025-07-24 16:41:29 -07:00
chenyu
cc795c6656
simplify keccak pad mask code (#11362) 2025-07-24 19:24:10 -04:00
chenyu
c0c4bc9d7c
use int32 for keccak reorder_indexes (#11360)
it's used for tensor indexing, so int32 instead of uint64 is slightly faster
2025-07-24 15:54:50 -04:00
George Hotz
0602b22086
kernel spec (#11359)
* kernel spec

* ops.VIEW

* work
2025-07-24 12:45:38 -07:00
qazal
519f1d13cc
viz: generic stuff from gpu counters ui (#11358)
* viz: generic stuff from gpu counters ui

* move pointer

* pre fetch

* move timeout
2025-07-24 20:29:24 +03:00
nimlgen
3b3de8df61
hcq: graphed copies (#11302)
* fast copies p2

* upd and fix

* graph supports

* fixes

* fixes

* fixes

* fix

* fix

* fix mockgpu

* fix alignment

* smaller in ci
2025-07-24 17:36:19 +03:00
nimlgen
3046ead6e8
jit: graph reports ei support (#11356) 2025-07-24 16:35:10 +03:00
nimlgen
bf12041910
hcq: mapping of cpu to all hcq devices (#11354)
* hcq: mapping of cpu to all hcq devices

* fix kfd

* nv

* simpler

* cleaner

* correct skip

* fix ifaces

* system fixes

* mypy
2025-07-24 12:52:38 +03:00
chenyu
82e6de7fc6
more keccak reference tests (#11329) 2025-07-23 22:06:39 -04:00
George Hotz
b0dc97d1f7
write out kernel 3 in uops (#11352)
* write out kernel 3 in uops

* matmul is correct

* gemm passes spec

* bugfix to match speed

* cleanups
2025-07-23 17:32:38 -07:00
chenyu
5b570196e4
support DEV= to specify device (#11351) 2025-07-23 17:40:55 -04:00
uuuvn
76a2ddbd78
Move remote tests out of onnx (#11310)
Co-authored-by: wozeparrot <wozeparrot@gmail.com>
2025-07-23 13:25:55 -07:00
George Hotz
7f0a41df4d
move optional out of devectorize [pr] (#11350)
* move optional out of devectorize [pr]

* fast idiv
2025-07-23 11:26:05 -07:00
nimlgen
0f374e10d2
cpu: use mmap for allocations (#11349)
* cpu: use mmap for allocations

* ops

* fix mypy
2025-07-23 20:30:18 +03:00
George Hotz
ae07a93814
simple block barrier (#11341)
* simple block barrier

* simple
2025-07-23 10:14:11 -07:00