Commit graph

9,777 commits

Author SHA1 Message Date
George Hotz
2feeb8c8a6 cleanups 2025-08-11 08:57:30 -07:00
George Hotz
04fa825a26 careful w the cache 2025-08-10 15:52:41 -07:00
George Hotz
706188ad16 bugfix 2025-08-10 15:48:07 -07:00
George Hotz
fbe9909d90 update for master 2025-08-10 15:43:27 -07:00
George Hotz
16d2d9daac
Merge branch 'master' into no_merge_views 2025-08-10 15:39:37 -07:00
George Hotz
996c907c0b
rewrite not ready + children machinery (#11607)
* rewrite not ready + children machinery

* it doesn't like track rewrites
2025-08-10 15:28:30 -07:00
George Hotz
48ca6d888d was dumb 2025-08-10 14:36:35 -07:00
George Hotz
b7ea16f161 localish fa 2025-08-10 14:30:25 -07:00
George Hotz
cc34518a52 RewriteNotReady 2025-08-10 13:56:38 -07:00
Sieds Lykles
1875bc69f9
Late rewrite rules for CMPLT (#11591)
* add rules

* more rules

* fix comment spelling

* remove two rules
2025-08-10 22:18:13 +02:00
George Hotz
76a97e04b0 cleanups 2025-08-10 12:24:16 -07:00
George Hotz
0d64aa1f1e this does work...but with a global 2025-08-10 12:00:55 -07:00
nimlgen
5403a4aeaf
null dev: support offset on buffers (#11606)
* null dev: support offset on buffers

* nolimit
2025-08-10 21:58:37 +03:00
geohotstan
b0dab6a4cd
onnx Resize OP clean up (#11603)
* start

* slight clean up
2025-08-10 14:10:39 -04:00
George Hotz
9979730f3f children stuff that doesn't work 2025-08-10 10:57:54 -07:00
Sieds Lykles
10540414cd
Add Ops.CMPEQ (#10431)
* Add op

* add to Groupop.ALU

* fix spec

* fix ptx

* temporary pickle by name to see process replay

* add Ops.EQ to binary ops

* Actually rename properly

* add test to assert CMPEQ is being used

* Ops.CMPEQ is automatic cast to bool

* add Ops.CMPEQ to llvm

* add Ops.CMPEQ to llvm
2025-08-10 13:13:16 +02:00
chenyu
f7aa1b85fe
minor sort cleanups (#11602) 2025-08-10 01:51:23 -04:00
chenyu
dfb702ef33
fix sort for small dim (#11601)
* fix sort for small dim

* fixed test_sort_empty
2025-08-10 01:17:41 -04:00
chenyu
ef17af85c6
remove .float call in llama logit (#11598)
* remove .float call in llama logit

* bfloat item
2025-08-10 00:02:18 -04:00
chenyu
dd3d2eb36c
add training llama3 test in ci (#11599) 2025-08-09 22:35:39 -04:00
chenyu
3e64467322
remove freqs_cis contiguous in llama (#11597) 2025-08-09 21:11:12 -04:00
chenyu
7338ffead0
small beautiful_mnist update (#11596)
gather is fast now. there's a conv/bw kernel that only gets fast with BEAM, but whole thing runs < 5 seconds now regardless
2025-08-09 19:51:14 -04:00
chenyu
45baec1aab
model parallel llama (#11588)
MP=8 GRADIENT_ACC_STEPS=3 BS=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=70B SEQLEN=512 PYTHONPATH=. MODEL=llama3 python3 examples/mlperf/model_train.py
2025-08-09 16:54:27 -04:00
nimlgen
09bc377da3
search: print runtime failures on debug (#11593) 2025-08-09 23:01:19 +03:00
nimlgen
14f99ff1a1
amd: doorbell_cpu_addr is not used (#11592)
* amd: doorbell_cpu_addr is not used

* hm
2025-08-09 20:03:21 +03:00
Sieds Lykles
01c770c77b
Fix z3 float cast in indexing (#11590)
* adjust dtype of z3_renderer and add rule for cast

* dtypes.bool is also cast noop

* add regression test

* make embedding smaller

* even smaller test
2025-08-09 17:59:23 +02:00
Sieds Lykles
10d388499d
Refactor optional.py (#11578)
* move fast_idiv to transcendental

* move optional.py

* adjust comment

* change import

* mypy needs this?
2025-08-09 17:35:05 +02:00
George Hotz
7ddcb8632f simpler 2025-08-09 08:14:02 -07:00
nimlgen
20e46a175c
do not use disk with usb (#11119)
* not use disk with usb

* better name
2025-08-09 11:58:02 +03:00
George Hotz
e268eb2d5c tform ffn 2025-08-08 18:29:48 -07:00
George Hotz
ee06481036 ranges 2025-08-08 18:18:27 -07:00
George Hotz
38c9b5ed2c conv hack 2025-08-08 14:36:48 -07:00
qazal
53179953fc
viz: factor out memory graph render (#11586) 2025-08-08 20:18:11 +03:00
George Hotz
7249a711c2 half contig 2025-08-08 08:43:41 -07:00
qazal
8ce72d3fad
simpler disassembly table spec (#11583)
* simpler disassembly table spec

* update ui

* move to scalar/vec render
2025-08-08 17:59:26 +03:00
qazal
44a222a9b2
viz: move resource usage summary to server (#11582) 2025-08-08 17:08:28 +03:00
qazal
793ace530e
update amd_uop_matmul.py import (#11581)
Using this for testing SQTT
2025-08-08 17:07:35 +03:00
chenyu
b232c60def
benchmark openpilot 0.9.9 (#11575)
* benchmark openpilot 0.9.9

not sure what to do with the 0.9.7 ones with IMAGE=2 and validate

* name
2025-08-08 01:26:14 -04:00
qazal
16f0edbe90
pass opts arg in get_program process replay [pr] (#11571)
* fix ptx process replay

* keyword arg

* renderer is also optional [pr]

* test_linearizer fixup

* name function order is args,ret,kwargs

* can use opts_to_apply

* pass through p.applied_opts

* sink_arg

* now it opens devices too
2025-08-08 03:05:09 +03:00
qazal
960cc6533a
pass through name function args in track_rewrites (#11572) 2025-08-08 02:28:52 +03:00
George Hotz
efdf08f3e2 global rangeify 2025-08-07 15:12:00 -07:00
George Hotz
9a2f55425b global rangeify 2025-08-07 15:08:45 -07:00
wozeparrot
1826004ef9
feat: add tinyos builder link (#11570) 2025-08-07 17:42:18 -04:00
George Hotz
9ed409d0f8
Merge branch 'master' into no_merge_views 2025-08-07 14:42:07 -07:00
George Hotz
82be8abfd2
move opt under codegen (#11569) 2025-08-07 14:19:17 -07:00
chenyu
702e38dc19
remove FUSE_ARANGE_UINT (#11567)
also add IGNORE_OOB=1 to bert runs. lowered BS on tinybox to 90 since 96 oom during eval without reset
2025-08-07 16:49:06 -04:00
George Hotz
6ed2dfd187
delete the arange dim mismatch restriction (#11568)
* delete the arange dim mismatch restriction

* skip that test race
2025-08-07 13:46:17 -07:00
wozeparrot
7ae4335127
feat: generate blend index (#11566) 2025-08-07 14:20:28 -04:00
chenyu
594cbdc66f
skip AM ResNet50 benchmark (#11565)
hanging with FUSE_ARANGE?
2025-08-07 14:07:01 -04:00
chenyu
aa1a6f2132
support threshold in Tensor.softplus (#11564)
fix gradient for large input
2025-08-07 13:43:18 -04:00
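
Note on aa1a6f2132 (softplus threshold): the commit message only says that a threshold was added and that the gradient for large input was fixed. The sketch below shows the general thresholding technique in plain Python, for illustration only. It is not tinygrad's code; the function names, the `beta` parameter, and the default `threshold=20.0` are assumptions borrowed from the common convention, not taken from the commit.

```python
import math

def softplus(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    """Scalar softplus, log(1 + exp(beta*x)) / beta, with a linear cutoff.

    Hypothetical helper: above the threshold the function is treated as the
    identity, so exp() is never evaluated on a large argument.
    """
    z = beta * x
    if z > threshold:
        return x
    return math.log1p(math.exp(z)) / beta

def softplus_grad(x: float, beta: float = 1.0, threshold: float = 20.0) -> float:
    """Derivative of the thresholded softplus above.

    In the linear region the slope is exactly 1; elsewhere it is sigmoid(beta*x),
    computed in a form that never exponentiates a large positive number.
    """
    z = beta * x
    if z > threshold:
        return 1.0
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

Without the cutoff, the naive form evaluates exp(beta*x) directly, which overflows for large inputs (in plain Python, `math.exp(1000.0)` raises `OverflowError`). With the cutoff, `softplus(1000.0)` returns the input unchanged and `softplus_grad(1000.0)` returns 1.0.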