11,106 commits

Author SHA1 Message Date
George Hotz
82aa943cd4
fix that test 2025-11-19 08:48:49 -08:00
George Hotz
e16782cf9e
Merge branch 'master' into python_speed 2025-11-19 08:41:40 -08:00
George Hotz
1c47ee729e
fix names of rewrite rules 2025-11-19 08:41:34 -08:00
George Hotz
a8f9e69bd9
work on python speed 2025-11-19 08:34:15 -08:00
George Hotz
385618d45b
skip process replay by default (#13353) 2025-11-19 08:25:34 -08:00
George Hotz
ffff194e93
skip process replay by default 2025-11-19 08:14:44 -08:00
chenyu
fba4535289
remove hacks for threefry long removal when padded [pr] (#13352) 2025-11-19 11:11:39 -05:00
George Hotz
225eb1500f
generic range changes that work for str + int (#13350)
* generic range changes that work for str + int

* opt range counts up
2025-11-19 08:07:49 -08:00
chenyu
1a72ac16a6
move where same false branch rule to symbolic_simple [pr] (#13349) 2025-11-19 10:15:38 -05:00
chenyu
79055ddb8b
clean propagate_invalid more [pr] (#13347) 2025-11-19 09:47:50 -05:00
nimlgen
0c9fbf87e1
nvioctl: classes (#13346) 2025-11-19 16:14:15 +03:00
qazal
f2221130bb
viz: pick shape by event type (#13279) 2025-11-19 20:15:52 +08:00
wozeparrot
be72b78dcb
tk: small fixes (#13345)
* fix: handle case where final uop isn't a tk wrapped one

* clean: remove after from mma
2025-11-19 00:58:50 -08:00
wozeparrot
e4fbde5b3b
fix: extra options need to go on second step too (#13344) 2025-11-19 00:58:09 -08:00
George Hotz
1a332afa76
spec test on 3.14 (#12957) 2025-11-19 00:43:04 -08:00
Christopher Milan
a438c277de
autogen tests for 3.14 (#13343) 2025-11-18 22:16:59 -05:00
chenyu
722e7a16ed
remove rule in propagate_invalid [pr] (#13342) 2025-11-18 21:38:33 -05:00
George Hotz
1afa3c0877
vmap on full model (#13340)
* vmap on full model

* vmap gemm

* reduce sums on end

* outer reduce

* only if there's ranges

* put those rules in symbolic

* ranges

* do opt later

* add zero range
2025-11-18 16:06:06 -08:00
chenyu
46cb65e692
delete rules from sym [pr] (#13339) 2025-11-18 14:57:35 -05:00
George Hotz
9c59b3d19e
vmap grad needs reduce_backward (#13336)
* vmap grad needs reduce_backward

* fuse and outer
2025-11-18 10:08:30 -08:00
qazal
a647c9eca6
sqtt ui minor fixes (#13335)
* roc.py cleanups

* direct append

* viz index cleanup

* simd row details
2025-11-19 01:27:56 +08:00
George Hotz
06e39a88a9
outer vmap works (#13334)
* outer vmap works

* fuse works

* vmap outer works

* outer ranges work

* grad work

* should be good to merge
2025-11-18 09:27:48 -08:00
chenyu
805de27e07
no load substitute in uop_given_valid [pr] (#13333) 2025-11-18 11:47:58 -05:00
chenyu
05294bc648
fix some mypy cast [pr] (#13331) 2025-11-18 09:23:42 -05:00
qazal
5623e765c8
VIZ=2 enables SQTT (#13330) 2025-11-18 22:20:31 +08:00
nimlgen
331f70aa75
roc: ctrlc (#13255)
* roc: ctrl-c works

* rm
2025-11-18 19:29:28 +08:00
George Hotz
583560ab72
this is the right way to write vmap (#13328) 2025-11-17 20:20:52 -08:00
Christopher Milan
8e8e53c886
int8_t is c_byte (#13326) 2025-11-17 21:25:40 -05:00
George Hotz
e4fead8a86
write scan in uops (#13321)
* write scan in uops

* ops range

* no need for variable

* meh, later

* shorter
2025-11-17 16:58:08 -08:00
wozeparrot
8894a5409d
feat: hipcc compiler (#13319) 2025-11-17 15:13:32 -08:00
George Hotz
6d3385c284
print special ops in postrange (#13318)
* print special ops in postrange

* fix on OSX
2025-11-17 14:43:23 -08:00
chenyu
b637093be9
remove a few rules in pm_lower_index_dtype [pr] (#13317) 2025-11-17 17:04:56 -05:00
George Hotz
98e9e73286
hotfix: amd_uop_matmul getenvs 2025-11-17 13:26:01 -08:00
qazal
e7e1935225
cleanup sqtt/test_timing (#13315) 2025-11-18 04:28:05 +08:00
wozeparrot
33773fda87
tk initial mi350 (#13289) 2025-11-17 11:46:32 -08:00
nimlgen
e2cee64050
Revert "hcq: add tag to exec events (#13311)" (#13314)
This reverts commit f63ded5817.
2025-11-17 22:15:31 +03:00
chenyu
646372490c
move tiktoken import in llama3 (#13316)
only Tokenizer requires that
2025-11-17 14:09:37 -05:00
qazal
a37f221e44
viz: visualize waves in the timeline (#13292)
* viz: visualize waves in the timeline

* timeline in format

* per step

* rm that
2025-11-17 22:04:21 +08:00
nimlgen
f63ded5817
hcq: add tag to exec events (#13311)
* hcq: add tag to exec events

* f

* fix

* fix
2025-11-17 16:59:30 +03:00
qazal
50a443f558
viz: add shader engine to wave exec payload (#13310)
* viz: show sqtt shader engine

* order it from smallest unit

* easier to config
2025-11-17 19:11:34 +08:00
nimlgen
9bb17c53ea
amd: timer fix (#13267) 2025-11-17 13:59:03 +03:00
George Hotz
55be95da15
cleanup sqtt raw parser (#13309)
* cleanup sqtt raw parser

* better names (don't merge yet)

* clean up amd

* a few more names

* one more filter
2025-11-16 13:11:51 -08:00
George Hotz
cabd4add48
more work parsing SQTT, separate VIZ/PROFILE (#13308)
* more work parsing SQTT

* more minimal runner

* sep VIZ/PROFILE

* parse print new

* improve parser

* more filter

* that

* split them

* lil cleanup

* skip flaky test

* AQL in mmapeak
2025-11-16 10:40:39 -08:00
qazal
13efdf8c31
test s_nop stall (#13307) 2025-11-17 00:59:39 +08:00
George Hotz
295600dc5a
saturday coffee shop work parsing the att format (#13295)
* saturday coffee shop work parsing the att format

* add examples

* parser

* classes of packets

* fully vibe coded parser

* vibing

* empty

* some vibe names

* vibes

* most of these are wrong

* more vibes

* better names

* parsing

* parse

* cleanup parser

* touchups
2025-11-16 08:25:51 -08:00
Christopher Milan
a9ed241172
properly suppress NIRRenderer.__del__ error (#13299) 2025-11-16 18:58:04 +03:00
qazal
c70b06ec19
sqtt test_timing work (#13304)
* sqtt test_timing cleanups

* only the instruction

* v_mfma_f32_16x16x32_f16 16 cycles, only after second one though
2025-11-16 23:49:24 +08:00
chenyu
8f0e747b3a
Tensor._tri with arange (#13297) 2025-11-16 10:21:16 -05:00
chenyu
6372c95094
disable benchmark MobileNetV2 on DSP (#13305)
failed on tinyc2
2025-11-16 09:42:52 -05:00
Christopher Milan
61625a3898
fix objc finalizing bug (#13296) 2025-11-16 12:43:04 +03:00