27 commits

Author SHA1 Message Date
George Hotz
d631716858
remove const without STACK (#16639)
* remove const without STACK

* fix GEP rewrite

* fix null tests

* fix openpilot regression

* it's 10 in CI
2026-06-16 21:25:42 -07:00
George Hotz
b8aec4cce7
port x86 to new_style (fable slop) and now everything is new style (#16581)
* port x86 to new_style (fable slop)

* don't change ops

* port NIR to new_style (fable)

* lil cleanup

* fix tests, and remove new_style
2026-06-11 21:09:34 -07:00
chenyu
6b7d2b91df
update test_uop_graph (#16470)
use UOp methods instead of constructing UOp directly, some of it violated spec
2026-06-02 08:53:54 -04:00
George Hotz
20242fdf1d
update test + spec from shrink_in_render (#16467)
* update test + spec from shrink_in_render

* cast
2026-06-01 19:24:43 -07:00
George Hotz
124d2f8227
anon addrspace from new renderer (#16461)
* anon addrspace from new renderer

* use max_numel in python renderer

* add sizes to ptrs in tests

* more

* correct fix
2026-06-01 14:42:02 -07:00
George Hotz
1e7f1dcf49
add ParamArgs [pr] (#16421)
* add ParamArgs

* fix export

* cleanups

* fixes

* simpler
2026-05-28 19:17:17 -07:00
George Hotz
6815f28849
dtype.vec shapes (#16287)
* dtype.vec shapes

* something

* Closer

* more passes

* shape is in spec

* fix reduce

* image dtype shape correct

* lil

* use reshape on image

* need BUFFER there

* remove that test

* fix ptx + x86

* fix nir

* x86 fix maybe

* x86 fixups

* x86 fix

* don't check that for NOOP
2026-05-21 11:56:49 -07:00
George Hotz
55515747b7
Remove Ops.VCONST (#16267)
* start removing vconst

* remove a lot of vconst

* const folding + strict ordering

* update tests

* spec from minigen

* move that
2026-05-19 16:35:24 -07:00
George Hotz
8294d105a7
Update the spec in spec.py to match the current state (#16132)
* start work on specv2

* more spec

* more spec

* fix amd emulator

* more spec

* more

* fix test_uop_graph

* move those

* spec=2

* skip those questionable tests

* ptx fix

* more spec=2

* store

* allow custom function in tensor

* spec 2

* fix beam search for tensor cores

* delete the old specs

* fix import
2026-05-11 20:07:47 -07:00
George Hotz
daed602569
rename BUFFERIZE to STAGE (#16125)
2026-05-10 09:26:46 -07:00
George Hotz
b796bbae87
fix valid in indexing tests (#16087)
2026-05-07 14:11:28 -07:00
George Hotz
5f441ecffc
unify reduce + reduce_axis (#15973)
* unify reduce + reduce_axis

* fix all tests

* lil cleanups
2026-04-29 10:29:56 -07:00
George Hotz
0c3260d5d9
rename VECTORIZE to STACK (#15880)
2026-04-23 10:43:42 +08:00
chenyu
2f7d085450
shared _normalize_indices for getitem (#15625)
* shared _normalize_indices for getitem

* list
2026-04-06 17:45:36 -04:00
Christopher Milan
67a50fb738
move where on load with casts (#15492)
2026-03-26 22:11:27 -04:00
chenyu
b7960841af
support shape broadcast in UOp.alu (#15442)
i think it can integrate tighter, but now Tensor also does ufix from UOp and implicit dtype upcast
2026-03-24 10:14:57 -04:00
George Hotz
85dee83f5d
amd flash attention cleanups + emulator fixes (#15431)
* amd flash attention cleanups

* simpler

* params

* fix emulator bugs

* fix idiv bug

* remove that test

* more emu fixes
2026-03-24 10:10:46 +08:00
George Hotz
c62dea6881
ai slop flash attention (it works) (#15401)
* ai slop flash attention (it works)

* speed up, 2 TFLOPS + 7 GB/s

* simpler

* simpler

* optimize

* faster

* warp shuffle

* sqtt: link dispatch to exec (#15396)

* sqtt packet linking infra

python

* javascript

* ~doubly linked list

* ui works

* work

* exec can also highlight the pc, coloring work

* more work

* rm sqtt/model.py, doesn't need to be upstreamed

* viz: no context enters in cli, update llama profile (#15404)

* removed unused named arg in rules [pr] (#15414)

* viz: sqtt printer in viz/cli.py (#15411)

* work

* sqtt timeline in CLI

* format all printers nicely

* s/Showed/Printed

* ansistrip

* sys.exit

* keep colors in list

* work from amd_copy_matmul

* has_more always gets returned

* linter

* don't print colors

* more colors

* wow this is so deep

* work

* minor details

* selected

* improve progress bar

* remove it

* 22, global_load_vaddr is so long

* remove *0 hack in sign, gradient materializes zeros for unconnected nodes (#15416)

Amp-Thread-ID: https://ampcode.com/threads/T-019d1612-6322-706b-a94d-a812400a55cb

Co-authored-by: Amp <amp@ampcode.com>

* works

* cnt=20

* revert that

* uop slice tests

* simpler

---------

Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
Co-authored-by: chenyu <chenyu@fastmail.com>
Co-authored-by: gg <ggordbegli@gmail.com>
Co-authored-by: Amp <amp@ampcode.com>
2026-03-23 16:15:10 +08:00
Christopher Milan
0c89340a1e
automatically emulate unsupported (tiny) floats [skip_process_replay] (#15366)
2026-03-20 02:31:44 -04:00
chenyu
da1700e16b
dtypes.index -> dtypes.weakint (#15377)
2026-03-20 01:08:46 -04:00
Christopher Milan
dabdc986df
shrink guarded ranges, try 2 (#15272)
2026-03-14 04:24:05 -04:00
Christopher Milan
7cf4b16c91
Revert "shrink guarded ranges" (#15271)
2026-03-14 03:44:38 -04:00
Christopher Milan
d9951e2f8e
shrink guarded ranges (#15263)
2026-03-14 03:38:48 -04:00
George Hotz
6fd18ef875
rename CAT to VCAT (#15167)
2026-03-06 18:46:28 +08:00
George Hotz
c331798201
move tests to test/backend (#14691)
* move tests to test/backend

* fix imports

* fix CI

* revert that one

* Fix formatting in README for test command
2026-02-12 11:09:44 +08:00
George Hotz
dd2de4f838
rename all DEFINE_GLOBAL to PARAM (#14511)
2026-02-03 15:09:38 +08:00
George Hotz
dc77b3318b
move files that pass with NULL=1 to test/null (#14508)
* move files that pass with NULL=1 to test/null

* fix windows

* cpu 0

* bugfix + durations
2026-02-03 13:52:36 +08:00
Renamed from test/unit/test_uop_graph.py