11,919 commits

Author SHA1 Message Date
George Hotz
8c79751937 Q5_K 2026-01-29 17:04:03 +08:00
George Hotz
aeacd3b2fb ggml_type_13 2026-01-28 21:19:52 +08:00
George Hotz
202b74b369
assembly/amd: continue refactors (#14386)
* simpler

* merge

* flat

* no ctx

* use the correct apis

* dup code

* write clean code

* remove bad helpers

* bits junk remove

* junk remove

* smem test

* fix tests

* correct fix + tests

* Fmt matters it seems

* wmma refactor

* a lil more

* kimi cleanups

* line
2026-01-28 17:33:03 +08:00
qazal
5bffa17f82
llama train: better NULL=1 EMULATE=AMD_CDNA4 dev experience (#14395)
* beam opens devices

* switch to hip renderer

* amd: true?

* llvm true is for test_autogen
2026-01-28 17:31:22 +09:00
qazal
0294014108
fix bufferize cost function for multi, improve VIZ=-1 cli (#14394)
* improve cli

* remove_bufferize change
2026-01-28 15:53:18 +09:00
qazal
c158acea29
failing multi ram usage test from llama gemm (#14392) 2026-01-28 14:32:32 +09:00
Christopher Milan
067e27857e
nested composite actions don't work (#14393) 2026-01-28 00:13:30 -05:00
Christopher Milan
9dddf3d478
don't save caches for PRs, try 2 (#14391) 2026-01-27 23:30:17 -05:00
Christopher Milan
68fe5d8b36
Revert "don't save caches for PRs (#14389)" (#14390) 2026-01-27 23:22:26 -05:00
Christopher Milan
4ab228b498
don't save caches for PRs (#14389) 2026-01-27 23:21:31 -05:00
Christopher Milan
5e36482314
decompose long to ints where unsupported, try 2 (#14383) 2026-01-27 23:20:43 -05:00
wozeparrot
e496547720
llama3 gradacc (#14291) 2026-01-27 19:48:10 -08:00
George Hotz
88bc5ee212
assembly/amd: rename to better names (#14384)
* assembly/amd: rename to better names

* might help fuzzing segfault

* emu2 -> emu
2026-01-28 10:00:54 +08:00
George Hotz
065b95cfb0
Revert "add retry to fetch (#14370)" (#14385)
This reverts commit dc4d7f2d55.
2026-01-28 09:35:37 +08:00
Eitan Turok
dc4d7f2d55
add retry to fetch (#14370) 2026-01-27 14:04:25 -08:00
chenyu
8d1f3c8885
fix copysign for inf input (#14381)
* fix copysign for inf input

* llvm olt
2026-01-27 16:45:48 -05:00
Christopher Milan
289a3e415e
also skip test_nonoverlapping_shrink_assignment (#14382) 2026-01-27 16:26:26 -05:00
Christopher Milan
f34efc1ad1
DISABLE_FAST_IDIV actually works as a ContextVar (#14378) 2026-01-27 16:12:42 -05:00
chenyu
8c899e4aaf
fix copysign for -0 (#14380)
test that both x and 1/x < 0 work too, and found another bug with the * 0 hack
2026-01-27 15:44:58 -05:00
chenyu
62884585a7
failed test case for copysign -0.0 (#14379)
* failed test case for copysign -0.0

* skip those
2026-01-27 14:37:17 -05:00
nimlgen
ec1b28bc2c
am: exit early in case of failures (#14376)
* am: exit early in case of failures

* sorry, pre-linter

* reset when error state
2026-01-27 22:10:02 +03:00
chenyu
cd22ee9ed0
add InvalidType to ConstType [pr] (#14373)
* add InvalidType to ConstType [pr]

TYPED=1 python test/test_tiny.py passes.
added PyConst = float|int|bool for some Tensor level input types

* hcq
2026-01-27 14:09:34 -05:00
Christopher Milan
5b42a1357b
SCACHE=0 works with DEBUG (#14377) 2026-01-27 13:12:43 -05:00
chenyu
db010a31be
IGNORE_OOB -> CHECK_OOB [pr] (#14374)
flip the meaning
2026-01-27 12:20:59 -05:00
chenyu
c22667b0c4
also skip test_overlapping_shrink_assignment_reverse (#14375)
crashing
2026-01-27 12:20:39 -05:00
nimlgen
e52d58b041
autogen: update amd (#14372) 2026-01-27 19:53:14 +03:00
nimlgen
cbf94a0a95
nv: exit early in case of failures (#14363)
* nv: exit early in case of failures

* f

* cleaner
2026-01-27 19:16:22 +03:00
nimlgen
ec691cb299
am: print sq intrs (#14366)
* am: print sq intrs

* cleaner
2026-01-27 18:28:13 +03:00
qazal
a5f3d46423
hcq: do not assume kernel names are unique (#14371)
* hcq: do not assume kernel names are unique

* colored kernel name
2026-01-27 23:03:15 +09:00
George Hotz
e5df7e640b
fix branches in amd_asm_matmul (#14369) 2026-01-27 20:48:42 +08:00
George Hotz
0ced258726 HOTFIX: skip crashing assign test 2026-01-27 20:35:17 +08:00
George Hotz
131ae604de
force_transcendental on sqrt (#14368) 2026-01-27 20:24:41 +08:00
imaolo
14574c68fa
Add ContextVar to disable the scheduler cache (#14257)
* add scheduler cache ContextVar

* test scheduler cache context var

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-01-27 19:55:29 +08:00
George Hotz
bfc88bcfb8
assembly/amd: emu refactors + enable PYTHON_REMU by default (#14361)
* assembly/amd: start refactors

* cleanups

* those are global

* methods on ctx

* const cleanup

* range helper

* types and imports

* cleanups

* cleanups

* remove stale name

* fix emu2 types

* more typing

* more mypy

* cleanups

* fxns

* scc cleanup

* cleanups

* cleanups

* simpler parse_pcode

* laneid

* no defaults for pcode

* pcode is not optional

* cleanups

* functions cleanup

* splat

* expr_parser functions

* single tok

* invert global loops

* try_eat

* minor

* run parser on all

* no silent 0

* tests
2026-01-27 17:42:24 +08:00
Christopher Milan
2e72625652
Revert "decompose dtypes.long to ints where unsupported (#14261)" (#14362) 2026-01-27 02:04:59 -05:00
qazal
f866b2a513
mfma loop in asm dsl (#14349)
* mfma loop in asm dsl

* work
2026-01-27 11:11:37 +09:00
Christopher Milan
0793319929
decompose dtypes.long to ints where unsupported (#14261)
* add works

* use carry not overflow

* bitwise ops

* use tag instead of vec

* cleaner

* mul somewhat works

* mul actually works

* SUB and NEG work

* SHL/SHR

* ulong support

* this should work?

* oops

* fix indexing

* all ALU mostly works

* refactor

* test_dtype passing

* signed division works

* format

* clean

* some tests

* ruff
2026-01-26 18:34:13 -05:00
wozeparrot
a987a4abc3
feat: llama8b dev_beam.sh (#14358) 2026-01-26 14:51:23 -08:00
Christopher Milan
c9c533fc78
libclang path is homebrew on macos (#14357)
* libclang path is homebrew macos

* typo

* ugh

* typo

* regen

* no LIBCLANG_PATH
2026-01-26 17:32:09 -05:00
chenyu
d641e63189
improve min/max for AND (#14356) 2026-01-26 15:44:18 -05:00
chenyu
f16372487a
fix assign hazard on shrink (#14355)
* fix assign hazard on shrink

a race is possible if both the assign src and dest are shrinks

* test_nonoverlapping_shrink_assignment
2026-01-26 14:46:30 -05:00
chenyu
145df879c1
find_permutes -> fix_assign_hazard [pr] (#14354)
some noop tweaks and comment updates
2026-01-26 14:05:19 -05:00
nimlgen
e152f1b0f5
llama: use ALL2ALL (#14353) 2026-01-26 22:01:53 +03:00
nimlgen
3f25eb3026
am: ih (#14346)
* am: ih

* um

* fix

* line

* no trap and fix ring

* keep

* fix
2026-01-26 20:11:04 +03:00
chenyu
823bc17fb5
failed test case for shrink overlap assigns (#14350)
* failed test case for shrink overlap assigns

current logic can create a race that results in wrong output

* skip for now
2026-01-26 11:58:45 -05:00
George Hotz
204f51e739
assembly/amd: bug fixes for PYTHON_REMU (#14347)
* default PYTHON_REMU to 1

* mockgpu

* less size

* normal compile path

* unique

* more

* fix clamp

* Change PYTHON_REMU default to 0 in _try_dlopen_remu
2026-01-27 00:48:22 +08:00
chenyu
231305603d
remove REAL_DEV [pr] (#14337)
it's just Device.DEFAULT now
2026-01-26 10:08:16 -05:00
Martin Szewieczek
9cbe99348a
func meshgrid: change param index to type str (#14331) 2026-01-26 10:07:56 -05:00
George Hotz
3b43d26f10
assembly/amd: emu speed (#14344)
* assembly/amd: emu speed

* fix spec

* go

* don't do this

* simpler

* no stupid consts

* hack

* simpler

* no index

* no where

* faster linearizer

* fix spec

* no index dtype
2026-01-26 22:21:34 +08:00
George Hotz
774a454bb5
assembly/amd: fix scratch SVE (#14340)
* assembly/amd: default python REMU

* mem_used

* no lane

* sve

* remove that

* needs s_code_end in tests
2026-01-26 21:03:51 +08:00