Commit graph

13,471 commits

Author SHA1 Message Date
Graham Robbins
4ca844e96b
add Q1_0 gguf type (#15683)
* add Q1_0

* better description

* fix trailing whitespace

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-04-11 18:17:24 +08:00
George Hotz
5156a04cf5
add support for AM_POWER_LIMIT (#15684)
* add support for AM_POWER_LIMIT

* level None
2026-04-11 17:14:54 +08:00
wozeparrot
457508d5a0
llama: save more 2 (#15681) 2026-04-11 01:03:36 -07:00
George Hotz
29238b772f
AMD USB: support for 0xF3 power toggle 2026-04-11 13:04:38 +08:00
George Hotz
b5a9465b13
llm: add support for moonlight (deepseek MLA) (#15466)
* add gguf Q5_0

* it works

* rebase

* simpler test

* class

* less diff

* dicts

* normal names

* simplify

* this

* simpler

* work

* work
2026-04-11 10:32:48 +08:00
wozeparrot
590464c8d8
llama: only support wqkv path + cleanups (#15680)
* llama: only support wqkv path + cleanups

* llama: missing transpose
2026-04-11 07:39:27 +08:00
nimlgen
aa012d6f08
usb: faster custom (#15678)
* usb: _f0_out_buf for e4 cmd as well

* custom speed

* fast
2026-04-10 23:00:31 +03:00
nimlgen
58646f9569
usb fast copyout (#15677)
* usb

* fix usb
2026-04-10 21:04:49 +03:00
qazal
0d5cdc9600
viz: split draw loop (#15676)
* split draw loop

* one draw

* no functions

* inline all highlights

* cleanup
2026-04-10 23:25:50 +09:00
chenyu
e1334d3852
move canonicalize_device to device.py (#15675) 2026-04-10 09:43:56 -04:00
chenyu
8e7fcc8ca3
remove _include_initial in _cumalu (#15674)
handle negative pad in caller
2026-04-10 08:33:30 -04:00
George Hotz
9092f2a8c0
llm: add shared_expert and rope_dim support from qwen35 (#15673)
* llm: add shared_expert and rope_dim support from qwen35

* refactor into FFNBlock and TransformerBlock

* norms where they belong
2026-04-10 19:18:27 +08:00
b1tg
9ab1415937
llm: fix streaming UTF-8 decode (#15653) 2026-04-10 17:01:02 +08:00
wozeparrot
55bcd7cc9e
llama amax outside (#15670) 2026-04-09 23:08:03 -07:00
George Hotz
16f3448b26
Add HIP to abstractions4 (#15672)
* cleanup formatting

* add HIP option

* pass in correct
2026-04-10 14:05:52 +08:00
George Hotz
ed2a72bb23
work on abstractions4 (#15671)
* work on abstractions4

* works

* offst

* assembly works

* RAND

* cleanup

* work
2026-04-10 13:25:11 +08:00
Christopher Milan
dbc23e8a1b
move HCQ_VISIBLE_DEVICES into DEV (#15668) 2026-04-09 22:01:35 -04:00
George Hotz
fa02105546
hotfix: pin amd isa xml version 2026-04-10 06:47:00 +08:00
nimlgen
057dc173ab
beam uop (#15660)
* beam as uop

* x
2026-04-09 19:13:03 +03:00
nimlgen
0ff30b003d
am: reset queues from spi (#15664)
* am: reset queues from spi

* move
2026-04-09 18:25:50 +03:00
George Hotz
48a7627b04
add RDNA4 support to copy WMMA (#15663)
* add RDNA4 support to copy WMMA

* simpler

* simpler

* comment

* assert
2026-04-09 22:48:20 +08:00
chenyu
6837881b06
remove same_shape_noop [pr] (#15662)
no longer used
2026-04-09 09:50:26 -04:00
Christopher Milan
d08c76d9cb
c.Struct cleanup (#15640) 2026-04-08 20:07:16 -04:00
qazal
742b3894d7
viz/cli: add pmc printer (#15651)
* viz/cli: add pmc printer

* cli work

* s

* linter

* pack workgroups

* add : to wgp

* counter name
2026-04-09 08:50:54 +09:00
chenyu
4cf2759fc8
fix merge_reduce_ends (#15659)
* fix merge_reduce_ends

same range with different nesting should not merge, like cumsum twice should not merge

* skip that
2026-04-08 17:20:01 -04:00
chenyu
cb681da840
move UOp.pad to mixin (#15657)
the same arg works for Tensor.pad
2026-04-08 13:15:19 -04:00
nimlgen
28b14b0e38
mlx: remove to_be, use helpers (#15655) 2026-04-08 20:07:28 +03:00
nimlgen
1b44cb2ac6
split update stat from execitem (#15654) 2026-04-08 20:07:12 +03:00
qazal
71c83cc3f6
viz: put OTHER_ on the wave row (#15650)
* viz: put OTHER_ on the wave row

* update tests

* cleanup cli
2026-04-08 23:13:44 +09:00
chenyu
839d37b7bc
update median_step_time in model_train.py (#15649)
BENCHMARK=5 used to pick the 4th largest, not the middle one
2026-04-08 09:53:59 -04:00
chenyu
dae9dea903
clean up tensor random functions (#15648)
* clean up tensor random functions

* revert that
2026-04-08 09:44:37 -04:00
George Hotz
1ebeb52e59
RDNA4 asm gemm (#15427)
* sqtt: rdna4 decoder work

* diff cleanup

* more diff

* test

* 125

* r4

---------

Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>
2026-04-08 21:26:44 +08:00
nimlgen
b1e52ba0c2
the slowest line in hcq graph (#15635)
* the slowest line in hcq graph

* x
2026-04-08 15:53:52 +03:00
qazal
3ac16b3bea
viz: add wmma row, update exec duration logic (#15646)
* viz: split wmma to its own row, fix duration logic

* regs

* decrease number of loops, add pickle

* assert overlaps
2026-04-08 20:24:23 +09:00
George Hotz
35e3983840
Add Q5_0, Q5_1, and bfloat16 GGUF types (#15644) 2026-04-08 17:16:19 +08:00
qazal
39a029ec55
remove ASM_GEMM context var (#15645) 2026-04-08 18:02:40 +09:00
qazal
dc6a51e44d
viz: add # of bytes to sdma (#15639)
* viz: add # of bytes to sdma

* update test_viz
2026-04-08 17:43:37 +09:00
wozeparrot
70dbd35023
llama: move custom_kernel into flat_llama (#15643) 2026-04-08 00:19:14 -07:00
Christopher Milan
bcf6931a4f
fix: comma 4 does not have pcie (#15642) 2026-04-07 23:57:03 -04:00
George Hotz
f930579b7a
llm: change the default port to 8000 so you can remember it (match vLLM) 2026-04-08 11:25:38 +08:00
b1tg
bf3763526a
llm: buffer SSE chunks to fix parse errors from split reads (#15641) 2026-04-08 10:26:23 +08:00
qazal
a508b8fd2a
viz: delete redundant things (#15637)
* delete that

* remove

* delete graph config
2026-04-08 07:18:04 +09:00
chenyu
9c6e925b56
move lerp to mixin (#15634)
last function of math function section
2026-04-07 15:13:00 -04:00
qazal
890286e8d6
update llama profile.sh (#15633)
* update llama profile.sh

* BENCHMARK 5
2026-04-08 03:18:45 +09:00
nimlgen
b78b384d58
mlx: graph (#15621)
* Dx

* Dx

* simpler

* mypy

* x

* f

* Dx

* x

* c

* x
2026-04-07 19:43:51 +03:00
qazal
d29f0ef721
viz: speed up profiler first render (#15632)
* viz: speed up profiler first render

* better comment
2026-04-07 23:07:09 +09:00
George Hotz
d3de63d998
improvements to apps.llm (#15631) 2026-04-07 20:34:05 +08:00
George Hotz
2b01ca59dd
USB driver for custom ASM firmware (#15597)
* USB driver for custom ASM firmware

* timeout

* fix mypy

* pcie mem read

* flip in f/w

* one tx

* little endian

* autodetect custom

* mock bypass

* lint

* clean
2026-04-07 13:45:41 +08:00
wozeparrot
810d7c00cd
llama: unify scripts (#15628) 2026-04-06 20:28:08 -07:00
Christopher Milan
19e96497ee
interface in DEV (#15620) 2026-04-06 19:59:28 -04:00