Commit graph

5,784 commits

Author SHA1 Message Date
chenyu
33b635d23a
Tensor.train -> TRAINING [PR] (#16705)
* Tensor.train -> TRAINING [PR]

* doc
2026-06-22 15:13:22 -04:00
chenyu
2d8b802958
contiguous in wino conv (#16696)
also fixed test_counters
2026-06-21 17:11:46 -04:00
chenyu
ba1d3baae8
masked_select and nonzero to mixin [PR] (#16695)
with a .data stub
2026-06-21 15:10:44 -04:00
chenyu
58ff75272e
const_like and invalids to mixin [PR] (#16690)
* const_like and invalids to mixin [PR]

* empty_like

* einsum

* type
2026-06-21 00:02:29 -04:00
chenyu
4618d27129
final const cleanups [PR] (#16688) 2026-06-20 21:38:16 -04:00
chenyu
8b07cca9f7
invalid clone try 3+ [PR] (#16679) 2026-06-19 20:13:52 -04:00
Christopher Milan
1822eed8d3
ci: only test models on cpu (#16678) 2026-06-19 18:16:59 -04:00
wozeparrot
bba611bb59
gemm: fix mxfp8 on more shapes (#16677) 2026-06-19 13:28:53 -07:00
chenyu
67c3e589a1
invalid clone tests and prereq [PR] (#16675) 2026-06-19 13:20:43 -04:00
George Hotz
649971f02a
remove DEFINE_LOCAL and DEFINE_REG (gpt) (#16673)
* remove define_local and define_reg (gpt)

* fix precommit

* cleanups

* regalloc fix

* cleanups 2
2026-06-19 10:07:50 -07:00
George Hotz
b05bea81ce
x86 cleanups (fable) [pr] (#16591)
* x86 cleanups (fable)

* support shrink

* remove ptr dtype

* move that

* is_lane helper

* Revert "is_lane helper"

This reverts commit ea4571254d.
2026-06-19 09:04:51 -07:00
George Hotz
d7b10c69bc
update placeholder to not create DEFINE_LOCAL/DEFINE_REG (#16671)
* update placeholder to not create DEFINE_LOCAL/DEFINE_REG

* simpler

* define_local
2026-06-18 21:21:06 -07:00
George Hotz
925c49ce99
use placeholder in tests (#16672) 2026-06-18 20:51:44 -07:00
George Hotz
4a4b6956df
remove DEFINE_VAR from codebase (gpt) (#16666)
* remove DEFINE_VAR from codebase

* junk

* remove junk
2026-06-18 15:33:50 -07:00
George Hotz
5989d0b150
remove DEFINE_VAR try 2 (#16651)
* remove DEFINE_VAR try 2

* param

* null index

* fix fuzzing

* fixes

* no gather neg params

* param is just Irreducible

* fixes

* skip stack

* need to filter slots there
2026-06-18 12:34:25 -07:00
chenyu
d74f488376
clean up _function.depth properly [PR] (#16663) 2026-06-18 14:10:22 -04:00
qazal
924bece1d5
remove some old scheduler tests (#16660) 2026-06-18 22:15:00 +09:00
qazal
b753fb5e4c
viz: view source working even if compile failed (#16657)
* failing test

* hard

* ret_dict

* switch to _data for tests too

* update sqtt

* start work

* Ops.LINEAR looks good

* baseline with depth works

* support depth

* types

* @needs_tracked_pm

* update, marg can error too

* unwrap_or goes to many more places

* move things to soft_err

* soft_err everywhere needed

* diff cleanup

* use list

* rewrite it

* change

* update depth number

* small comment change
2026-06-18 17:34:53 +09:00
Christopher Milan
e0fe6e542e
ci: fewer pydeps (#16654) 2026-06-17 22:52:14 -04:00
chenyu
a74b7130b4
Revert "invalid clone try 2 [PR] (#16648)" (#16653)
This reverts commit 1bd4551ee1.
2026-06-17 22:05:30 -04:00
chenyu
df015ad541
remove many type ignores [PR] (#16652) 2026-06-17 21:38:45 -04:00
chenyu
1bd4551ee1
invalid clone try 2 [PR] (#16648) 2026-06-17 19:44:35 -04:00
George Hotz
53a1226a49
STACK 0 is dtype void (#16650)
* STACK 0 is dtype void

* spec for stack

* fix gemm group + END shape

* bump
2026-06-17 16:28:32 -07:00
George Hotz
aef85ddc4d
addrspace special/range (#16647)
* addrspace special/range

* just include indexing

* define var is alu

* bring old ignore indexing back

* mults to fix

* fixes

* ALU

* fixes
2026-06-17 15:57:37 -07:00
chenyu
1e08c0a07c
remove NOOP from AFTER with multiple srcs (#16646) 2026-06-17 14:35:02 -04:00
chenyu
1acc40600d
indexing an after with all fully invalid stores is invalid (#16643)
* indexing an after with all fully invalid stores is invalid

* typing cast
2026-06-17 11:06:36 -04:00
George Hotz
d631716858
remove const without STACK (#16639)
* remove const without STACK

* fix GEP rewrite

* fix null tests

* fix openpilot regression

* it's 10 in CI
2026-06-16 21:25:42 -07:00
wozeparrot
36f6d1b064
gemm: fix bf16 atb for mp sharding (#16637) 2026-06-16 15:58:47 -07:00
qazal
1cb6b88d37
viz: show contents of vconst (#16636)
* failing test

* render vconst

* simpler test

* reorder
2026-06-17 02:31:03 +09:00
chenyu
f0998e9bba
Revert "invalid clone is anonymous buffer" (#16613) (#16633) 2026-06-16 08:27:48 -04:00
qazal
7d2b0b697d
simple failing test for invalid extra E kernel (#16632)
* simple failing test for invalid extra E kernel

* 6 kernels
2026-06-16 17:57:44 +09:00
chenyu
efd03d7153
invalid clone is anonymous buffer [PR] (#16613) 2026-06-15 20:14:26 -04:00
George Hotz
41aa2fe119
test_gemm needs .clone() on eye (#16629) 2026-06-15 12:48:27 -07:00
George Hotz
b1fb39502d
delete that test 2026-06-14 09:42:58 -07:00
chenyu
5d5ead78da
inline unique_const in invalids [PR] (#16612) 2026-06-13 10:14:32 -04:00
Sieds Lykles
b00dd754a9
Remove if-condition from nested div rule [pr] (#16611)
* add rules and test

* trigger [pr]
2026-06-13 15:47:21 +02:00
nimlgen
c43091a464
fix missing cast in cstyle (#16608)
* fix missing cast in cstyle

* x

* x
2026-06-13 10:04:06 +03:00
Christopher Milan
8862c7549c
new-style dcache_flush (#16602) 2026-06-12 22:25:08 -04:00
chenyu
aa32d309db
fix rangeify indexing for pad/reduce (#16599) 2026-06-12 20:26:15 -04:00
qazal
b2e95b2db3
rangeify: no copies for write+read of same slice (#16585)
* failing test

* cleaner failing tests

* assign and read of same slice shouldn't create copies

* err in the changes

* shrink with no overlapping regions in dest is fine
2026-06-13 02:19:47 +09:00
Philip Sinitsin
76c10cd635
jit: don't memplan buffers reachable from live tensors (#16588)
The memory planner was suballocating BUFFERs created during JIT capture that are still referenced by external lazy tensor graphs, like the .grad tensors assigned by backward(). The replay then only writes the arena slices, so realizing such a tensor after the call reads freshly allocated memory and silently returns zeros. Hold every BUFFER reachable from a live Tensor instead of only the parameters of the return value; true internals are still planned. Fixes #16571.
2026-06-12 17:51:54 +03:00
qazal
4d34590b7d
llama: less E kernels (#16517) 2026-06-12 19:49:25 +09:00
qazal
12f4cf0e49
rename amd/test_custom_kernel.py to test_asm_kernel (#16586)
* rename amd/test_custom_kernel.py to test_asm_kernel

* update
2026-06-12 16:11:01 +09:00
George Hotz
b8aec4cce7
port x86 to new_style (fable slop) and now everything is new style (#16581)
* port x86 to new_style (fable slop)

* don't change ops

* port NIR to new_style (fable)

* lil cleanup

* fix tests, and remove new_style
2026-06-11 21:09:34 -07:00
chenyu
762f50bd52
move gradient.py to mixin/ [PR] (#16583) 2026-06-11 23:58:21 -04:00
chenyu
a2cec397f3
UOp cast and bitcast takes DTypeLike [PR] (#16582)
* UOp cast and bitcast takes DTypeLike [PR]

match Tensor

* fix type
2026-06-11 22:38:54 -04:00
Christopher Milan
4d893f626a
move a bunch of test_schedule to null (#16578) 2026-06-11 20:26:34 -04:00
chenyu
5f1e2d3900
PADTO pads Invalids (#16562) 2026-06-11 16:54:26 -04:00
qazal
a83710396c
support mselect input to CALL, less kernels in allreduce (#16567)
* support mselect input to CALL, less kernels in allreduce

* resolve mstack
2026-06-11 18:10:47 +09:00
qazal
21f1101691
add allreduce kernel count test (#16566) 2026-06-11 15:54:12 +09:00