Commit graph

9,700 commits

Author SHA1 Message Date
George Hotz
29262a7543 fine for ptx 2025-08-05 18:26:06 -07:00
George Hotz
dcc6ddf0eb that hack broke things 2025-08-05 18:24:58 -07:00
George Hotz
d1d935242b Revert "fix tests"
This reverts commit a27019383d.
2025-08-05 18:06:27 -07:00
George Hotz
6b330f302d remote metal was flaky 2025-08-05 17:50:14 -07:00
George Hotz
a27019383d fix tests 2025-08-05 17:45:35 -07:00
George Hotz
8b285e193a move those to fix_kernel_ops 2025-08-05 17:07:12 -07:00
George Hotz
f0c9b11b9e early meta ops 2025-08-05 17:00:49 -07:00
George Hotz
1902a85ac1 early load buffer 2025-08-05 16:56:10 -07:00
George Hotz
b2fc111e3f cleanup fix_kernel 2025-08-05 16:46:13 -07:00
George Hotz
067daee5be
pin torch to 2.7.1 (#11519) 2025-08-05 15:58:57 -07:00
George Hotz
b39f43c46a
optimize in rewrite, try 2 (#11518)
* changes

* fix test uops

* optimize in rewrite, try 2
2025-08-05 15:52:53 -07:00
George Hotz
07b0df0d86 hotfix: test tensor dims start at 1 2025-08-05 15:40:24 -07:00
George Hotz
4dabdf7c6d
Revert "optimize in rewrite (#11516)" (#11517)
This reverts commit 3b777a9e05.
2025-08-05 15:39:07 -07:00
George Hotz
3b777a9e05
optimize in rewrite (#11516)
* changes

* fix test uops

* dim shouldn't be 0

* huh, why did that one not save
2025-08-05 15:33:26 -07:00
nimlgen
ec676eddfa
nv: move base address higher (#11514) 2025-08-05 22:42:53 +03:00
qazal
7703f8b805
viz: skip flops info if estimates is symbolic (#11513) 2025-08-05 22:12:52 +03:00
nimlgen
fc4e713d1c
jit graph split tests (#11507)
* jit graph split tests

* fix

* one more test

* more tests

* fix

* xm

* rmeote
2025-08-05 21:32:37 +03:00
George Hotz
c57fde51f9
move swizzler to opt (#11509) 2025-08-05 11:31:30 -07:00
chenyu
ace8e9a706
fix test_conv2d_winograd (#11511) 2025-08-05 12:15:46 -04:00
chenyu
223aaa0492
clean up more conv tests (#11510) 2025-08-05 12:15:30 -04:00
Garret Castro
76e62a1c23
extract conv layer test logic (#11488)
* refactor: extract conv layer test logic

* tuple is unnecessary

* integrate _test_conv logic into all conv tests

* fix linter, forgot dilation

* undo winograd extraction

adds too many if statements for a single case
2025-08-05 11:15:54 -04:00
b1tg
8b8bd6c534
make einsum generate same kernels (#11508)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-05 11:12:52 -04:00
uuuvn
011ef8fa9d
Fix incorrect jit current batch devs reset (#11505)
`current_batch_devs = []` (in `flush_batch()`) happens between
`new_batched_devs = ...` and `current_batch_devs = new_batched_devs` =>
doesn't actually reset anything leading to things not jitting properly

which 2xs remote bert step time (should have similar effects on any
non-hcq backend)
2025-08-05 08:16:16 +03:00
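Note on 011ef8fa9d: the ordering problem described in this message can be illustrated with a toy model. This is a minimal sketch, not tinygrad's JIT code. Only the names `current_batch_devs`, `new_batched_devs` and `flush_batch` come from the commit message; the function `batch_devices`, its `steps` input, and the "fixed" ordering are assumptions made for illustration, and the actual change in #11505 may differ.

```python
def batch_devices(steps, buggy):
    """Toy model (hypothetical): steps is a list of (device, needs_flush) pairs.

    Returns the contents of current_batch_devs after each step.
    """
    current_batch_devs = []
    history = []
    for dev, needs_flush in steps:
        if buggy:
            # The new list is built from the pre-flush state...
            new_batched_devs = current_batch_devs + [dev]
            if needs_flush:
                current_batch_devs = []  # stands in for flush_batch()
            # ...so this assignment overwrites the reset with stale devices.
            current_batch_devs = new_batched_devs
        else:
            # One possible ordering that avoids the problem: reset first.
            if needs_flush:
                current_batch_devs = []  # stands in for flush_batch()
            new_batched_devs = current_batch_devs + [dev]
            current_batch_devs = new_batched_devs
        history.append(list(current_batch_devs))
    return history


steps = [("A", False), ("B", False), ("C", True)]
buggy_history = batch_devices(steps, buggy=True)
fixed_history = batch_devices(steps, buggy=False)
```

In the buggy ordering the flush on the third step has no effect, so devices from the previous batch leak into the next one; in the reordered version the third batch starts with only its own device.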
chenyu
f02720ca2d
fix fuse gate_contiguous unique (#11504) 2025-08-04 23:43:31 -04:00
George Hotz
7f6acfb0d5
give define global and friends a shape (#11502)
* give define global and friends a shape

* ignore negative size

* ptx fix
2025-08-04 19:09:39 -07:00
chenyu
83385e7abc
update gradient src in ramp.py (#11499)
that's simplified now
2025-08-04 18:58:03 -04:00
qazal
846a2826ab
viz: remove TracingKey.fmt (#11482)
* viz: remove TracingKey.fmt

* remove from test too
2025-08-05 00:00:03 +03:00
chenyu
01d44e8f16
tiny reduce_gradient cleanup [pr] (#11498) 2025-08-04 16:12:53 -04:00
chenyu
8a11af01ed
remove broken paperswithcode links in doc (#11497) 2025-08-04 13:12:33 -04:00
leopf
4f0ee4e982
BPE tokenizer (#11415)
* BPE works

* refactor tok

* oops

* basic tests

* fix eval

* smaller diff

* fix error

* proper vocab decoding

* use regex for splitting

* escape ucatrange

* full compat

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-04 09:52:38 -07:00
b1tg
06af9f9236
fix double exception + add name,loc in error msg (#11487)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-04 13:41:23 +03:00
nimlgen
4877aa965a
ast seems to probe nv as well (#11494) 2025-08-04 11:47:07 +03:00
chenyu
e0106b6b25
1/(x*c) -> (1/c)*(1/x) (#11491)
example: 2*(2*a).reciprocal() -> a.reciprocal()

# TODO: bounds for reciprocal
# TODO: should z3 work?
2025-08-03 23:35:46 -04:00
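Note on e0106b6b25: the rewrite in the title is an algebraic identity, and the example in the message follows from it. The check below is a sketch using exact rational arithmetic from Python's standard library rather than tinygrad tensors; the helper names `recip_before` and `recip_after` are hypothetical. With floating-point values the two forms are not guaranteed to round identically.

```python
from fractions import Fraction


def recip_before(x, c):
    """Reciprocal of a product, as written before the rewrite: 1/(x*c)."""
    return 1 / (x * c)


def recip_after(x, c):
    """The rewritten form: (1/c)*(1/x)."""
    return (1 / c) * (1 / x)


a = Fraction(3)
c = Fraction(2)

# The two forms agree exactly for nonzero rationals.
same = recip_before(a, c) == recip_after(a, c)

# The commit's example: 2*(2*a).reciprocal() simplifies to a.reciprocal(),
# because the constant 2 cancels against the factored-out 1/2.
simplified = 2 * recip_after(a, c) == 1 / a
```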
qazal
5870352fe1
viz: factorize llvm-mca call (#11490) 2025-08-04 00:31:23 +03:00
chenyu
dbc7807c61
enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
nimlgen
8f374ee1f7
nv: print devfmr in gsp logs (#11484) 2025-08-03 15:12:53 +03:00
chenyu
823f1a01db
move cast around expand backward to tensor.py (#11483) 2025-08-02 23:03:54 -04:00
chenyu
0ce0f51010
generic double cast folding (#11481)
b.cast(a).cast(b) -> b if a preserves all values in b
2025-08-02 19:26:37 -04:00
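Note on 0ce0f51010: the condition "a preserves all values in b" can be illustrated with signed integers of different widths. This is a sketch under stated assumptions, not tinygrad's rule: it models casts as two's-complement wrap-around, and the helpers `cast_int` and `roundtrip` are hypothetical.

```python
def cast_int(value, bits):
    """Wrap value into the signed two's-complement range of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def roundtrip(value, b_bits, a_bits):
    """Model b.cast(a).cast(b) for a value that starts as a b_bits integer."""
    return cast_int(cast_int(value, a_bits), b_bits)


# 16 bits can hold every 8-bit value, so int8 -> int16 -> int8 is the identity
# and the double cast can be folded away.
int8_values = range(-128, 128)
foldable = all(roundtrip(v, 8, 16) == v for v in int8_values)

# 8 bits cannot hold every 16-bit value, so int16 -> int8 -> int16 changes
# some inputs and must not be folded.
lossy_result = roundtrip(300, 16, 8)
```

Here `lossy_result` differs from the input 300, which is why the fold is only valid when the intermediate type can represent every value of the outer type.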
qazal
72e0d1d0dc
viz: profile the compiler in TINY device (#11457)
* viz: profile the compiler in TINY device

* leanup
2025-08-03 02:03:20 +03:00
chenyu
66be747908
few more dtype cast convinience methods (#11480) 2025-08-02 15:47:09 -04:00
chenyu
e22e5da9a5
move some test_dtype tests to unit (#11479) 2025-08-02 15:25:00 -04:00
nimlgen
da0b955be4
hcq: cpu can be graphed (#11474)
* hcq: cpu can be graphed

* ops

* new jit decisions

* fix test

* fix remote

* cleaner

* fix
2025-08-02 21:01:19 +03:00
chenyu
f7965f85aa
Revert "feat: faster index building (#11462)" (#11478)
This reverts commit 3a4deb08d2.
2025-08-02 12:50:48 -04:00
kevvz
ef7e01cadf
Fix SVD shape bug + Fix batched SVD bug (#11477)
* failing test case

* fix

* better test

* space
2025-08-02 09:47:41 -07:00
b1tg
6ecaf8e7b2
refactor: use less index and simplify reduce axes check [pr] (#11476)
* use output_shape/full_shape

* simple final_reduces check

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-02 09:44:51 -07:00
wozeparrot
3a4deb08d2
feat: faster index building (#11462)
* feat: faster index building

* feat: correct training samples
2025-08-02 11:50:18 -04:00
nimlgen
8cc2d64edb
amd: reuse create_queues for usb iface (#11473) 2025-08-02 14:40:46 +03:00
chenyu
9e8e6b45ab
grad acc train llama (#11467)
* grad acc train llama

* log step time
2025-08-01 15:54:50 -04:00
chenyu
7ad7329257
data parallel train llama (#11466) 2025-08-01 12:13:51 -04:00
nimlgen
9f2182f92f
cpu: start threading (#11324)
* cpu: threading

* syncs

* llvm

* fix

* opt

* fx

* fix

* missed sync

* one line less

* cleaner

* fix
2025-08-01 15:35:07 +03:00