9,679 commits

Author SHA1 Message Date
George Hotz
7ae02dea19
Merge branch 'master' into moveleftright 2025-08-04 19:10:27 -07:00
George Hotz
7f6acfb0d5
give define global and friends a shape (#11502)
* give define global and friends a shape

* ignore negative size

* ptx fix
2025-08-04 19:09:39 -07:00
George Hotz
0e91b6fd30 bugfixes 2025-08-04 18:46:32 -07:00
George Hotz
823dfbde70 move view to swizzle 2025-08-04 18:27:27 -07:00
chenyu
83385e7abc
update gradient src in ramp.py (#11499)
that's simplified now
2025-08-04 18:58:03 -04:00
qazal
846a2826ab
viz: remove TracingKey.fmt (#11482)
* viz: remove TracingKey.fmt

* remove from test too
2025-08-05 00:00:03 +03:00
chenyu
01d44e8f16
tiny reduce_gradient cleanup [pr] (#11498) 2025-08-04 16:12:53 -04:00
chenyu
8a11af01ed
remove broken paperswithcode links in doc (#11497) 2025-08-04 13:12:33 -04:00
leopf
4f0ee4e982
BPE tokenizer (#11415)
* BPE works

* refactor tok

* oops

* basic tests

* fix eval

* smaller diff

* fix error

* proper vocab decoding

* use regex for splitting

* escape ucatrange

* full compat

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-08-04 09:52:38 -07:00
b1tg
06af9f9236
fix double exception + add name,loc in error msg (#11487)
Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-04 13:41:23 +03:00
nimlgen
4877aa965a
ast seems to probe nv as well (#11494) 2025-08-04 11:47:07 +03:00
chenyu
e0106b6b25
1/(x*c) -> (1/c)*(1/x) (#11491)
example: 2*(2*a).reciprocal() -> a.reciprocal()

# TODO: bounds for reciprocal
# TODO: should z3 work?
2025-08-03 23:35:46 -04:00
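The rewrite named in commit e0106b6b25 above, 1/(x*c) -> (1/c)*(1/x), can be checked numerically. The snippet below is only a sketch of the algebraic identity using exact rationals; the helper names `recip_of_scaled` and `scaled_recip` are hypothetical and are not taken from the repository, and it does not show how the actual rewrite rule is implemented.

```python
from fractions import Fraction

def recip_of_scaled(x, c):
    """Left-hand side of the rewrite: 1/(x*c)."""
    return 1 / (x * c)

def scaled_recip(x, c):
    """Right-hand side of the rewrite: (1/c)*(1/x)."""
    return (1 / c) * (1 / x)

# Exact rational arithmetic avoids floating-point rounding in the comparison.
a = Fraction(7, 3)
c = Fraction(2)

# Both forms agree for a nonzero x and a nonzero constant c.
assert recip_of_scaled(a, c) == scaled_recip(a, c)

# The commit's example: 2*(2*a).reciprocal() simplifies to a.reciprocal(),
# because the constant factor 1/2 pulled out of the reciprocal cancels the 2.
assert 2 * recip_of_scaled(a, c) == 1 / a
```

With floating-point values the two forms can differ in the last bits, which may be related to the "bounds for reciprocal" TODO in the commit message, though the message does not say so.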
qazal
5870352fe1
viz: factorize llvm-mca call (#11490) 2025-08-04 00:31:23 +03:00
chenyu
dbc7807c61
enable WEBGPU tests with buffer limit (#11489)
TestSample still fails?
2025-08-03 13:02:44 -07:00
nimlgen
8f374ee1f7
nv: print devfmr in gsp logs (#11484) 2025-08-03 15:12:53 +03:00
chenyu
823f1a01db
move cast around expand backward to tensor.py (#11483) 2025-08-02 23:03:54 -04:00
chenyu
0ce0f51010
generic double cast folding (#11481)
b.cast(a).cast(b) -> b if a preserves all values in b
2025-08-02 19:26:37 -04:00
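The condition in commit 0ce0f51010 above, "b.cast(a).cast(b) -> b if a preserves all values in b", can be illustrated with plain integers. This is a minimal sketch that assumes two's-complement wraparound casts between fixed-width integer types; `IntType`, `cast`, `preserves_all`, and `round_trip` are hypothetical names for illustration and are not the repository's API.

```python
from typing import NamedTuple

class IntType(NamedTuple):
    """A fixed-width integer type (hypothetical stand-in for a dtype)."""
    bits: int
    signed: bool

    @property
    def lo(self) -> int:
        return -(1 << (self.bits - 1)) if self.signed else 0

    @property
    def hi(self) -> int:
        return (1 << (self.bits - 1)) - 1 if self.signed else (1 << self.bits) - 1

def cast(value: int, to: IntType) -> int:
    """Wrap value into the range of `to` (two's-complement semantics assumed)."""
    wrapped = value & ((1 << to.bits) - 1)
    if to.signed and wrapped >= (1 << (to.bits - 1)):
        wrapped -= 1 << to.bits
    return wrapped

def preserves_all(a: IntType, b: IntType) -> bool:
    """True if every value representable in b is also representable in a."""
    return a.lo <= b.lo and b.hi <= a.hi

def round_trip(value: int, a: IntType, b: IntType) -> int:
    """Cast a value of type b to a and back to b."""
    return cast(cast(value, a), b)

INT8 = IntType(8, True)
INT32 = IntType(32, True)

# int32 holds every int8 value, so int8 -> int32 -> int8 can be folded away.
assert preserves_all(INT32, INT8)
assert all(round_trip(v, INT32, INT8) == v for v in range(INT8.lo, INT8.hi + 1))

# int8 does not hold every int32 value, so the double cast must be kept.
assert not preserves_all(INT8, INT32)
assert round_trip(300, INT8, INT32) != 300
```

The range check here is a sufficient condition only; the rule in the actual commit may use a different test for "preserves all values".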
qazal
72e0d1d0dc
viz: profile the compiler in TINY device (#11457)
* viz: profile the compiler in TINY device

* leanup
2025-08-03 02:03:20 +03:00
chenyu
66be747908
few more dtype cast convinience methods (#11480) 2025-08-02 15:47:09 -04:00
chenyu
e22e5da9a5
move some test_dtype tests to unit (#11479) 2025-08-02 15:25:00 -04:00
nimlgen
da0b955be4
hcq: cpu can be graphed (#11474)
* hcq: cpu can be graphed

* ops

* new jit decisions

* fix test

* fix remote

* cleaner

* fix
2025-08-02 21:01:19 +03:00
chenyu
f7965f85aa
Revert "feat: faster index building (#11462)" (#11478)
This reverts commit 3a4deb08d2.
2025-08-02 12:50:48 -04:00
kevvz
ef7e01cadf
Fix SVD shape bug + Fix batched SVD bug (#11477)
* failing test case

* fix

* better test

* space
2025-08-02 09:47:41 -07:00
b1tg
6ecaf8e7b2
refactor: use less index and simplify reduce axes check [pr] (#11476)
* use output_shape/full_shape

* simple final_reduces check

---------

Co-authored-by: b1tg <b1tg@users.noreply.github.com>
2025-08-02 09:44:51 -07:00
wozeparrot
3a4deb08d2
feat: faster index building (#11462)
* feat: faster index building

* feat: correct training samples
2025-08-02 11:50:18 -04:00
nimlgen
8cc2d64edb
amd: reuse create_queues for usb iface (#11473) 2025-08-02 14:40:46 +03:00
chenyu
9e8e6b45ab
grad acc train llama (#11467)
* grad acc train llama

* log step time
2025-08-01 15:54:50 -04:00
chenyu
7ad7329257
data parallel train llama (#11466) 2025-08-01 12:13:51 -04:00
nimlgen
9f2182f92f
cpu: start threading (#11324)
* cpu: threading

* syncs

* llvm

* fix

* opt

* fx

* fix

* missed sync

* one line less

* cleaner

* fix
2025-08-01 15:35:07 +03:00
qazal
c7ae1bd474
viz: more consistent border styling (#11464) 2025-08-01 09:31:06 +03:00
George Hotz
8ff03806e8
add llama layers (#11460)
* add llama layers

* add contig bw for speed
2025-07-31 16:28:04 -07:00
qazal
719827b95d
viz: add flops / mem bw to device programs (#11459)
* viz: add flops / mem bw to device programs

* better spacing style
2025-08-01 02:12:30 +03:00
chenyu
3f742a5a7c
comma space lab models benchmark (#11461) 2025-07-31 19:06:18 -04:00
George Hotz
474ee9daa5 hotfix: add contiguous_backward to llama 2025-07-31 15:07:12 -07:00
qazal
fa66d9772d
viz: show const node when it's root (#11456) 2025-08-01 01:01:58 +03:00
qazal
056dabda5a
viz: refactor to color scheme (#11455) 2025-08-01 00:17:50 +03:00
nimlgen
e5b6149dfb
more typing in drivers (#11454)
* more typing in drivers

* rm
2025-07-31 23:26:33 +03:00
qazal
bad3cf5731
viz: add LLVM machine code analysis (#11421)
* start

* works everywhere

* add viz api

* utilization table

* reg pressure ui

* use llvm-mca

* llvm-mca ui

* work

* cleanup

* cycle through, defaults are enough

* x86 pending

* x86 nops

* get mcpu/mtriple from autogen

* cleanup server diff

* move parser to python

* normalize to pct of max

* segments legend

* imports

* also monospace

* max comes from the total per instruction

* base on the value
2025-08-01 01:59:26 +08:00
chenyu
e847677e8a
use AxisType in search instead of colors (#11452) 2025-07-31 13:07:33 -04:00
nimlgen
75c2c42def
suppress exceptions only during finalization (#11451)
* suppress exceptions only during finalization

* fix

* fix typing

* fix more warns

* fix

* better?

* Revert "better?"

This reverts commit a068aa5793.

* mm?

* no as e
2025-07-31 13:57:12 +03:00
wozeparrot
24dd0d52ed
feat: test remove to cpu (#11444) 2025-07-30 20:18:56 -07:00
kevvz
c3cfcb50cb
Add linalg_det and test for torch backend (#11405)
* add linalg_det and test

* space

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:04:44 -04:00
Eitan Turok
cba3655de5
Add Test for Setitem (#10559)
* init

* update

* better

* failing test

* works

* Delete test file

* clean

* lint

* simplify variable name

* rm contigious, rm int dtype, and add assertEqual

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:03:41 -04:00
wozeparrot
6252f7770e
feat: fake data (#11447) 2025-07-30 17:18:20 -07:00
chenyu
e300451f3a
update llama3 (#11446)
`LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7
2025-07-30 19:34:21 -04:00
wozeparrot
5fb975351a
feat: flag for training on val (#11441) 2025-07-30 14:29:45 -07:00
chenyu
4ca430e5bf
fix search dedup (#11439)
it should check against pre real_axis axis in actions, not real_axis.
2025-07-30 17:24:16 -04:00
wozeparrot
d3da20eca6
feat: bump mlperf workflow timeout to 6 hours (#11440) 2025-07-30 14:12:12 -07:00
wozeparrot
825b6a2505
feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
qazal
af357b5dc8
disable TRACK_MATCH_STATS in BEAM workers [pr] (#11437) 2025-07-30 23:22:08 +03:00