11,300 commits

Author SHA1 Message Date
George Hotz
d1223922b1 fixed and test is real 2025-12-04 16:52:11 -08:00
George Hotz
05c4b18f91
Merge branch 'master' into sched_cache 2025-12-04 16:46:23 -08:00
qazal
f21c9dbf4b
enable PMC with VIZ=2 (#13575) 2025-12-05 03:09:53 +08:00
qazal
d7caae5f61
viz: tabulate pmc (#13574)
* viz: tabulate pmc

* linter

* enable nesting

* pmc comes before waves
2025-12-05 03:08:39 +08:00
chenyu
42f6cf3a90
tighter test_real_world mem and kernel count bounds (#13573)
also check if actual usage is within 20% of set limit, the old limits are too big to be useful
2025-12-04 13:35:39 -05:00
chenyu
89f9e1dcd5
add SGD to beautiful_mnist (#13571) 2025-12-04 12:17:29 -05:00
qazal
512a8f3dd4
viz: start global memory PMC tests (#13569) 2025-12-05 00:40:27 +08:00
chenyu
7df56d3b99
Optimizer.device is a property (#13568) 2025-12-04 09:25:15 -05:00
nimlgen
db99a61fad
qcom: support cpu mappings (#13565)
* test

* qcom: support cpu mappings

* clean

* msg
2025-12-04 14:50:46 +03:00
George Hotz
bd6a068ef7
move track_rewrites to outer schedule cache (#13556)
Co-authored-by: qazal <qazal.software@gmail.com>
2025-12-04 19:13:45 +08:00
qazal
3eae146139
faster process replay [pr] (#13564) 2025-12-04 18:52:07 +08:00
Rory Clear
6eab756578
fix and test loading num_batches_tracked (#13538)
* fix and test loading num_batches_tracked

* add failing reverse case

* try reshape state dict if mismatch

* reshape for () and (1,)

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2025-12-04 01:22:49 -08:00
nimlgen
877a7fdd61
jit: support encdec (#13563)
* jit: support encdec

* fix
2025-12-04 11:58:34 +03:00
Douglas Nyberg
a8a62bc08e
add max/min reduction support to ScatterND (#13562) 2025-12-04 00:53:47 -08:00
ayanhan
edf929ec9d
fix: add __delitem__ to Tensor with proper TypeError (#13561) 2025-12-04 00:53:08 -08:00
Douglas Nyberg
9411ecedc4
fix CUDA half-precision trunc() type mismatch (#13559) 2025-12-03 21:53:16 -05:00
ayanhan
92b40290c7
fix: add test_sum_int and remove outdated TODO in test_custom_kernel (#13560) 2025-12-03 21:51:58 -05:00
Christopher Milan
0a54434b15
mitigate ctypes c_bool bitfield bug (#13558)
* mitigate ctypes c_bool bitfield bug

* don't delete old test
2025-12-03 20:46:04 -05:00
George Hotz
f58b3afeb2
Merge branch 'master' into sched_cache 2025-12-03 16:12:44 -08:00
George Hotz
96d16675fe update examples/gradaccum_mnist.py to use the JIT 2025-12-03 16:11:42 -08:00
George Hotz
e0a805765e full jit 2025-12-03 16:08:34 -08:00
George Hotz
7c66e44454 fix JIT in examples/gradaccum_mnist.py 2025-12-03 16:00:28 -08:00
George Hotz
e75e391ad4
Merge branch 'master' into sched_cache 2025-12-03 15:41:31 -08:00
George Hotz
24ca8eeaa7
small fixups from schedule_cache (#13557) 2025-12-03 15:41:16 -08:00
George Hotz
8c69e26d22 metadata is best effort 2025-12-03 15:22:58 -08:00
George Hotz
74fb405cc9 reenable the actual schedule cache 2025-12-03 15:03:42 -08:00
George Hotz
bf5de6ba5f delete abstractions2 2025-12-03 15:02:20 -08:00
George Hotz
183b3ced03 fix process replay 2025-12-03 14:56:28 -08:00
George Hotz
2280dae504 src[0].op 2025-12-03 14:50:46 -08:00
George Hotz
9ba612f0b4
Merge branch 'master' into sched_cache 2025-12-03 14:50:29 -08:00
Douglas Nyberg
f5abd38132
remove tfa dependency: use keras.optimizers.Lamb and tf.raw_ops for LARS (#13555) 2025-12-03 17:48:27 -05:00
George Hotz
32794853db why is that broken? 2025-12-03 14:44:41 -08:00
George Hotz
4a72a49082
Merge branch 'master' into sched_cache 2025-12-03 14:34:49 -08:00
George Hotz
a4c4e48385
add LUNIQUE op (#13554) 2025-12-03 14:34:34 -08:00
George Hotz
9e6f8c823d always miss 2025-12-03 14:22:26 -08:00
George Hotz
4459a88a54 fix spec 2025-12-03 14:19:07 -08:00
George Hotz
9cdda8913f put that there 2025-12-03 14:15:13 -08:00
George Hotz
e644d59f9f oops, fix cache 2025-12-03 14:07:04 -08:00
George Hotz
37a930591f preserve metadata 2025-12-03 14:04:20 -08:00
George Hotz
723179dfd6
Merge branch 'master' into sched_cache 2025-12-03 13:43:58 -08:00
George Hotz
a909cd4581
faster HEVC decode (#13552)
* faster HEVC decode

* bind to variables

* cleanups

* more cleanups
2025-12-03 11:33:05 -08:00
chenyu
22777a89ea
minor test_uop_symbolic updates (#13551) 2025-12-03 13:17:44 -05:00
chenyu
a205f98ef4
tighter bound for MOD (#13550) 2025-12-03 11:24:29 -05:00
nimlgen
fcdb01abe7
hip: fix ioctl (#13548) 2025-12-03 16:40:43 +03:00
qazal
aab7535805
viz: format buffer size unit (#13547) 2025-12-03 21:35:49 +08:00
nimlgen
daea1161cc
nv: nvdec for blackwell (#13546) 2025-12-03 16:30:22 +03:00
nimlgen
549f3287a8
fix caching for fetch (#13544) 2025-12-03 14:34:14 +03:00
qazal
8390de39e6
amd: static flag check for sqtt/pmc (#13545) 2025-12-03 18:36:15 +08:00
George Hotz
ddf3f2d0c4
rdna3 asm + zip_extract (#13499)
* rdna3 asm + zip_extract

* include sqtt

* fix end parsing

* disassembler working

* parsing fields

* instruction

* op

* more parsing
2025-12-02 22:56:01 -08:00
George Hotz
81bafb1af3
Merge branch 'master' into sched_cache 2025-12-02 19:59:48 -08:00