1,197 commits

Author SHA1 Message Date
geohotstan
1163292759
move onnx_parser into onnx (#11530) 2025-08-06 10:46:27 -04:00
nimlgen
eafc7fda12
upd perfetto (#11528) 2025-08-06 14:00:34 +03:00
nimlgen
4877aa965a
ast seems to probe nv as well (#11494) 2025-08-04 11:47:07 +03:00
George Hotz
8ff03806e8
add llama layers (#11460)
* add llama layers

* add contig bw for speed
2025-07-31 16:28:04 -07:00
George Hotz
474ee9daa5
hotfix: add contiguous_backward to llama 2025-07-31 15:07:12 -07:00
kevvz
c3cfcb50cb
Add linalg_det and test for torch backend (#11405)
* add linalg_det and test

* space

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-30 22:04:44 -04:00
wozeparrot
825b6a2505
feat: llama3 dataloader (#11340) 2025-07-30 13:27:55 -07:00
nimlgen
5fc5bb5237
ci: clear processes (#11434)
* unified hcq_smi for managment

* fix

* fix

* no reset for amd
2025-07-30 22:15:18 +03:00
George Hotz
4f26a9ad32
check elements_per_thread in tensorcore [pr] (#11435) 2025-07-30 11:55:48 -07:00
George Hotz
1bef2d80c1
unrolls are all in the same scope (#11429)
* unrolls are all in the same scope

* fix that import
2025-07-29 16:55:37 -07:00
George Hotz
03909f2772
permute locals for HL uop matmul (#11412)
* permute locals for HL uop matmul

* parens fix that

* permutes

* 20 TFLOPS
2025-07-29 08:19:59 -07:00
George Hotz
735ad5f10d
kernel4 and 5 in uops (#11411)
* move simplify views to merge views

* add amd kernel 4

* Revert "move simplify views to merge views"

This reverts commit 1e07dff384.

* k4 in python

* kernel4 written in uops

* k5 support

* cleanups
2025-07-28 19:35:48 -07:00
George Hotz
fddc645668
HL=2 top matmul (#11406)
* HL=2 top matmul

* top colored
2025-07-28 12:32:38 -07:00
George Hotz
dfeee63d30
uop matmul work (#11388)
* uop matmul work

* works with locals
2025-07-26 21:23:55 -07:00
George Hotz
2c70eaf18c
fix load / barrier (#11386)
* fix load / barrier

* cleanups

* fix CI
2025-07-26 10:27:37 -07:00
George Hotz
466ab5a3f2
store/load not pass through index (#11381)
* noop

* fix noop

* store cat is NOOP

* store dtype is void

* stores aren't passed through anymore

* meh, skip those for ptx

* correct ptx skip

* hl runs
2025-07-25 21:01:47 -07:00
chenyu
3d68feb67d
minor onnx Gather cleanup (#11375)
removed a type ignore and one error code skip
2025-07-25 21:08:08 -04:00
George Hotz
490a93902c
define reg doesn't have init anymore (#11365)
* define reg doesn't have init anymore

* remove that

* no special logic for dr

* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
0602b22086
kernel spec (#11359)
* kernel spec

* ops.VIEW

* work
2025-07-24 12:45:38 -07:00
George Hotz
b0dc97d1f7
write out kernel 3 in uops (#11352)
* write out kernel 3 in uops

* matmul is correct

* gemm passes spec

* bugfix to match speed

* cleanups
2025-07-23 17:32:38 -07:00
chenyu
86e7504111
mypy check extra/onnx.py (#11348)
instead of running test with 3.10, add onnx to mypy which would have caught StrEnum regression. Several type annotation failed mypy now that does not affect running the code and were skipped for now
2025-07-23 12:42:59 -04:00
chenyu
960da9319d
Remove StrEnum in onnx for python 3.10 (#11345)
some training tests failed looks like parsing error?
2025-07-23 11:52:25 -04:00
George Hotz
108aac8af4
use AddrSpace instead of local (#11314)
* use AddrSpace instead of local

* addrspace in test
2025-07-21 14:00:06 -07:00
geohotstan
445ff8de56
ONNX onnx_parser and buffer_parse clean up (#11000)
* start

* remove onnx.load from compile4 and move np to dropout

* clean up and enable test

* clean up

* move WebGPU ONNX test into MacOS (WebGPU)

* leave test in ONNX (CPU)

* fix raw_data init None, and simplify onnx_runner test a little?

* THESE TESTS ARE SO UGLY UGHH

* need to really think about how to structure the test

* wow LLMs are quite something

* not always on disk now

* also add external data loading test

* cleaner tests

* minimize diff and add const folding tests

* add external data loading too

* whoops add webgpu back.. but why was it not needed in the first place?

* better comment

* move webgpu test to macos(webgpu)?

* llm english so much better than me wow

* trigger CI to check flakiness

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-21 15:10:25 -04:00
George Hotz
842184a1ab
rename kernelize to schedule, try 2 (#11305) 2025-07-21 11:18:36 -07:00
वेदांत
e368628736
Add amin support to Tensor operations in Torch backend (#11290)
* intiger div mod fix

* Revert "intiger div mod fix"

This reverts commit d5d2f201bf.

* feat arg_min support

* tets update

* test fix
2025-07-21 09:14:08 -04:00
nimlgen
cc3c1e4c14
hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
nimlgen
2f72be5055
nv_smi: init basic insmod/rmmod/reset cmds (#11282) 2025-07-19 15:43:03 +03:00
qazal
577e581943
fix typo in sqtt/readme (#11281) 2025-07-19 15:10:24 +03:00
geohotstan
536b254df4
Bump onnx to 1.18.0 (#11266)
* bump

* thou hast implement functions

* hacked in domain support

* some clean ups

* hack quantize_onnx_test too

* add helper lol, why onnx tests why

* better dispatcher, but need tests and better naming

* flaky ci

* change some names

* small clean ups

* make it easier to clean up tests once ORT supports 1.18.0

* nits

* fix bug of Softmax_1 being registered in onnx_ops

* need a default value

* resolve_const is better name

* fix OnnxRunner.to

* use proper domain names
2025-07-17 15:35:41 -04:00
chenyu
c8e5c4d7c3
insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
chenyu
a0438012af
remove Kernel.get_program [pr] (#11203) 2025-07-12 20:50:29 -04:00
George Hotz
d67c8e7b42
local metal on metal in uop syntax (#11185)
* local metal on metal in uop syntax

* TODO: just put the axis_info in the kernelinfo

* local

* amd_matmul works @ 28 TFLOPS

* clean up matmul

* kernel8 works

* remove that

* locals

* axistype innovation

* work

* cleanup

* kernel3 regs

* cleanup kernel3

* work

* why is it broken

* no beam

* reenable

* permutes
2025-07-12 16:31:19 -07:00
geohotstan
5ce278b245
OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__

* modelproto_to_onnxrunner -> modelproto_to_runner

* whoops, fix import

* oh flakiness again, is it because it's getting gc-ed?

* small changes

* CI flaky so just move compile4 fix in

* copy typing of onnx_load

* actually can just import onnx_load instead of onnx.load

* fix external_benchmark_openpilot

* fix onnx_runner test to use onnx_helper

* rerun CI

* try run_modelproto

* spam CI a few times

* revert run_modelproto since that's flaky also

* no external onnx_load usage except onnx.py

* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?

* model_benchmark 193s -> 80s, add OnnxRunner.to()...

* minimize diff and clean up

* device can be None, weird but eh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
chenyu
6283d50224
DEPRECATED_linearize -> to_program [pr] (#11198) 2025-07-12 13:46:20 -04:00
George Hotz
2893feb9f6
cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
chenyu
7ce9e45474
mypy onnx_parser (#11141) 2025-07-08 19:50:28 -04:00
chenyu
ffcc557986
lint onnx and onnx_parser (#11134) 2025-07-08 15:28:35 -04:00
qazal
3dfc0ff887
move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
nimlgen
71377cd233
nv: parse falcon app descs (#11118) 2025-07-07 18:14:14 +03:00
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false

* add more shape asserts
2025-07-06 13:55:49 -04:00
chenyu
ba88ec3ad0
pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
nimlgen
4dccb2ea49
am_smi: increase kill retries (#11099) 2025-07-05 16:23:50 +03:00
0xSG
17119b0f23
hip_ioctl: platform.machine added (#11084) 2025-07-04 17:20:24 +03:00
nimlgen
2d138c6cf1
am: factor out init_sw (#11070) 2025-07-03 11:01:17 +03:00
chenyu
425d5f55c4
generate kernel dataset and upload artifact (#11063) 2025-07-02 17:21:25 -04:00
chenyu
4626e9c172
is_numpy_ndarray helper [pr] (#11050) 2025-07-02 09:12:53 -04:00
chenyu
126fcf4129
clean up AMD_LLVM in tests (#11021) 2025-06-28 22:45:47 -04:00
chenyu
a6485d00c8
very tiny generate_dataset (#11013)
one minute to gen on my mac
2025-06-27 17:10:45 -04:00
George Hotz
be53ef4f0a
rename DEFINE_ACC -> DEFINE_REG (#11006)
* rename DEFINE_ACC -> DEFINE_REG

* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00