Commit graph

1,181 commits

Author SHA1 Message Date
chenyu
3d68feb67d
minor onnx Gather cleanup (#11375)
removed a type ignore and one error code skip
2025-07-25 21:08:08 -04:00
George Hotz
490a93902c
define reg doesn't have init anymore (#11365)
* define reg doesn't have init anymore

* remove that

* no special logic for dr

* fix amd uop matmul
2025-07-24 19:15:49 -07:00
George Hotz
0602b22086
kernel spec (#11359)
* kernel spec

* ops.VIEW

* work
2025-07-24 12:45:38 -07:00
George Hotz
b0dc97d1f7
write out kernel 3 in uops (#11352)
* write out kernel 3 in uops

* matmul is correct

* gemm passes spec

* bugfix to match speed

* cleanups
2025-07-23 17:32:38 -07:00
chenyu
86e7504111
mypy check extra/onnx.py (#11348)
instead of running test with 3.10, add onnx to mypy which would have caught StrEnum regression. Several type annotation failed mypy now that does not affect running the code and were skipped for now
2025-07-23 12:42:59 -04:00
chenyu
960da9319d
Remove StrEnum in onnx for python 3.10 (#11345)
some training tests failed looks like parsing error?
2025-07-23 11:52:25 -04:00
George Hotz
108aac8af4
use AddrSpace instead of local (#11314)
* use AddrSpace instead of local

* addrspace in test
2025-07-21 14:00:06 -07:00
geohotstan
445ff8de56
ONNX onnx_parser and buffer_parse clean up (#11000)
* start

* remove onnx.load from compile4 and move np to dropout

* clean up and enable test

* clean up

* move WebGPU ONNX test into MacOS (WebGPU)

* leave test in ONNX (CPU)

* fix raw_data init None, and simplify onnx_runner test a little?

* THESE TESTS ARE SO UGLY UGHH

* need to really think about how to structure the test

* wow LLMs are quite something

* not always on disk now

* also add external data loading test

* cleaner tests

* minimize diff and add const folding tests

* add external data loading too

* whoops add webgpu back.. but why was it not needed in the first place?

* better comment

* move webgpu test to macos(webgpu)?

* llm english so much better than me wow

* trigger CI to check flakiness

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-21 15:10:25 -04:00
George Hotz
842184a1ab
rename kernelize to schedule, try 2 (#11305)
2025-07-21 11:18:36 -07:00
वेदांत
e368628736
Add amin support to Tensor operations in Torch backend (#11290)
* intiger div mod fix

* Revert "intiger div mod fix"

This reverts commit d5d2f201bf.

* feat arg_min support

* tets update

* test fix
2025-07-21 09:14:08 -04:00
nimlgen
cc3c1e4c14
hcq: move cpu to hcq (#11262)
* hcq: move cpu to hcq

* import time

* upd

* fix

* windows support

* hm

* cleaner

* fix timer

* fix timing

* std is ns

* skip profiler

* mypy

* cleaner

* cleanups

* after merge

* default is back
2025-07-21 15:10:38 +03:00
nimlgen
2f72be5055
nv_smi: init basic insmod/rmmod/reset cmds (#11282)
2025-07-19 15:43:03 +03:00
qazal
577e581943
fix typo in sqtt/readme (#11281)
2025-07-19 15:10:24 +03:00
geohotstan
536b254df4
Bump onnx to 1.18.0 (#11266)
* bump

* thou hast implement functions

* hacked in domain support

* some clean ups

* hack quantize_onnx_test too

* add helper lol, why onnx tests why

* better dispatcher, but need tests and better naming

* flaky ci

* change some names

* small clean ups

* make it easier to clean up tests once ORT supports 1.18.0

* nits

* fix bug of Softmax_1 being registered in onnx_ops

* need a default value

* resolve_const is better name

* fix OnnxRunner.to

* use proper domain names
2025-07-17 15:35:41 -04:00
chenyu
c8e5c4d7c3
insert_before -> insert_at [pr] (#11257)
more precise
2025-07-15 17:44:34 -04:00
chenyu
a0438012af
remove Kernel.get_program [pr] (#11203)
2025-07-12 20:50:29 -04:00
George Hotz
d67c8e7b42
local metal on metal in uop syntax (#11185)
* local metal on metal in uop syntax

* TODO: just put the axis_info in the kernelinfo

* local

* amd_matmul works @ 28 TFLOPS

* clean up matmul

* kernel8 works

* remove that

* locals

* axistype innovation

* work

* cleanup

* kernel3 regs

* cleanup kernel3

* work

* why is it broken

* no beam

* reenable

* permutes
2025-07-12 16:31:19 -07:00
geohotstan
5ce278b245
OnnxRunner file as input (#10789)
* file path as input and have parse be in OnnxRunner.__init__

* modelproto_to_onnxrunner -> modelproto_to_runner

* whoops, fix import

* oh flakiness again, is it because it's getting gc-ed?

* small changes

* CI flaky so just move compile4 fix in

* copy typing of onnx_load

* actually can just import onnx_load instead of onnx.load

* fix external_benchmark_openpilot

* fix onnx_runner test to use onnx_helper

* rerun CI

* try run_modelproto

* spam CI a few times

* revert run_modelproto since that's flaky also

* no external onnx_load usage except onnx.py

* cursor tab complete is evil. Snuck a darn sorted in. But does order change result? Why?

* model_benchmark 193s -> 80s, add OnnxRunner.to()...

* minimize diff and clean up

* device can be None, weird but eh

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-07-12 14:27:46 -04:00
chenyu
6283d50224
DEPRECATED_linearize -> to_program [pr] (#11198)
2025-07-12 13:46:20 -04:00
George Hotz
2893feb9f6
cleanups for kernel.py (#11143)
* cleanups for kernel.py

* fixups
2025-07-08 18:10:25 -07:00
chenyu
7ce9e45474
mypy onnx_parser (#11141)
2025-07-08 19:50:28 -04:00
chenyu
ffcc557986
lint onnx and onnx_parser (#11134)
2025-07-08 15:28:35 -04:00
qazal
3dfc0ff887
move cpu_profile and shared ProfileEvents from device.py to helpers [pr] (#11126)
* move cpu_profile and shared ProfileEvents to helpers [pr]

* TestProfiler.test_cpu_profile

* update test_viz.py

* TestProfiler.test_profile_multiops ordering, it's different streams now
2025-07-08 12:14:03 +03:00
nimlgen
71377cd233
nv: parse falcon app descs (#11118)
2025-07-07 18:14:14 +03:00
kevvz
b7af9cf849
clean svd tests, set full_matrices false in torch backend (#11113)
* clean tests, set full_matrices false

* add more shape asserts
2025-07-06 13:55:49 -04:00
chenyu
ba88ec3ad0
pipe linalg svd to torch (#11109)
and found a bug in svd
2025-07-06 08:37:25 -04:00
nimlgen
4dccb2ea49
am_smi: increase kill retries (#11099)
2025-07-05 16:23:50 +03:00
0xSG
17119b0f23
hip_ioctl: platform.machine added (#11084)
2025-07-04 17:20:24 +03:00
nimlgen
2d138c6cf1
am: factor out init_sw (#11070)
2025-07-03 11:01:17 +03:00
chenyu
425d5f55c4
generate kernel dataset and upload artifact (#11063)
2025-07-02 17:21:25 -04:00
chenyu
4626e9c172
is_numpy_ndarray helper [pr] (#11050)
2025-07-02 09:12:53 -04:00
chenyu
126fcf4129
clean up AMD_LLVM in tests (#11021)
2025-06-28 22:45:47 -04:00
chenyu
a6485d00c8
very tiny generate_dataset (#11013)
one minute to gen on my mac
2025-06-27 17:10:45 -04:00
George Hotz
be53ef4f0a
rename DEFINE_ACC -> DEFINE_REG (#11006)
* rename DEFINE_ACC -> DEFINE_REG

* add CMPEQ to groupops
2025-06-27 11:09:25 -07:00
George Hotz
b4eb876d5a
kernel.py no longer permutes reduce axis [pr] (#10968)
* kernel.py no longer permutes reduce axis [pr]

* delete tests that handcode uops

* regen of sops is broken...

* put import back

* just remove that

* disable those tests
2025-06-26 17:44:58 -07:00
George Hotz
856759c79c
add halide example (#10980)
* add halide example

* upd halide gemm

* partial works

* touchups
2025-06-26 16:14:57 -07:00
qazal
1127302c46
move perfetto to extra (#10994)
* move perfetto to extra

* update TestViz and fix tests

* remove perfetto.html from viz directory

* work

* mypy
2025-06-27 01:53:54 +03:00
qazal
712980e167
fix extract_dataset + add tests to CI (#10995)
* fix extract_dataset + tests

* add CI

* sops.gz itself is same as master

* yml + gzip -c + ge

* don't commit that

* bump limit to 1000

* axis=7

* test_tiny
2025-06-27 01:51:36 +03:00
geohotstan
50936b4a18
ONNX real float16 (#10694)
* squash commits

* temp fix for const tensor

* actually realizing float16 can only happen in raw_data

* .float -> cast(float) to rerun CI

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-06-26 14:05:12 -04:00
chenyu
49bba2f0a0
improve test_nll_loss (#10986)
build target and weight tensors outside so it tests backward too.
2025-06-26 02:46:55 -04:00
nimlgen
1c45b9f7fb
start nvpci (#10521)
* start nvpci

* talk to fsp

* boot args

* riscv core bootted

* q

* agen

* got gsp init msg

* some fixes

* set registry, stuck aft lockdown(

* start ga/ad port

* gsp init on ada

* more classes allocated

* more

* mm

* fixes and progress

* no huge pages for now

* mm seems workin, but switch to 512mb page for simplicity

* working state

* not cleaned

* claned

* nvd=1

* start gr ctx

* compute

* clean 1

* cleanup 2

* cleanup 3

* cleaner 4

* cleaner 6

* add iface to nv

* save before reboot

* merged into NV

* moveout mm

* post merge

* cleaner 7

* merge and rebase

* pciiface abstraction + reset

* download fw from web

* print logs

* minor changes + p2p

* cleaner 8

* cleaner 9

* cleaner 10

* delete

* delete this as well

* linter 1

* oops

* priv_client -> priv_root

* fix mypy

* mypy?

* mypy?

* small changes

* shorter

* ops

* remove this

* do not allocate paddr for reserve

* nodiff

* unified script

* ops

* dif ver

* add lock

* setup
2025-06-25 00:37:34 +03:00
chenyu
ffb032e31d
test_diagonal touchup (#10962)
2025-06-24 15:51:19 -04:00
Utkarsh Gill
7f9958b632
Fix torch.linalg.diagonal crash due to invalid shrink in to_movement_ops (#10945)
* fix as_strided shrink bug breaking torch.linalg.diagonal on tinygrad backend

* cleanup

* generic fix

* tests

* cmp with diagonal too

* oops

* move tests

* fix test

* remove unnecessary import

* fix assert

* compare against numpy

---------

Co-authored-by: Utkarsh Gill <engelbart@Utkarshs-MacBook-Pro.local>
2025-06-24 15:36:06 -04:00
chenyu
18e264a449
Tensor.logsigmoid (#10955)
2025-06-24 11:16:14 -04:00
George Hotz
e15754db28
remove (some) kernelize from llama and test schedule speed (#10939)
* remove kernelize from llama

* 405B

* space
2025-06-23 15:07:31 -07:00
alpharush
22f9696522
Fix/hcqfuzz harnesss bug (#10923)
* update command so extra module is found

* fix empty range in randrange errors

* lint
2025-06-23 11:22:30 +03:00
geohotstan
4ab7d792cc
ONNX improve dtype fallback (#10800)
* fix

* add early verbose demo test

* is this how to write tests :s

* is definition drift even a thing? gemini says it is

* clean up

* better

* even better

* try add to CI

* doesn't work quite yet

* much more work to be done

* whoops

* partition the test heh

* skipif

* some nits for better names

* add webgpu test for onnxrunner

* fix reference links

* flush for now
2025-06-21 19:29:45 -04:00
George Hotz
92678e59ee
move kernel to opt (#10899)
2025-06-20 15:22:28 -07:00
chenyu
3f29c7edda
minor onnx dropout cleanup (#10891)
we should consider removing numpy random and test it similar to test_randomness, unless how seed works is part of spec?
2025-06-20 10:18:34 -04:00
qazal
000eb30f04
viz: remove prev profiler file (#10888)
The new profiler is integrated in the main VIZ tab.

Will also delete perfetto.html after matching [final features](https://github.com/tinygrad/tinygrad/pull/10763#issuecomment-2980543715) soon.
2025-06-19 23:05:46 +03:00