Commit graph

1,363 commits

Author SHA1 Message Date
chenyu
687ade119e
IMAGE hand_coded_optimizations update (#16720) 2026-06-23 21:55:28 -04:00
chenyu
ce87d80911
better _drop_valid_stmts [pr] (#16719)
also dropped the unused is_increasing
2026-06-23 19:35:01 -04:00
Christopher Milan
116045cc8e
ci: remove tensorflow from testoptim (#16717) 2026-06-23 18:11:48 -04:00
Christopher Milan
da98fae9e1
ci: try parallelizing tc tests (#16710) 2026-06-22 20:43:32 -04:00
Christopher Milan
cbfcf36e44
ci: remove generate_dataset and CL misc (#16709) 2026-06-22 18:01:07 -04:00
Christopher Milan
b2199c54a3
ci: update actions/cache/restore to suppress warnings (#16680) 2026-06-19 18:27:52 -04:00
Christopher Milan
1822eed8d3
ci: only test models on cpu (#16678) 2026-06-19 18:16:59 -04:00
Christopher Milan
091ec8d10d
use tinygrad.llm in benchmarks (#16670) 2026-06-19 00:03:57 -04:00
qazal
b753fb5e4c
viz: view source working even if compile failed (#16657)
* failing test

* hard

* ret_dict

* switch to _data for tests too

* update sqtt

* start work

* Ops.LINEAR looks good

* baseline with depth works

* support depth

* types

* @needs_tracked_pm

* update, marg can error too

* unwrap_or goes to many more places

* move things to soft_err

* soft_err everywhere needed

* diff cleanup

* use list

* rewrite it

* change

* update depth number

* small comment change
2026-06-18 17:34:53 +09:00
Christopher Milan
e0fe6e542e
ci: fewer pydeps (#16654) 2026-06-17 22:52:14 -04:00
George Hotz
d631716858
remove const without STACK (#16639)
* remove const without STACK

* fix GEP rewrite

* fix null tests

* fix openpilot regression

* it's 10 in CI
2026-06-16 21:25:42 -07:00
Christopher Milan
bcdb988df0
split comma benchmark, dsp on c4 [PR] (#16598) 2026-06-12 23:26:05 -04:00
qazal
7d4a77dce4
relax comma benchmark timeout (#16568) 2026-06-11 18:03:37 +09:00
Christopher Milan
83971860d8
ci: simplify webgpu install (#16557) 2026-06-10 22:57:19 -04:00
Christopher Milan
6e1b61f16f
cleanup some amd deps (#16563)
don't load hsa runtime, remove ib autogen
2026-06-10 19:01:56 -04:00
George Hotz
fd76ac992e
cstyle renderer is new style [pr] (#16484)
* cstyle new style

* switch cstyle renderer to new style

* fix hip

* fixes

* fix webgpu

* correct webgpu is_packed

* fix dsp

* fixes

* fix Ops.RANGE must be CONST

* old style render access

* this is correct

* fix cstyle to good

* dl/dr

* as array

* fix spec

* remove define_local/define_reg

* buffer in shrink

* fix test_tiny

* all tests fix

* param args aren't realized

* wgsl fix

* work

* new gate

* fix opencl qcom

* process replay

* sort order

* fix render index
2026-06-09 18:36:01 -07:00
Christopher Milan
97d483350c
ci: download prebuilt ocelot (#16554) 2026-06-09 19:51:33 -04:00
Christopher Milan
857b1f5399
ci: more parallelism, less duplication (#16509) 2026-06-05 21:26:19 -04:00
Christopher Milan
9dac781e45
ci: use uv (#16492) 2026-06-03 21:38:50 -04:00
Christopher Milan
f43cba5765
ci: native python where possible (#16473)
linters stays at 3.11
2026-06-02 22:40:12 -04:00
George Hotz
ffadd7a315
remove intel and amx support (#16482) 2026-06-02 18:53:05 -07:00
Christopher Milan
9897658895
ci: fix ocelot compilation on macos (#16471) 2026-06-02 12:43:31 -04:00
Christopher Milan
c6cad1ad67
ci: standardize runs-on (#16466)
* ci: use macos 26

* ugh github

* stick with github for arm
2026-06-01 21:39:58 -04:00
Christopher Milan
b0ecbb34d9
ci: cleanup python backend tests (#16465) 2026-06-01 20:08:05 -04:00
Christopher Milan
2d0f132a3b
ci: cleanup more duplicate tests (#16462) 2026-06-01 18:56:29 -04:00
Christopher Milan
c377d01491
ci: run dsp on tinygrad[testing] (#16442) 2026-05-29 21:16:56 -04:00
Christopher Milan
d943493b79
ci: remove duplicate op compile test (#16441) 2026-05-29 19:20:31 -04:00
Christopher Milan
ef50a49693
ci: macos dev matrix (#16436) 2026-05-29 17:40:32 -04:00
Christopher Milan
6e0d5262dc
ci: autocancel outdated pr jobs (#16424) 2026-05-28 23:14:35 -04:00
Christopher Milan
69aa2054f6
rename clangjit to clang (#16423) 2026-05-28 22:41:58 -04:00
Christopher Milan
a909acb882
move llvmspeed to benchmarks (#16422) 2026-05-28 22:26:22 -04:00
Christopher Milan
7d38edffdb
ci: dev matrix (#16420)
windows just runs test_tiny
2026-05-28 22:04:04 -04:00
George Hotz
c87f3433d1
use namespace runners (#16387)
Co-authored-by: Christopher Milan <chrismilan@ucla.edu>
2026-05-28 18:05:46 -04:00
Christopher Milan
c8af163d2b
disable process replay by default (#16419)
enable process replay with [pr] and assert with [PR]
process replay no longer captures on master
2026-05-28 17:36:28 -04:00
Christopher Milan
aacc8addf4
ci: use ubuntu 24.04 (#16393) 2026-05-26 23:22:01 -04:00
George Hotz
322693dcd3
hotfix: bump Mac pytest timeout to 4 minutes (try 2) 2026-05-25 18:23:21 -07:00
George Hotz
942cb42b97
Revert "hotfix: bump Mac pytest timeout to 4 minutes"
This reverts commit 695a0069ed.
2026-05-25 17:25:11 -07:00
George Hotz
695a0069ed
hotfix: bump Mac pytest timeout to 4 minutes 2026-05-25 17:20:19 -07:00
Christopher Milan
35461d4d8f
ci: cleanup some deps [pr] (#16340) 2026-05-22 19:16:08 -04:00
Christopher Milan
518e60534e
only load tinymesa_cpu when LVP is explicitly requested (#16320) 2026-05-21 19:03:13 -04:00
George Hotz
58d58c1659
remove DEVECTORIZE (#16290)
* remove DEVECTORIZE

* fully remove DEVECTORIZE
2026-05-20 13:25:49 -07:00
chenyu
7af7b6703a
relax policy ASSERT_MIN_STEP_TIME to 3.2 (#16273) 2026-05-19 22:29:09 -04:00
George Hotz
da7414d6dc
fix RUN_PICKLE and test it (#16272)
* add test for openpilot RUN_PICKLE

* fix RUN_PICKLE and test it
2026-05-19 17:00:25 -07:00
ttomsa
aa1e59ab97
X86 with Ops.INS (#14873)
* draft

* cleanup test_encodings

* cleanup test_isel

* model flag state and support rematerialization

* woops

* add vbroadcastss instruction

* don't fuse load if used multiple times in src

* add movabs instruction and fix idiv

* fixes

* add x86 backend to tests

* float16 fix

* rm TwoAddress2nd

* add BARRIER

* test windows ci

* yup isel fixes the mask stuff too and its beautiful

* add cmoves to the spec

* support storing imms

* no TUPLE_ORDER, breaks tests

* fix remaining seg faults

* add float max

* always fuse index

* minor

* fix DEFINE_VAR/SPECIAL and enable multithreading

* linter

* more linter

* more

* more

* more

* let's try this

* perhaps

* start new scheduler

* more scheduling info

* cleaner shuffle functions

* fixup isel tests

* skip bounds check when NOOPs exist

* skip inf rewrite tests

* fix const tag hack and add x86ops to _shape

* fix

* skip a few tests

* func arg order independent from op value

* x86 goes in own linearize

* switch to PARAM

* more

* add min x86op and neg in decomps

* do mulacc in isel

* use def_reg in test_encodings

* enable emulated int64 tests

* how much does this fix

* Ops becomes OpType

* fix

* rm noqa

* rm machine scheduler stuff

* and this

* allow for extending enums and move X86Ops out of uop

* fix imports

* rm X86GroupOp from ops.py

* spacing

* tell mypy to shut up

* more linter

* add x86op test

* allow set[X86Ops] in upat

* move NOOPs to pre_isel_matcher and rm NOOP from spec

* more asserts

* also this

* cleanup encode

* simplify live range

* fix idiv

* add Ops.INS to x86

* more changes

* more changes

* more changes

* fix

* fix

* fix

* fix

* print formatted assembly

* fix 8bit idiv?

* oops

* enable float16 and unaligned vector load/store

* actually no

* move x86 tests

* no more bool cast

* fix

* linter

* linter

* move X86Ops to x86.py

* fix vpbroadcast

* cleanups

* linter

* print correct reg names

* canonical max

* move max/min and add test

* support float16 vector load/store

* rm bad rewrite

* vpsrldq can't access memory

* regalloc takes renderer

* enable vector load/store on all dtypes

* more isel tests

* rm this for now

* a lot better

* fix

* fix

* fix

* deal with flags correctly

* fix

* enable gep noop rule

* fix

* fix

* fix

* add callee saved registers

* use Ops.CONST instead of X86Ops.IMM

* fix

* enable TUPLE_ORDER

* fix

* rm x86 code in linearizer

* fix

* fix

* fix

* move isa rewrites to codegen

* fix

* fix

* skip test_linearizer.py

* skip more tests

* fix

* fix for idiv/mod changes

* fix

* don't use fmadd if it duplicates fused op

* hacky

* fix

* cleanups

* cleanups

* fix

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
2026-05-19 12:42:54 -07:00
George Hotz
b2e8102209
25000 lines for x86 backend 2026-05-19 11:27:41 -07:00
Christopher Milan
7515824a6d
ci: actually use clang-20, enable bfloat16 (#16249) 2026-05-18 19:06:43 -04:00
Christopher Milan
891a1ae7c2
onnx: remove dtype_fallback (#15717) 2026-05-14 22:06:57 -04:00
Christopher Milan
9a365d9978
ci: fix null image tests (#16188) 2026-05-13 18:00:05 -04:00
Christopher Milan
f1fdd2ccec
ci: add IMAGE=1 compile-only tests (#16182)
* ci: add IMAGE=1 compile-only tests

* fix
2026-05-12 23:40:32 -04:00
Christopher Milan
7d0c5ab689
ci: ocelot needs nvcc on linux (#16178)
* ci: ocelot needs nvcc on linux

* cudart
2026-05-12 23:13:48 -04:00