Commit graph

1,221 commits

Author SHA1 Message Date
chenyu
af0392efea
only set DiskDevice.size if it opens successfully (#13962) 2026-01-01 19:33:26 -05:00
chenyu
e036d6df89
properly fix DiskDevice reuse (#13961) 2026-01-01 18:08:23 -05:00
chenyu
cb7c76a3bd
update test_fuzz_failure to not contruct full UOp (#13960) 2026-01-01 15:09:58 -05:00
chenyu
8e416df438
simpler InvalidType [pr] (#13957)
simpler singleton pattern
2026-01-01 13:55:51 -05:00
chenyu
4d5c4d256d
update tqdm for edge case (#13956)
1.00kit/s and not 1000it/s for value 999.5
2026-01-01 11:37:26 -05:00
chenyu
b91b46091c
delete test_tensor_uop (#13951)
old test for shape tracker. also update tests that refer shapetracker

names
2026-01-01 09:25:05 -05:00
chenyu
17ef4af72c
new ceildiv that fixed symbolic conv (#13944)
* new ceildiv that fixed symbolic conv

* smaller test case
2026-01-01 09:02:41 -05:00
haofei
526fd4ec71
Fix SVD rank‑1 Jacobi rotation when tau == 0 (#13945) 2026-01-01 00:30:18 -05:00
haofei
20777f30b9
Fix QR/SVD NaNs on zero/orthogonal inputs (#13943) 2025-12-31 23:40:09 -05:00
chenyu
52acadc160
consolidate IGNORE_OOB=0 tests (#13937)
add a new unit test file and add more cases
2025-12-31 15:24:20 -05:00
chenyu
0a98fd38b3
fix tests that failed locally on mac (#13872)
keccak output was silently broken without contiguous
2025-12-29 11:23:38 -05:00
Clément Verrier
0e409ff5ce
fix indentation in UOp pretty_print for repeated references (#13857)
* fix correct indentation in UOp pretty_print for repeated references

When a UOp was referenced multiple times, the walrus operator notation
(e.g., x0:=) was correctly used for the first occurrence, but subsequent
references had misaligned indentation due to an extra space character.

Fix indentation misalignment in pretty_print() when UOps are referenced
multiple times.

* add simple unit tests for UOp repr

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2025-12-29 10:46:16 -05:00
anu
9b4de8abc7
fix beam in python 3.14+ (#13836)
* fix beam search on python 3.14

* add PickleableCount class to helpers

* change name, add test, add step

* tidy count init
2025-12-27 16:24:22 -05:00
chenyu
54af29dbdb
trange can just be a function (#13827) 2025-12-24 23:57:10 -05:00
George Hotz
43c6e973d8
add optional compiler in Renderer (#13817)
* add optional compiler in Renderer [pr]

* fix

* late init

* remove precompiled

* cleanup
2025-12-23 17:58:46 -05:00
George Hotz
6439a515be
test fixups / speedups / var_vals refactor (#13812)
* no PYTHONPATH + llm server port 0

* llm tok speedup

* refactor var_vals
2025-12-23 12:05:59 -05:00
George Hotz
8dcba2e2cc
no full_rewrite [pr] (#13809)
* no full_rewrite [pr]

* fix

* fix docs
2025-12-22 23:20:01 -05:00
George Hotz
df0f9d6860
add olmoe support to llm (#13792)
* add olmoe support to llm

* cleanups

* simpler

* clean

* fix mypy

* lil

* remove dumb assert
2025-12-22 10:41:35 -04:00
chenyu
5cb827f7bf
clean up can_lossless_cast and add missing pairs [p] (#13793) 2025-12-21 12:18:33 -05:00
George Hotz
75a6a03664
add qwen3 moe support to tinygrad.apps.llm (#13775)
* qwen moe works

* simple moe

* one test

* integration
2025-12-21 12:36:02 -04:00
chenyu
733ef0452c
update test_uop_resolve (#13777)
plain @unittest.expectedFailure is too broad
2025-12-20 12:40:59 -05:00
chenyu
185a000882
gradient of COPY (#13760) 2025-12-19 13:33:59 -05:00
George Hotz
aeb7516c8a
tests passing on tinybox h3 (#13742) 2025-12-17 19:04:34 -04:00
George Hotz
b013244c38
fix local tests for AMD_LLVM (#13738)
* fix local tests for AMD_LLVM

* fix linters

* skip that for now

* fix segfault
2025-12-17 12:23:46 -04:00
George Hotz
3dbde178c1
mark slow tests as slow instead of as CI (#13736)
* mark slow tests as slow instead of as CI

* CI shouldn't have different behavior

* more skips / CI

* slow
2025-12-17 10:29:57 -04:00
George Hotz
9015a22523
make tests faster (#13734) 2025-12-17 09:39:44 -04:00
George Hotz
cf0c28d5ae
all tests pass on strix halo (#13728) 2025-12-16 19:35:50 -04:00
George Hotz
321ab943b2
qwen model is working (#13690)
* qwen model is mostly working

* add Q4_K quantization support to GGUF parser, add qwen3:1.7b model

- Add Q4_K (type 12) dequantization in nn/state.py
- Add qwen3:1.7b model using Q4_K_M quantization (smaller than Q8_0)
- Make bos_token_id optional for models like Qwen3 that don't have it
- Fix line length issues and add preset parameter to SimpleTokenizer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* smaller diff

* test dequant

* half split

* better

* simple tok

* mock token

* polish

* better

* fix

* replace

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 18:00:34 -04:00
George Hotz
a657a4e0f4
add Q4_K GGUF quantization support (#13700)
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-15 10:17:56 -05:00
George Hotz
572ca80046
fast tinygrad.apps.llm (#13685)
* llm: add --benchmark support

* fix speed

* debug logging

* fix test attention
2025-12-14 21:05:21 -05:00
chenyu
ed962786d6
use assign in Tensor.backward (#13674)
preserve the grad object so that jit works
2025-12-13 22:43:06 -05:00
George Hotz
55845f7de7
schedule: cache unbinds for consistent cache keys (#13664)
* schedule: cache unbinds for consistent cache keys

strip BIND values before computing cache key so different bound values
(e.g. KV cache positions) hit the same schedule cache entry.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* spec: allow single-src BIND for schedule cache key normalization

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: add lessons learned to CLAUDE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* more claude.md

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 17:27:42 -05:00
George Hotz
8c87a0bf8d Revert "schedule: cache unbinds for consistent cache keys (#13662)"
This reverts commit af86cae10c.
2025-12-12 16:49:50 -05:00
George Hotz
af86cae10c
schedule: cache unbinds for consistent cache keys (#13662)
* schedule: cache unbinds for consistent cache keys

different bound variable values (e.g. kv cache positions) now produce
the same schedule cache key by unbinding BIND(DEFINE_VAR, CONST) before
computing the cache key and rebinding after lookup.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* schedule: cache unbinds for consistent cache keys

When scheduling, BIND(DEFINE_VAR, CONST) nodes are now unbound to
tagged DEFINE_VARs before computing the cache key. This ensures that
the same computation with different bound values (e.g., different
KV cache positions in LLM) gets the same cache key and reuses the
cached schedule.

The fix:
- pm_pre_sched_cache: replaces BIND with tagged DEFINE_VAR
- pm_post_sched_cache: restores tagged DEFINE_VAR back to original BIND
- pm_remove_rangeify_tags: excludes DEFINE_VAR to preserve tags through rangeify
- var_vals extracted from BINDs before cache key computation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* schedule: fix BIND handling and add CLAUDE.md

- Handle BIND to RANGE in create_schedule (not matched by CONST pattern)
- Assert all BINDs on same variable have same value
- Add CLAUDE.md codebase guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 16:40:10 -05:00
George Hotz
316da9f7ff
llm: add created/model fields, non-streaming support, and tests (#13660)
* llm: add created/model fields, non-streaming support, and tests

- Add `created` timestamp and `model` fields to response (required by OpenAI spec)
- Add non-streaming mode support for /v1/chat/completions
- Add `send_data` helper to HTTPRequestHandler for responses with Content-Length
- Refactor viz/serve.py to use send_data
- Add integration tests using real OpenAI client

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* add openai to testing

* toml

* Remove 'openai' from dependencies

Removed 'openai' from the dependencies list.

* bump cache

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-12 14:50:36 -05:00
Christopher Milan
94d7646bdc
fix anonymous struct fields (#13610) 2025-12-07 12:56:38 -05:00
nimlgen
ac5f1e115d
autogen: repro for the bug (#13607)
* autogen: repro for the test

* mute
2025-12-07 15:51:03 +03:00
George Hotz
c5bd28e21d
start work on schedule cache (#13529)
* start work on schedule cache

* local unique

* schedule cache works

* schedule cache cleanup

* fix tests

* preserve metadata

* oops, fix cache

* put that there

* fix spec

* always miss

* why is that broken?

* src[0].op

* fix process replay

* delete abstractions2

* reenable the actual schedule cache

* metadata is best effort

* fix JIT in examples/gradaccum_mnist.py

* full jit

* fixed and test is real
2025-12-04 17:24:49 -08:00
ayanhan
edf929ec9d
fix: add __delitem__ to Tensor with proper TypeError (#13561) 2025-12-04 00:53:08 -08:00
Christopher Milan
0a54434b15
mitigate ctypes c_bool bitfield bug (#13558)
* mitigate ctypes c_bool bitfield bug

* don't delete old test
2025-12-03 20:46:04 -05:00
chenyu
22777a89ea
minor test_uop_symbolic updates (#13551) 2025-12-03 13:17:44 -05:00
chenyu
a205f98ef4
tighter bound for MOD (#13550) 2025-12-03 11:24:29 -05:00
nimlgen
549f3287a8
fix caching for fetch (#13544) 2025-12-03 14:34:14 +03:00
George Hotz
6bd355fa26
add needs_second_gpu decorator (#13543)
* add needs_second_gpu decorator

* more skips

* two more fixes
2025-12-02 19:08:23 -08:00
Roelof van Dijk
c158e3c988
add cifar gated uop_given_valid regression test (#13536) 2025-12-02 16:02:47 -05:00
nimlgen
77a76d1b13
device: respect compiler ContextVars (#13523)
* device: envvars for cc

* fix

* fix

* x

* um

* fix

* remote

* em

* cleanup

* typing

* fix

* debug

* lvp?

* ugh

* singl

* rm

* lol

* fix

* ?

* this?

* why?

* rev

* mod test

* l
2025-12-02 14:42:04 +03:00
George Hotz
c38b7684dc
improve microbenchmarks (#13492)
* improve microbenchmarks

* bugfix + ubench

* lil

* no src in const method
2025-11-29 10:15:22 -08:00
qazal
72ef533d9c
tracing: use u32 for buffer args encoding (#13472) 2025-11-28 00:19:51 +08:00
George Hotz
e4cd649ff0
remove kernelize to prepare for refactors (#13463)
* remove kernelize to prepare for refactors

* less kernelize

* last test
2025-11-26 14:18:50 -08:00
qazal
7238df7a94
viz: cleanup sort_fn (#13454) 2025-11-26 04:10:10 +08:00