mirrors/tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Author	SHA1	Message	Date
George Hotz	7ae02dea19	Merge branch 'master' into moveleftright	2025-08-04 19:10:27 -07:00
George Hotz	7f6acfb0d5	give define global and friends a shape (#11502 ) * give define global and friends a shape * ignore negative size * ptx fix	2025-08-04 19:09:39 -07:00
George Hotz	0e91b6fd30	bugfixes	2025-08-04 18:46:32 -07:00
George Hotz	823dfbde70	move view to swizzle	2025-08-04 18:27:27 -07:00
chenyu	83385e7abc	update gradient src in ramp.py (#11499 ) that's simplified now	2025-08-04 18:58:03 -04:00
qazal	846a2826ab	viz: remove TracingKey.fmt (#11482 ) * viz: remove TracingKey.fmt * remove from test too	2025-08-05 00:00:03 +03:00
chenyu	01d44e8f16	tiny reduce_gradient cleanup [pr] (#11498 )	2025-08-04 16:12:53 -04:00
chenyu	8a11af01ed	remove broken paperswithcode links in doc (#11497 )	2025-08-04 13:12:33 -04:00
leopf	4f0ee4e982	BPE tokenizer (#11415 ) * BPE works * refactor tok * oops * basic tests * fix eval * smaller diff * fix error * proper vocab decoding * use regex for splitting * escape ucatrange * full compat --------- Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>	2025-08-04 09:52:38 -07:00
b1tg	06af9f9236	fix double exception + add name,loc in error msg (#11487 ) Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-04 13:41:23 +03:00
nimlgen	4877aa965a	ast seems to probe nv as well (#11494 )	2025-08-04 11:47:07 +03:00
chenyu	e0106b6b25	1/(x*c) -> (1/c)*(1/x) (#11491 ) example: 2*(2*a).reciprocal() -> a.reciprocal() # TODO: bounds for reciprocal # TODO: should z3 work?	2025-08-03 23:35:46 -04:00
qazal	5870352fe1	viz: factorize llvm-mca call (#11490 )	2025-08-04 00:31:23 +03:00
chenyu	dbc7807c61	enable WEBGPU tests with buffer limit (#11489 ) TestSample still fails?	2025-08-03 13:02:44 -07:00
nimlgen	8f374ee1f7	nv: print devfmr in gsp logs (#11484 )	2025-08-03 15:12:53 +03:00
chenyu	823f1a01db	move cast around expand backward to tensor.py (#11483 )	2025-08-02 23:03:54 -04:00
chenyu	0ce0f51010	generic double cast folding (#11481 ) b.cast(a).cast(b) -> b if a preserves all values in b	2025-08-02 19:26:37 -04:00
qazal	72e0d1d0dc	viz: profile the compiler in TINY device (#11457 ) * viz: profile the compiler in TINY device * leanup	2025-08-03 02:03:20 +03:00
chenyu	66be747908	few more dtype cast convinience methods (#11480 )	2025-08-02 15:47:09 -04:00
chenyu	e22e5da9a5	move some test_dtype tests to unit (#11479 )	2025-08-02 15:25:00 -04:00
nimlgen	da0b955be4	hcq: cpu can be graphed (#11474 ) * hcq: cpu can be graphed * ops * new jit decisions * fix test * fix remote * cleaner * fix	2025-08-02 21:01:19 +03:00
chenyu	f7965f85aa	Revert "feat: faster index building (#11462 )" (#11478 ) This reverts commit `3a4deb08d2`.	2025-08-02 12:50:48 -04:00
kevvz	ef7e01cadf	Fix SVD shape bug + Fix batched SVD bug (#11477 ) * failing test case * fix * better test * space	2025-08-02 09:47:41 -07:00
b1tg	6ecaf8e7b2	refactor: use less index and simplify reduce axes check [pr] (#11476 ) * use output_shape/full_shape * simple final_reduces check --------- Co-authored-by: b1tg <b1tg@users.noreply.github.com>	2025-08-02 09:44:51 -07:00
wozeparrot	3a4deb08d2	feat: faster index building (#11462 ) * feat: faster index building * feat: correct training samples	2025-08-02 11:50:18 -04:00
nimlgen	8cc2d64edb	amd: reuse create_queues for usb iface (#11473 )	2025-08-02 14:40:46 +03:00
chenyu	9e8e6b45ab	grad acc train llama (#11467 ) * grad acc train llama * log step time	2025-08-01 15:54:50 -04:00
chenyu	7ad7329257	data parallel train llama (#11466 )	2025-08-01 12:13:51 -04:00
nimlgen	9f2182f92f	cpu: start threading (#11324 ) * cpu: threading * syncs * llvm * fix * opt * fx * fix * missed sync * one line less * cleaner * fix	2025-08-01 15:35:07 +03:00
qazal	c7ae1bd474	viz: more consistent border styling (#11464 )	2025-08-01 09:31:06 +03:00
George Hotz	8ff03806e8	add llama layers (#11460 ) * add llama layers * add contig bw for speed	2025-07-31 16:28:04 -07:00
qazal	719827b95d	viz: add flops / mem bw to device programs (#11459 ) * viz: add flops / mem bw to device programs * better spacing style	2025-08-01 02:12:30 +03:00
chenyu	3f742a5a7c	comma space lab models benchmark (#11461 )	2025-07-31 19:06:18 -04:00
George Hotz	474ee9daa5	hotfix: add contiguous_backward to llama	2025-07-31 15:07:12 -07:00
qazal	fa66d9772d	viz: show const node when it's root (#11456 )	2025-08-01 01:01:58 +03:00
qazal	056dabda5a	viz: refactor to color scheme (#11455 )	2025-08-01 00:17:50 +03:00
nimlgen	e5b6149dfb	more typing in drivers (#11454 ) * more typing in drivers * rm	2025-07-31 23:26:33 +03:00
qazal	bad3cf5731	viz: add LLVM machine code analysis (#11421 ) * start * works everywhere * add viz api * utilization table * reg pressure ui * use llvm-mca * llvm-mca ui * work * cleanup * cycle through, defaults are enough * x86 pending * x86 nops * get mcpu/mtriple from autogen * cleanup server diff * move parser to python * normalize to pct of max * segments legend * imports * also monospace * max comes from the total per instruction * base on the value	2025-08-01 01:59:26 +08:00
chenyu	e847677e8a	use AxisType in search instead of colors (#11452 )	2025-07-31 13:07:33 -04:00
nimlgen	75c2c42def	suppress exceptions only during finalization (#11451 ) * suppress exceptions only during finalization * fix * fix typing * fix more warns * fix * better? * Revert "better?" This reverts commit `a068aa5793`. * mm? * no as e	2025-07-31 13:57:12 +03:00
wozeparrot	24dd0d52ed	feat: test remove to cpu (#11444 )	2025-07-30 20:18:56 -07:00
kevvz	c3cfcb50cb	Add linalg_det and test for torch backend (#11405 ) * add linalg_det and test * space --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-30 22:04:44 -04:00
Eitan Turok	cba3655de5	Add Test for Setitem (#10559 ) * init * update * better * failing test * works * Delete test file * clean * lint * simplify variable name * rm contigious, rm int dtype, and add assertEqual --------- Co-authored-by: chenyu <chenyu@fastmail.com>	2025-07-30 22:03:41 -04:00
wozeparrot	6252f7770e	feat: fake data (#11447 )	2025-07-30 17:18:20 -07:00
chenyu	e300451f3a	update llama3 (#11446 ) `LR=1e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 FUSE_ARANGE=1 JITBEAM=2 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=512 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py` trained to 7	2025-07-30 19:34:21 -04:00
wozeparrot	5fb975351a	feat: flag for training on val (#11441 )	2025-07-30 14:29:45 -07:00
chenyu	4ca430e5bf	fix search dedup (#11439 ) it should check against pre real_axis axis in actions, not real_axis.	2025-07-30 17:24:16 -04:00
wozeparrot	d3da20eca6	feat: bump mlperf workflow timeout to 6 hours (#11440 )	2025-07-30 14:12:12 -07:00
wozeparrot	825b6a2505	feat: llama3 dataloader (#11340 )	2025-07-30 13:27:55 -07:00
qazal	af357b5dc8	disable TRACK_MATCH_STATS in BEAM workers [pr] (#11437 )	2025-07-30 23:22:08 +03:00

9,679 commits