Commit graph

6,637 commits

Author SHA1 Message Date
chenyu
18e159c9ac
comment about multi real and more tests [pr] (#7467) 2024-11-01 11:49:11 -04:00
chenyu
1f343aa40e
replace x.alu(BinaryOps.ADD, y) with add in multi [pr] (#7466) 2024-11-01 10:50:57 -04:00
geohotstan
6513690223
Add Tensor.hardsigmoid (#7433)
* move hardsigmoid to new branch

* add to test

* add NOTE to mention differing values for alpha and beta that match torch

* shift from relu6

* correct shift implementation

* or we just use relu? no more 666
2024-11-01 08:36:52 -04:00
George Hotz
fe78ed8cb7
improve match speed [pr] (#7465)
* improve match speed [pr]

* no sym in expand

* remove useless rule, sym back

* don't track that
2024-11-01 17:33:53 +08:00
George Hotz
a7ba3d2d91
move reduce to lowerer [pr] (#7462)
* move reduce to lowerer [pr]

* simpler
2024-11-01 16:39:20 +08:00
George Hotz
2cfca230b5
reduce collapse as a rule (#7464)
* reduce collapse as a rule

* better [pr]

* cleaner
2024-11-01 16:25:44 +08:00
George Hotz
4f6cf1f8cc
expand DEFINE_ACC [pr] (#7461) 2024-11-01 15:20:43 +08:00
qazal
d9f38f9518
group stores by UOp [pr] (#7460) 2024-11-01 15:09:16 +08:00
qazal
c1bd2d3f71
viz increment -1 kernel on enter [pr] (#7448)
* viz increment -1 kernel on enter [pr]

* two paths

* share
2024-11-01 14:14:54 +08:00
Tobias Fischer
1a9e145388
Tensor Clone Function (#7154)
* implemented clone function

* cleanup linting, single func

* added tests, cleaned up grad cloning

* fixed whitespace
2024-11-01 12:24:43 +08:00
chenyu
acd0fa1a7a
s/hasattr(self, '_buf')/self.is_allocated() [pr] (#7458)
use is_allocated helper in Buffer
2024-10-31 20:55:20 -04:00
chenyu
036409266d
clean up _prepare_jit_inputs [pr] (#7457)
removed an unnecessary cast and reordered a bit
2024-10-31 20:41:02 -04:00
chenyu
a21434504b
update payne_hanek_reduction [pr] (#7455) 2024-10-31 18:41:22 -04:00
chenyu
4f27862242
no more UPat._any [pr] (#7454) 2024-10-31 16:48:37 -04:00
chenyu
5777fca904
clean up cody_waite_reduction magic numbers (#7452) 2024-10-31 14:45:04 -04:00
chenyu
5648b9788e
more xlog2 cleanups (#7451)
following the notations in the paper closer
2024-10-31 13:52:31 -04:00
chenyu
4065c3dec8
remove special 0 case in frexp (#7450)
we can safely assume input is non-zero, also removed unneeded bitcast
2024-10-31 13:02:33 -04:00
chenyu
53db3478fe
cast to float32 for float16 xlog2 (#7447)
formula has 2X error with denormal floats
2024-10-31 10:36:29 -04:00
chenyu
5085b2fde7
cleanup xlog2 and remove unneeded functions (#7446)
denormal_map still looks wrong but a lot cleaner
2024-10-31 09:45:16 -04:00
chenyu
02636bc05e
simpler switch over in xsin (#7426) 2024-10-31 08:56:01 -04:00
qazal
c5a50465d1
big graph first [pr] (#7443)
* big graph first [pr]

* move things
2024-10-31 20:10:11 +08:00
qazal
38b1790575
move image dtype fixup [pr] (#7444)
* move image dtype fixup [pr]

* more work

* late dtype

* use base
2024-10-31 19:51:46 +08:00
George Hotz
f579693ec9
hotfix: casted nan/inf 2024-10-31 19:50:17 +08:00
George Hotz
a43b7a4b7c
less rewrite stages in matcher (#7445)
* less rewrite stages in matcher

* better name
2024-10-31 19:45:21 +08:00
George Hotz
5dd1ffd5d0
don't const rewrite in cstyle (#7442)
* don't const rewrite in cstyle

* Update cstyle.py

* simple_symbolic

* fix bfloat16 const on AMD
2024-10-31 19:16:49 +08:00
qazal
bdde795239
early filter sink buffers [pr] (#7440) 2024-10-31 18:50:36 +08:00
qazal
9905de3362
late append realizes [pr] (#7439)
* dont unbind in ops

* late append realizes [pr]

* Revert "dont unbind in ops"

This reverts commit e8d9da936d.

* delete ctx.realizes

* empty
2024-10-31 18:04:42 +08:00
George Hotz
50ddd11350
lil cleanup matchers [pr] (#7437)
* move delete_redundant_gates [pr]

* simpler uops test

* addr in delete_redundant_gates

* lines

* correct early delete gates

* shorter find_gate
2024-10-31 17:52:22 +08:00
qazal
a0bd385448
late uop_bufs [pr] (#7438) 2024-10-31 17:30:32 +08:00
qazal
7916d1f6ab
shorter UOps.BUFFER init [pr] (#7436) 2024-10-31 17:14:19 +08:00
George Hotz
2e3048fc57
Revert "improve full_graph_rewrite matchers for speed (#7431)" (#7434)
This reverts commit 996152d2de.
2024-10-31 16:16:47 +08:00
George Hotz
996152d2de
improve full_graph_rewrite matchers for speed (#7431)
* remove finalize [pr]

* early transcendental

* fix tests

* load store indexing runs with devectorize

* move delete_redundant_gates

* ptx has to wait for the mask to move
2024-10-31 16:13:11 +08:00
qazal
5f49651360
verify assign pre astfying [pr] (#7417) 2024-10-31 16:02:07 +08:00
George Hotz
17c9a9fde4
pm_render [pr] (#7430)
* pm_render [pr]

* test fixes

* use gep, not src

* ptx only symbolic, not sym

* move cast rules
2024-10-31 15:04:50 +08:00
George Hotz
8fff8fc3e7
replace REDUCE and clean up arange (#7429)
* break apart arange [pr]

* fix missing

* cleanups to add/mul

* UOps.VECTORIZE

* don't vectorize const
2024-10-31 14:02:20 +08:00
George Hotz
fe2bc4c613
clean up arange/indexing matchers [pr] (#7427)
* clean up arange/indexing matchers [pr]

* syntax for assign
2024-10-31 12:12:44 +08:00
George Hotz
e446e95974
enforce ctx is called ctx [pr] (#7424)
* enforce ctx is called ctx [pr]

* fix bug and use has_ctx

* inspect signature

* assert

* no slow asserts

* now we can support contextual reduce
2024-10-31 11:39:19 +08:00
chenyu
9b08bb4c3e
fold the +x term in sine inside sin_poly (#7425) 2024-10-30 23:13:08 -04:00
chenyu
0739895b4d
tiny clean up pow2if and payne_hanek_reduction (#7423) 2024-10-30 22:22:48 -04:00
chenyu
118dd7721f
clean up transcendental.rintk [pr] (#7422)
added unit tests and updated the comment. it's rounding away from 0 for negatives
2024-10-30 20:37:28 -04:00
chenyu
fb694a63eb
Tensor.erf (#7419)
the same one used in onnx and the one in bert.
2024-10-30 18:12:28 -04:00
qazal
e955aa1bee
hotfix: process replay (#7418) 2024-10-30 22:45:40 +02:00
qazal
4c0ee32ef2
delete metadata from schedule ctx [pr] (#7415) 2024-10-31 01:49:49 +08:00
George Hotz
b4410545d8
hotfix: INDEX is yellow-green 2024-10-31 01:42:54 +08:00
qazal
d81e07e4fc
compare schedule len against group count [pr] (#7414) 2024-10-31 01:42:10 +08:00
qazal
1a2ee37dd3
hotfix: remove redundant test_schedules [pr] (#7412) 2024-10-31 01:10:31 +08:00
George Hotz
7039fba406
move indexing first (#7409)
* move indexing first [pr]

* no create gate

* fix create_gate

* fix load/store folding

* fix index folding

* remove comment, no process replay
2024-10-31 00:50:35 +08:00
George Hotz
133fe81cc5
Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)" (#7407)
* Revert "Revert "move up migrate + new gated fold (#7403)" (#7406)"

This reverts commit ea5654a9bc.

* test padded in emulation too

* bring back early folding
2024-10-30 23:25:45 +08:00
chenyu
ea5654a9bc
Revert "move up migrate + new gated fold (#7403)" (#7406)
This reverts commit adccfade7f.
2024-10-30 23:02:18 +08:00
George Hotz
adccfade7f
move up migrate + new gated fold (#7403)
* move up migrate + new gated fold [pr]

* vcount for const ptr

* move those rules there

* fix openpilot
2024-10-30 22:14:01 +08:00