* basic tests
* cleanup
* pylint
* ruff
* use define acc as a proxy for rendered reductions
* use define acc as a proxy for rendered reductions
* recursive reduceop rendering via ast_parse
* linters + cleanup
* fixing late buf loading
* plus linters
* removing extra line
* linters
* does this break ci?
* added tests and if add end change
* typo in add_ends
* linters
* removing comments
* allow endifs to be inserted before the end of the graph
* find add ENDIF before next BARRIER
* removing tests with manual ENDIF + linters
* specifically the next barrier aftr the store of the local result
* Revert "specifically the next barrier aftr the store of the local result"
This reverts commit b288a5c3ce.
* keeping up to date
* linters + merge changes
* cleaning up old bad decisions
* linters and opts
* mrged linearizer tests
* fixing merge issues
* removing the big ugly uop test (functionality tested end-to-end by test_linearizer additions
* small diff fixes
* updating linearizer to work without uops.add( ... cachable)
* linters
* comment in multireduce tests
* skipping tests without locals
* full tests
* linters
* load_cache[key] fix for multiple accs
* linters
* assert only one reduceop
* fix loop_scope test to actually cause an issue
* self.load_cache[key] key for DEFINE_ACC changed to use a string to make sure each acc is unique
* updated tests
* fixing merge
* removing debug prints
* complete merge fix
* linters
* diff cleanup
* adding tests in
* give each reduce it's own local buffer
* gpu=1 changes
* store and load locals with upcasting
* modifying test?
* make multireduce_netsted_local_upcast test match single reduce shapes
* removing todo
* cleaning up the diff
* unroll test
* unroll and upcast tests
* fix gpu
* seq and self.load_cache[key] cleaning
* linters
* padto works
* merge fixes
* fixes
* add skips for amd
* linters + seq
* cleaning & more tests
* softmax tests
* linters
* [run_process_replay]
* add new tests back
This reverts commit 19dec22e01.
* more hardcoded -1s
* fix ptx
* Fix name for loop in ptx
* cleaning up the diff
* cleaning up the uops diff
* nv ci is too slow
---------
Co-authored-by: qazal <qazal.software@gmail.com>
Co-authored-by: Szymon Ożóg <58388001+SzymonOzog@users.noreply.github.com>
Co-authored-by: qazal <77887910+Qazalin@users.noreply.github.com>