Eli Frigo
801564f31b
Remove POW llop and add SQRT llop ( #1104 )
...
* fixed division by zero for fast operations
* made et closer to 0
* replace POW llop with SQRT
* updated mlops to swap SQRT and POW llops
* updated hlops to swap POW and SQRT
* added sqrt llop to cpu runtime
* added sqrt llop to cstyle codegen
* added SQRT llop to llvm ir codegen
* added SQRT llop to torch runtime
* moved pow from mlops to hlops
* found a better way to do reverse pow
* fixed indentation
* added SQRT llop to triton
* update docs to match new llops
* removed POW operator from assembly codegen
* added sqrt and rsqrt to pow hlop
* rewrote pow function in tensor.py
* Adjust tolerance
* Adjust for adamw
* Reduce for Adam too
* removed accidental leftover code
* removed all of accidental code
* added rsqrt test
* removed pow from mlops again
it was added back when resolving merge conflicts
---------
Co-authored-by: Jacky Lee <jla524@sfu.ca>
2023-07-05 18:07:58 -07:00
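As a rough illustration of the idea behind #1104, a power function can be built at a higher level once SQRT is available as a primitive. The sketch below uses plain Python floats and the `math` module rather than tinygrad tensors. The name `pow_via_primitives` and its special-casing are assumptions for illustration, not the code in tensor.py.

```python
import math

def pow_via_primitives(x: float, y: float) -> float:
    """Hypothetical scalar sketch: x**y built from sqrt, exp, and log."""
    if y == 0.5:
        return math.sqrt(x)              # sqrt fast path
    if y == -0.5:
        return 1.0 / math.sqrt(x)        # rsqrt fast path
    if x <= 0:
        # The exp/log identity below only holds for positive bases.
        raise ValueError("this sketch only handles positive bases")
    return math.exp(y * math.log(x))     # x**y == exp(y * log(x)) for x > 0

print(pow_via_primitives(4.0, 0.5))
print(pow_via_primitives(4.0, -0.5))
```

The exp/log route is approximate in floating point, which may be one reason the commit also adjusts test tolerances, though the log does not say so explicitly.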
cloud11665
b7369ffcff
add ptx formatter + syntax highlighter ( #1128 )
2023-07-05 17:56:09 -07:00
Reza Rezvan
d1356cac27
Fix: Jacobian tests [WIP] ( #1126 )
...
* Fix: Jacobian tests; num_jacobian either bugged or not accurate enough;
* Fix: Jacobian tests;
* Fix: Gradcheck;
2023-07-05 15:36:22 -07:00
nimlgen
d363d25ee2
fix imports for examples/transformer.py ( #1136 )
2023-07-05 08:15:13 -07:00
Mehmet Kuzucu
c3173ff281
Add return statement to the train function ( #1135 )
...
add a return statement to the train function in order to provide access to the losses and accuracies lists
2023-07-05 08:13:38 -07:00
wozeparrot
981d4980c4
feat: reword contributing ( #1131 )
2023-07-04 22:17:47 -07:00
George Hotz
793a670187
from tensor cores + lb touchup ( #1127 )
2023-07-04 15:45:20 -07:00
George Hotz
2f968f8547
ignore cloudpickle type for local mypy
2023-07-04 13:51:20 -07:00
George Hotz
87d21ea979
examples: simple conv bn
2023-07-04 13:50:26 -07:00
Reza Rezvan
535224ac20
Remove float64 ( #1101 )
...
* Refactor: Remove float64
* Refactor: Remove unused imports
* Refactor: Remove float64
* Refactor: Remove float64
* Refactor: Exclude float64 onnx backend
* Add: Skip jacobian and gradcheck tests;
2023-07-04 08:40:51 -07:00
Daniel Hipke
b4ce23e4b8
Make cross_process use cloudpickle ( #1118 )
...
* fix syntax issues in imagenet_download.py
* use cloudpickle in cross_process to make it work in Python 3.9+
* add cross_process test
* prevent unpickling on every function call
* add cloudpickle to setup.py
* add support for args/kwargs
2023-07-04 00:47:34 -07:00
George Hotz
c709dec8b5
gelu: weird test was broken for metal
2023-07-04 00:43:54 -07:00
George Hotz
daf8e1942f
sigmoid: test large positive also and add note
2023-07-04 00:18:31 -07:00
Kunwar Raj Singh
9e6067378f
Broken Sigmoid backward: Add test and mlop for Sigmoid ( #1113 )
...
* Add failing sigmoid test
* update more tests
* add mlop for sigmoid
* add back test
* math.log(math.e) = 1
* remove divides
---------
Co-authored-by: Kunwar Raj Singh <kunwar31@pop-os.localdomain>
2023-07-04 00:14:22 -07:00
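The sigmoid entries above concern a backward pass that misbehaved, with follow-up tests for large inputs. The mlop itself is not shown in this log, so the following is only a textbook scalar sketch of a numerically stable sigmoid and its derivative. The function names are hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function for a Python float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))   # exp(-x) <= 1, cannot overflow
    z = math.exp(x)                          # x < 0, so exp(x) < 1
    return z / (1.0 + z)

def sigmoid_backward(x: float, grad_output: float) -> float:
    """Chain rule with d/dx sigmoid(x) = s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return grad_output * s * (1.0 - s)

print(sigmoid(0.0), sigmoid_backward(0.0, 1.0))
```

Writing the gradient in terms of the forward output avoids forming a ratio of overflowing exponentials, so large positive or negative inputs give a gradient of zero rather than NaN.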
Daniel Hipke
d58a9603ab
Create COCO data directory if it doesn't exist. ( #1114 )
...
* Create COCO data directory if it doesn't exist.
* update paths to support windows
2023-07-03 18:15:53 -07:00
Anselm Coogan
a22aad7d32
Use generators instead of lists in anys and alls ( #1111 )
...
* Use generators in any(...) instead of lists for better best-case performance
* Use generators in all(...) instead of lists
* enable R1729 in .pylintrc
* revert import sorting
---------
Co-authored-by: Anselm Coogan <anselm@scandit.com>
2023-07-03 16:06:06 -07:00
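The best-case benefit mentioned in #1111 comes from short-circuiting: `any` stops at the first truthy item, but a list comprehension evaluates every element before `any` sees any of them. A small sketch, with the hypothetical helper `count_evaluations` counting predicate calls:

```python
def count_evaluations(use_generator: bool) -> int:
    """Return how many times the predicate runs inside any(...)."""
    calls = 0

    def is_positive(n: int) -> bool:
        nonlocal calls
        calls += 1
        return n > 0

    data = range(1, 1001)  # the very first element already satisfies the predicate
    if use_generator:
        any(is_positive(n) for n in data)     # lazy: stops after the first hit
    else:
        any([is_positive(n) for n in data])   # eager: builds the full list first
    return calls

print(count_evaluations(True), count_evaluations(False))
```

The same reasoning applies to `all`, which stops at the first falsy item.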
tricky-labyrinth
fd98f6cffa
Small fix to abstractions.py so it runs on Windows without throwing an AttributeError ( #1109 )
...
Co-authored-by: Tricky Labyrinth <trickylabyrinth@gmail.com>
2023-07-03 13:44:49 -07:00
Mike Ovyan
651d080594
[perf] Replace more list comprehension with * ( #1106 )
...
* [perf] Replace more list comprehension with *
* comeback
* final fix?
* blind me
* kill me
* ?
* rev
* [none]
2023-07-03 10:49:23 -07:00
Frank Pinnola
2071e53da8
Handle broadcast flag on gemm ( #1103 )
2023-07-02 22:15:07 -07:00
Taras Tsugrii
cbb5c655e5
[tensor][perf] Replace list comprehension with *. ( #1102 )
...
It's more concise, idiomatic and faster:
```
In [8]: %timeit [1 for _ in range(100)]
2.12 µs ± 26.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
In [9]: %timeit [1] * 100
515 ns ± 5.23 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```
2023-07-02 18:34:23 -07:00
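The replacement in #1102 is safe for immutable elements such as ints. As a hedged aside that the commit itself does not raise, the two forms differ when the repeated element is mutable, because `*` repeats references to one object:

```python
# Equivalent for immutable elements.
ones_comp = [1 for _ in range(100)]
ones_mul = [1] * 100
assert ones_comp == ones_mul

# Not equivalent for mutable elements: * repeats the same inner list.
rows_mul = [[0]] * 3
rows_mul[0].append(1)            # visible through all three entries

rows_comp = [[0] for _ in range(3)]
rows_comp[0].append(1)           # affects only the first entry

print(rows_mul)
print(rows_comp)
```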
David Hou
363fbfc2e4
do not emit loop end code for global+local loops in assembly kernel ( #1100 )
2023-07-02 18:33:57 -07:00
Reza Rezvan
8ae9a054ae
Refactor nn.optim ( #1091 )
...
* Refactor: nn.optim.py
* Refactor: nn.optim.py; Fix all tests
* Refactor: Replace all optim.get_parameters()
* Refactor: Revert list comp.
* Refactor: Replace optim.get_state_dict
* Refactor: Change quickstart.md
2023-07-02 15:07:30 -07:00
Eli Frigo
10f1aeb144
fixed broken link ( #1097 )
2023-07-02 15:06:59 -07:00
Rob Grossman
c8ddc34368
include missing queue in thneed load ( #1095 )
2023-07-02 12:33:59 -07:00
nmarwell26
12ce68c1ee
Renamed examples/yolo to examples/vgg7_helpers: the directory contains only vgg7 helper code and nothing yolo-related, which confused new users reading the examples ( #1086 )
2023-07-01 12:04:28 -07:00
Rob Grossman
2533a992e7
remove unused imports in models ( #1088 )
2023-07-01 12:04:19 -07:00
geohotstan
575f75f613
hello ( #1084 )
2023-07-01 01:29:35 -07:00
foreign-sub
574cbda979
Quickstart ( #1015 )
...
* fix quickstart md
* add quickstart to ci
2023-06-29 13:26:58 -07:00
Roelof van Dijk
542b2d93a5
Perf/cache string ops ( #1078 )
...
* perf: remove extra function, include in cached getitem
* perf: only calculate hash once per node
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-29 13:23:11 -07:00
George Hotz
e234bf2298
hip matmul : add K support
2023-06-28 19:54:33 +00:00
George Hotz
0e93b9642a
hip matmul
2023-06-28 19:21:01 +00:00
Jacky Lee
754e54ebb9
Fix Tensor ceil and floor for whole numbers ( #1071 )
...
* Works on non-special numbers
* Test different cases
2023-06-27 23:22:17 -07:00
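The log does not show how #1071 fixes ceil and floor, so the following is only one possible scalar formulation that stays correct for whole numbers. It truncates toward zero and adjusts by one only when the input has a fractional part. The function names are hypothetical.

```python
import math

def floor_via_trunc(x: float) -> float:
    """Floor built from truncation; whole numbers pass through unchanged."""
    t = float(math.trunc(x))
    return t - 1.0 if x < t else t   # only negative non-integers need the step down

def ceil_via_trunc(x: float) -> float:
    """Ceil built from truncation; whole numbers pass through unchanged."""
    t = float(math.trunc(x))
    return t + 1.0 if x > t else t   # only positive non-integers need the step up

for value in (2.0, 2.5, -2.0, -2.5):
    print(value, floor_via_trunc(value), ceil_via_trunc(value))
```

A naive `ceil(x) = floor(x) + 1` would return 3 for an input of 2.0, which is the kind of whole-number error the commit title describes.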
George Hotz
1f5d45ca8c
imagenet loader minor cleanups
2023-06-28 05:08:09 +00:00
George Hotz
6ec0a24706
imagenet eval in 1 min 28 sec
2023-06-28 04:23:26 +00:00
George Hotz
9fabdbd054
speed ( #1070 )
2023-06-27 20:28:57 -07:00
George Hotz
d16c16ec28
new upcast works ( #1066 )
...
* new upcast works
* float4 try
* fix unaligned float4
* disallow unaligned access
* upcast dim
* maybe good now
* fix gpu half
* vstore_half4
* fix deep image bugs
* improve symbolic to fix issues
* fix symbolic
* cl test
* this maybe
* gcd of 1 is 1
* real fix for old python
* improve fuzzer
2023-06-27 19:34:53 -07:00
ernie
4d703be6d7
fix typo ( #1065 )
2023-06-27 10:56:54 -07:00
George Hotz
70c07dfea5
5k line max ( #1064 )
2023-06-27 10:53:18 -07:00
George Hotz
c8d87eb8d4
strip whitespace
2023-06-27 10:11:43 -07:00
Rayan Hatout
23648538fa
fix folding of float4 add/mul ( #1060 )
2023-06-26 20:59:29 -07:00
George Hotz
a98e361da0
torch speed test, add add
2023-06-26 18:55:27 -07:00
George Hotz
3e33befc1d
realize hotspots ( #1059 )
...
* realize hotspots
* no str check
* minor changes
* make this an assert
* faster and more readable
* nicer self.buffers
* tests for weak op + LAZYCACHE=0
2023-06-26 18:31:18 -07:00
George Hotz
2977fb17f6
various touchups ( #1058 )
...
* op isn't optional
* barrier + named local buffers
* end global and local loop together to avoid useless if statement
* better comments
2023-06-26 15:41:23 -07:00
George Hotz
f265e8523a
movement ops aren't really ops ( #1056 )
2023-06-26 15:01:28 -07:00
Rayan Hatout
65cbaa3429
no need to slice A and B twice in LLaMa complex multiplication ( #1054 )
2023-06-26 14:42:58 -07:00
George Hotz
571089f10e
Back off minor speed stuff for simplicity ( #1053 )
...
* passing in buffers doesn't increase speed
* functools.reduce
* no more get_buffers
2023-06-26 14:42:17 -07:00
Rayan Hatout
dedbd970aa
Optimizations in lazy.py ( #987 )
...
* optimizations in lazy.py
* make mypy happy with stubs and fix the graph import hack
* merge conflict in helpers.py
2023-06-26 13:55:42 -07:00
Roelof van Dijk
8bea6b6d35
perf/refactor_weakops ( #1052 )
...
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 10:13:33 -07:00
Roelof van Dijk
8c65f9324c
refactor: print formatting for llama timing ( #1050 )
...
* refactor: print formatting for llama timing, report median and individual runs
* feat: back to mean
* fix: whitespace
* fix: add mean to print
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 09:49:31 -07:00
Roelof van Dijk
c604ef4beb
symbolic.py: faster Node.sum, faster SumNode.div ( #1014 )
...
* refactor: replace isinstance with class check where possible
* refactor: faster partition
* fix: flake8
* feat: rework node.sum, correct list typing
* fix: typo
* feat: refactor sum
* fix: pylint
* refactor: simpler sum and factorize
* feat: clean up sumnode div, all cpu tests pass
* feat: simplify floordiv, cache factorization
* don't factor numnodes at all
* python 3.8 functools does not yet have @cache
* fix: restore assert
* refactor, fix failing tests
* fix: address review comments
* feat: rework, add specialization, remove cache
* fix: remove specialization
* feat: no tuple conversion, faster loop
---------
Co-authored-by: Roelof van Dijk <roelof.van.dijk@vitestro.com>
2023-06-26 09:47:17 -07:00
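The note in #1014 that Python 3.8 lacks `@cache` refers to `functools.cache`, which was added in Python 3.9. On 3.8 the equivalent is `functools.lru_cache(maxsize=None)`. The sketch below applies it to a hypothetical `prime_factors` helper. It is not the factorization code from symbolic.py, and the bullet list above indicates the cache was later removed from that change.

```python
import functools

@functools.lru_cache(maxsize=None)   # unbounded cache, works on Python 3.8
def prime_factors(n: int) -> tuple:
    """Hypothetical helper: prime factorization by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return tuple(factors)            # tuples are safe to share between callers

print(prime_factors(12))
print(prime_factors(12))             # second call is served from the cache
print(prime_factors.cache_info().hits)
```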