tinygrad/test
Vyacheslav Pachkov 4c33192a8b
add qcom runtime (#5213)
* qcom: driver init

* autogen stubs for msm_kgsl also fixup ioctls to show numbers instead of _IOW macros

* autogen: add adreno commands and registers

* ops_qcom: QcomAllocator + signals

* fix EDEADLK in hwqueue, init timestamps, use opencl compiler for qcom

* qcom: we do not really need all these constants; input/output is enough

* qcom: perfctr for CS (do not really need all the rest)

* qcom: HALFREGFOOTPRINT and FULLREGFOOTPRINT are set to be around max

* qcom: explicitly set instruction len based on the shader size

* ops_qcom: Program init

extracts shader from OpenCL binary
sets input/output buffers
allocates stack
sets cs mode
runs shader

* use data64_le from helpers

* ops_qcom: use fill_kernargs for filling i/o buffers

* ops_qcom: add QcomCopyQueue just for api & set kernargs_args_offset

* new signals & fix exec

* add QCOM to the list of supported devices

* correct QcomComputeQueue._wait using CP_WAIT_REG_MEM

* fix exec, synchronize before copyout

* correct setting num_units for ST_SHADER

* fix gpu hangs on sigs with CP_MEM_WRITE, it is uncached mem anyway

* extract offsets to kernel arguments from opencl binary

* extract constants values and offsets from opencl binary

* handle KGSL_MEMFLAGS_USE_CPU_MAP correctly

* align kernel name to 4 bytes when skipping kernel opencl struct

* skip to consts directly using an offset from opencl binary header

* fix alloc

* get halfreg and fullreg from opencl bin

* set unmultiplied global sizes as kernel group in HLSQ_CS_NDRANGE

* parse prg offset from OpenCL binary

* save loc with HLSQ_CS_CNTL. set this with HLSQ_CONTROL_2_REG

* support for vals in _fill_kernargs

* support 16-bit constants

* use KGSL_CONTEXT_NO_FAULT_TOLERANCE for contexts

this helps to not fall down when executing big kernels

    /* Don't time out if the context has disabled it */
    if (drawobj->context->flags & KGSL_CONTEXT_NO_FAULT_TOLERANCE)
        return;

* minor changes of _exec

* QCOMRenderer

* disable HCQGraph for demo. TODO: support HCQ update api

* support HCQ

- remove copy queue
- add updates
- add strides for buffs and vars for QCOM

* bufs_stride

* clean ups

* linter

* call super().__init__(value) in QcomSignal

* disable=unused-import

* mypy

* type ignore when queue is on the device

* fix

* query gpu_id.
Will be useful for selecting commands e.g. CP_EVENT_WRITE vs
CP_EVENT_WRITE7

* working timestamps

* free context after device is done

* move gpu stack to the device

* reserve some space with lib_gpu for gpu to write to

this fixes test_interpolate_bilinear

* exclude tests that fail with GPU=1 on qualcomm

* lint

* unmap mem in _gpu_free

* ctxt priority and preemption policy

* remove old qcom

* pass size to self.device.allocator.free

* skip tests only on qcom

* use kgsl and adreno defines instead of numeric vals

* use allocator for allocating lib_gpu

* update to QcomArgsState from master

* intermediate commit while conquering images

* enable image tests on qcom

* fix shader disasm size, dump textures stuff

* working images

* allow signals to be 0

* set branchstack from OpenCL binary

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

* set shared memory size from OpenCL binary

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

* update images in QcomArgsState & less loc for images

* set stack sizes from OpenCL binary

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

* stack allocation based on OpenCL binary

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

* better autogen for kgsl and adreno. no more bitshifts

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

* cleanup commit for parse cl lib

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

* dont forget actual generated files

* refactor + less loc

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

* device.py back

* lint

* ruff

* timestamp divisor

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

* fix tex fmt & round global size

Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

* dtypes

* 19.2MHz

* -1 loc in _update_exec

* remove noqa
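
The "timestamp divisor" and "19.2MHz" items above suggest that raw GPU timestamp counter values are scaled by a 19.2 MHz tick rate to obtain a time unit. A minimal sketch of that kind of conversion, assuming the counter ticks at 19.2 MHz and the target unit is microseconds; the constant and function names here are hypothetical and are not taken from the runtime itself:

```python
# Assumption: the GPU timestamp counter ticks at 19.2 MHz (per the notes above),
# i.e. 19.2 ticks per microsecond. All names below are illustrative only.
TICKS_PER_US = 19.2


def ticks_to_us(ticks: int) -> float:
    """Convert a raw counter value to microseconds."""
    return ticks / TICKS_PER_US


def elapsed_us(start_ticks: int, end_ticks: int) -> float:
    """Elapsed time between two raw counter readings, in microseconds."""
    return ticks_to_us(end_ticks - start_ticks)
```

Under this assumption, two readings taken 19,200,000 ticks apart correspond to roughly one second of elapsed time.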

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
2024-09-02 19:35:47 +03:00
external hotfix: ebs print kernel names 2024-08-29 21:20:36 -07:00
imported update pylint path to check indent/space for all (#6022) 2024-08-10 14:41:09 -04:00
models add failing regression test for image (#5540) 2024-07-17 17:27:18 -07:00
testextra names shadowing builtins (#5179) 2024-06-27 08:15:01 -04:00
unit UOp pattern x + x -> x * 2 (#6224) 2024-08-21 12:06:19 -04:00
__init__.py All devices are equal! (#196) 2020-12-15 23:44:08 -08:00
Dockerfile Docker fix (#1039) 2023-06-25 10:38:58 -07:00
helpers.py graph_rewrite complexity tests [run_process_replay] (#6317) 2024-08-29 22:39:08 +03:00
test_arange.py bump limit for test_llama_embedding_opt (#6332) 2024-08-31 10:03:43 -04:00
test_assign.py RUF018 assignment-in-assert [run_process_replay] (#6172) 2024-08-19 00:34:52 -04:00
test_compile_failures.py CIFAR trainer + various bugfixes / improvements (#6146) 2024-08-20 16:58:46 -07:00
test_const_folding.py lazy const fold idiv 1 (#6285) 2024-08-26 10:29:59 -04:00
test_conv.py db in wal mode (#5388) 2024-07-12 20:43:36 -07:00
test_conv_shapetracker.py st_arg, never -1 [run_process_replay] (#6128) 2024-08-16 22:46:56 -07:00
test_copy_speed.py remove cpu and torch backends (#3399) 2024-02-15 16:55:39 +01:00
test_custom_function.py remove NEG from handwritten ast in tests (#6234) 2024-08-22 09:06:59 -04:00
test_device_speed.py remove iter from uopgraph (#6110) 2024-08-16 15:58:29 -07:00
test_dtype.py Tensor.prod (#6250) 2024-08-23 10:06:32 -04:00
test_dtype_alu.py remove UnaryOps.NEG from lazy.py (#6193) 2024-08-19 18:41:28 -04:00
test_fusion_op.py AST is UOp (#6030) 2024-08-16 22:09:00 +03:00
test_fuzz_shape_ops.py fix typing for test to run in py38 (#4930) 2024-06-12 13:22:30 -04:00
test_gc.py add E275 missing-whitespace-after-keyword linting rule (#6149) 2024-08-17 16:44:34 -04:00
test_graph.py fix hcq sync (#5062) 2024-06-26 17:50:37 +03:00
test_hcq.py hcq skip tests when no multidev (#6235) 2024-08-22 18:27:16 +03:00
test_image_dtype.py add qcom runtime (#5213) 2024-09-02 19:35:47 +03:00
test_jit.py remove realize from threefry (#5969) 2024-08-07 15:08:49 -07:00
test_kernel_cache.py move the compiler cache to be global (#2957) 2024-01-01 10:59:56 -08:00
test_lazybuffer.py merge uops with ops (#6111) 2024-08-16 18:17:57 -04:00
test_linearizer.py remove extra.ops and LazyOp support from Kernel (#6267) 2024-08-24 16:44:38 +03:00
test_linearizer_dumb.py migrate test_linearizer_dumb.py to UOp AST (#6241) 2024-08-24 16:27:29 +03:00
test_linearizer_failures.py hotfix: lin_fail_41 passes on my M3 Max 2024-08-31 11:46:46 -07:00
test_linearizer_overflows.py migrate test_linearizer_overflows.py to UOp AST (#6244) 2024-08-24 16:10:29 +03:00
test_masked_st.py multitensor start (#2676) 2023-12-07 17:07:05 -08:00
test_method_cache.py simple LoadOps.ASSIGN (#3745) 2024-03-14 20:44:34 -07:00
test_multitensor.py fix Tensor.prod for multitensor (#6264) 2024-08-24 08:52:24 -04:00
test_net_speed.py nv mockgpu (#4600) 2024-05-15 23:46:08 +03:00
test_nn.py Fix track_running_stats in batchnorm (#6200) 2024-08-20 14:01:22 -07:00
test_ocl.py touchup cl_errors (#6058) 2024-08-13 13:06:59 -04:00
test_ops.py add qcom runtime (#5213) 2024-09-02 19:35:47 +03:00
test_optim.py improve test_dropout_on_shard (#4912) 2024-06-11 11:36:02 -04:00
test_pattern_matcher.py remove NEG from handwritten ast in tests (#6234) 2024-08-22 09:06:59 -04:00
test_pickle.py AST is UOp (#6030) 2024-08-16 22:09:00 +03:00
test_profiler.py raise time limit for ci in test_profile_multidev_transfer (#6227) 2024-08-21 22:42:03 +03:00
test_randomness.py threefry touchup [run_process_replay] (#6169) 2024-08-18 23:01:24 -04:00
test_rearrange_einops.py Missing features from rearrange (#6184) 2024-08-19 11:19:07 -07:00
test_renderer_failures.py merge uops with ops (#6111) 2024-08-16 18:17:57 -04:00
test_sample.py enable test_sample for all backend (#2593) 2023-12-03 17:20:27 -05:00
test_schedule.py graph_rewrite complexity tests [run_process_replay] (#6317) 2024-08-29 22:39:08 +03:00
test_search.py migrate test_search.py to UOp AST (#6245) 2024-08-24 16:13:53 +03:00
test_setitem.py setitem in-place operator tests (#4577) 2024-05-14 01:28:02 -04:00
test_specific_conv.py nv mockgpu (#4600) 2024-05-15 23:46:08 +03:00
test_speed_v_torch.py remove CUDACPU flag in tests [run_process_replay] (#5902) 2024-08-04 16:06:38 -04:00
test_subbuffer.py remove CUDACPU flag in tests [run_process_replay] (#5902) 2024-08-04 16:06:38 -04:00
test_symbolic_jit.py sort vars in jit when building expected input args (#4990) 2024-06-16 15:55:51 -04:00
test_symbolic_ops.py threefry touchup [run_process_replay] (#6169) 2024-08-18 23:01:24 -04:00
test_symbolic_shapetracker.py support symbolic reshape with non-contiguous (#4844) 2024-06-05 16:01:19 -04:00
test_tensor.py add support for retain_graph in backward (#6145) 2024-08-18 16:08:31 -07:00
test_tensor_data.py BEAM_COMPARE=2 validates the correctness of BEAM kernels (#5458) 2024-07-13 13:53:43 -07:00
test_tensor_variable.py Should this symbolic test fail? (#4501) 2024-06-18 15:21:26 -04:00
test_to_numpy.py Apply ruff linting rules to tests (#2473) 2023-11-27 21:24:06 -08:00
test_transcendental.py lower float64 sin fuzzer threshold (#6173) 2024-08-19 00:25:42 -04:00
test_uop_graph.py remove more rules [run_process_replay] (#6326) 2024-08-29 16:27:10 -07:00
test_uops.py remove UnaryOps.NEG (#6238) 2024-08-22 14:21:39 -04:00
test_uops_stats.py fix intel wmma flop counting, add flop counting tests for different tensor cores (#6192) 2024-08-25 18:37:05 -07:00
test_winograd.py merge uops with ops (#6111) 2024-08-16 18:17:57 -04:00
test_zero_copy.py remove numpy from device (#3123) 2024-01-14 19:36:05 -08:00