tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Vyacheslav Pachkov 4c33192a8b add qcom runtime (#5213)	2024-09-02 19:35:47 +03:00

* qcom: driver init
* autogen stubs for msm_kgsl also fixup ioctls to show numbers instead of _IOW macros
* autogen: add adreno commands and registers
* ops_qcom: QcomAllocator + signals
* fix EDEADLK in hwqueue, init timestamps, use opencl compiler for qcom
* qcom: we do not really need all these constants input/output is enough
* qcom: perfctr for CS (do not really need all the rest)
* qcom: HALFREGFOOTPRINT and FULLREGFOOTPRINT are set to be around max
* qcom: explicitly set instruction len based on the shader size
* ops_qcom: Program init extracts shader from open cl binary sets input/output buffers allocates stack sets cs mode runs shader
* use data64_le from helpers
* ops_qcom: use fill_kernargs for filling i/o buffers
* ops_qcom: add QcomCopyQueue just for api & set kernargs_args_offset
* new signals & fix exec
* add QCOM to the list of supported devices
* correct QcomComputeQueue._wait using CP_WAIT_REG_MEM
* fix exec, synchronize before copyout
* correct setting num_units for ST_SHADER
* fix gpu hangs on sigs with CP_MEM_WRITE, it is uncached mem anyway
* extract offsets to kernel arguments from opencl binary
* extract constants values and offsets from opencl binary
* handle KGSL_MEMFLAGS_USE_CPU_MAP correctly
* align kernel name to 4 bytes when skipping kernel opencl struct
* skip to consts directly using an offset from opencl binary header
* fix alloc
* get halfreg and fullreg from opencl bin
* set unmultipled global sizes as kernel group in HLSQ_CS_NDRANGE
* parse prg offset from open cl binary
* save loc with HLSQ_CS_CNTL. set this with HLSQ_CONTROL_2_REG
* support for vals in _fill_kernargs
* support 16-bit constants
* use KGSL_CONTEXT_NO_FAULT_TOLERANCE for contexts
  this helps to not fall down when executing big kernels
  /* Don't time out if the context has disabled it */
  if (drawobj->context->flags & KGSL_CONTEXT_NO_FAULT_TOLERANCE) return;
  minor changes of _exec
* QCOMRenderer
* disable HCQGraph for demo. TOOD: support HCQ update api
* support HCQ - remove copy queue - add updates - add strides for buffs and vars for QCOM
* bufs_stride
* clean ups
* linter
* call super().__init__(value) in QcomSignal
* disable=unused-import
* mypy
* type ignore when queue is on the device
* fix
* query gpu_id. Will be useful for selecting commands e.g. CP_EVENT_WRITE vs CP_EVENT_WRITE7
* working timestamps
* free context after device is done
* move gpu stack to the device
* reserve some space with lib_gpu for gpu to write to this fixes test_interpolate_bilinear
* exclude tests that fails with GPU=1 on qualcomm
* lint
* unmap mem in _gpu_free
* ctxt priority and preemtion policy
* remove old qcom
* pass size to self.device.allocator.free
* skip tests only on qcom
* use kgsl and adreno defines instead of numeric vals
* use allocator for allocating lib_gpu
* update to QcomArgsState from master
* intermediate commit while conquering images
* enable image tests on qcom
* fix shader disasm size, dump textures stuff
* working images
* allow signals to be 0
* set branchstack from OpenCL binary
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* set shared memory size from OpenCL binary
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* update images in QcomArgsState & less loc for images
* set stack sizes from OpenCL binary
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* stack allocation based on OpenCL binary
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* better autogen for kgsl and adreno. no more bitshifts
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* cleanup commit for parse cl lib
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* dont forget actual generated files
* refactor + less loc
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* device.py back
* lint
* ruff
* timestamp divisor
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* fix tex fmt & round global size
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* dtypes
* 19.2MHz
* -1 loc in _update_exec
* remove noqa

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

external	hotfix: ebs print kernel names	2024-08-29 21:20:36 -07:00
imported	update pylint path to check indent/space for all (#6022)	2024-08-10 14:41:09 -04:00
models	add failing regression test for image (#5540)	2024-07-17 17:27:18 -07:00
testextra	names shadowing builtins (#5179)	2024-06-27 08:15:01 -04:00
unit	UOp pattern x + x -> x * 2 (#6224)	2024-08-21 12:06:19 -04:00
__init__.py	All devices are equal! (#196)	2020-12-15 23:44:08 -08:00
Dockerfile	Docker fix (#1039)	2023-06-25 10:38:58 -07:00
helpers.py	graph_rewrite complexity tests [run_process_replay] (#6317)	2024-08-29 22:39:08 +03:00
test_arange.py	bump limit for test_llama_embedding_opt (#6332)	2024-08-31 10:03:43 -04:00
test_assign.py	RUF018 assignment-in-assert [run_process_replay] (#6172)	2024-08-19 00:34:52 -04:00
test_compile_failures.py	CIFAR trainer + various bugfixes / improvements (#6146)	2024-08-20 16:58:46 -07:00
test_const_folding.py	lazy const fold idiv 1 (#6285)	2024-08-26 10:29:59 -04:00
test_conv.py	db in wal mode (#5388)	2024-07-12 20:43:36 -07:00
test_conv_shapetracker.py	st_arg, never -1 [run_process_replay] (#6128)	2024-08-16 22:46:56 -07:00
test_copy_speed.py	remove cpu and torch backends (#3399)	2024-02-15 16:55:39 +01:00
test_custom_function.py	remove NEG from handwritten ast in tests (#6234)	2024-08-22 09:06:59 -04:00
test_device_speed.py	remove iter from uopgraph (#6110)	2024-08-16 15:58:29 -07:00
test_dtype.py	Tensor.prod (#6250)	2024-08-23 10:06:32 -04:00
test_dtype_alu.py	remove UnaryOps.NEG from lazy.py (#6193)	2024-08-19 18:41:28 -04:00
test_fusion_op.py	AST is UOp (#6030)	2024-08-16 22:09:00 +03:00
test_fuzz_shape_ops.py	fix typing for test to run in py38 (#4930)	2024-06-12 13:22:30 -04:00
test_gc.py	add E275 missing-whitespace-after-keyword linting rule (#6149)	2024-08-17 16:44:34 -04:00
test_graph.py	fix hcq sync (#5062)	2024-06-26 17:50:37 +03:00
test_hcq.py	hcq skip tests when no multidev (#6235)	2024-08-22 18:27:16 +03:00
test_image_dtype.py	add qcom runtime (#5213)	2024-09-02 19:35:47 +03:00
test_jit.py	remove realize from threefry (#5969)	2024-08-07 15:08:49 -07:00
test_kernel_cache.py	move the compiler cache to be global (#2957)	2024-01-01 10:59:56 -08:00
test_lazybuffer.py	merge uops with ops (#6111)	2024-08-16 18:17:57 -04:00
test_linearizer.py	remove extra.ops and LazyOp support from Kernel (#6267)	2024-08-24 16:44:38 +03:00
test_linearizer_dumb.py	migrate test_linearizer_dumb.py to UOp AST (#6241)	2024-08-24 16:27:29 +03:00
test_linearizer_failures.py	hotfix: lin_fail_41 passes on my M3 Max	2024-08-31 11:46:46 -07:00
test_linearizer_overflows.py	migrate test_linearizer_overflows.py to UOp AST (#6244)	2024-08-24 16:10:29 +03:00
test_masked_st.py	multitensor start (#2676)	2023-12-07 17:07:05 -08:00
test_method_cache.py	simple LoadOps.ASSIGN (#3745)	2024-03-14 20:44:34 -07:00
test_multitensor.py	fix Tensor.prod for multitensor (#6264)	2024-08-24 08:52:24 -04:00
test_net_speed.py	nv mockgpu (#4600)	2024-05-15 23:46:08 +03:00
test_nn.py	Fix track_running_stats in batchnorm (#6200)	2024-08-20 14:01:22 -07:00
test_ocl.py	touchup cl_errors (#6058)	2024-08-13 13:06:59 -04:00
test_ops.py	add qcom runtime (#5213)	2024-09-02 19:35:47 +03:00
test_optim.py	improve test_dropout_on_shard (#4912)	2024-06-11 11:36:02 -04:00
test_pattern_matcher.py	remove NEG from handwritten ast in tests (#6234)	2024-08-22 09:06:59 -04:00
test_pickle.py	AST is UOp (#6030)	2024-08-16 22:09:00 +03:00
test_profiler.py	raise time limit for ci in test_profile_multidev_transfer (#6227)	2024-08-21 22:42:03 +03:00
test_randomness.py	threefry touchup [run_process_replay] (#6169)	2024-08-18 23:01:24 -04:00
test_rearrange_einops.py	Missing features from rearrange (#6184)	2024-08-19 11:19:07 -07:00
test_renderer_failures.py	merge uops with ops (#6111)	2024-08-16 18:17:57 -04:00
test_sample.py	enable test_sample for all backend (#2593)	2023-12-03 17:20:27 -05:00
test_schedule.py	graph_rewrite complexity tests [run_process_replay] (#6317)	2024-08-29 22:39:08 +03:00
test_search.py	migrate test_search.py to UOp AST (#6245)	2024-08-24 16:13:53 +03:00
test_setitem.py	setitem in-place operator tests (#4577)	2024-05-14 01:28:02 -04:00
test_specific_conv.py	nv mockgpu (#4600)	2024-05-15 23:46:08 +03:00
test_speed_v_torch.py	remove CUDACPU flag in tests [run_process_replay] (#5902)	2024-08-04 16:06:38 -04:00
test_subbuffer.py	remove CUDACPU flag in tests [run_process_replay] (#5902)	2024-08-04 16:06:38 -04:00
test_symbolic_jit.py	sort vars in jit when building expected input args (#4990)	2024-06-16 15:55:51 -04:00
test_symbolic_ops.py	threefry touchup [run_process_replay] (#6169)	2024-08-18 23:01:24 -04:00
test_symbolic_shapetracker.py	support symbolic reshape with non-contiguous (#4844)	2024-06-05 16:01:19 -04:00
test_tensor.py	add support for retain_graph in backward (#6145)	2024-08-18 16:08:31 -07:00
test_tensor_data.py	BEAM_COMPARE=2 validates the correctness of BEAM kernels (#5458)	2024-07-13 13:53:43 -07:00
test_tensor_variable.py	Should this symbolic test fail? (#4501)	2024-06-18 15:21:26 -04:00
test_to_numpy.py	Apply ruff linting rules to tests (#2473)	2023-11-27 21:24:06 -08:00
test_transcendental.py	lower float64 sin fuzzer threshold (#6173)	2024-08-19 00:25:42 -04:00
test_uop_graph.py	remove more rules [run_process_replay] (#6326)	2024-08-29 16:27:10 -07:00
test_uops.py	remove UnaryOps.NEG (#6238)	2024-08-22 14:21:39 -04:00
test_uops_stats.py	fix intel wmma flop counting, add flop counting tests for different tensor cores (#6192)	2024-08-25 18:37:05 -07:00
test_winograd.py	merge uops with ops (#6111)	2024-08-16 18:17:57 -04:00
test_zero_copy.py	remove numpy from device (#3123)	2024-01-14 19:36:05 -08:00