tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

Vyacheslav Pachkov 4c33192a8b add qcom runtime (#5213)

* qcom: driver init
* autogen stubs for msm_kgsl
  also fixup ioctls to show numbers instead of _IOW macros
* autogen: add adreno commands and registers
* ops_qcom: QcomAllocator + signals
* fix EDEADLK in hwqueue, init timestamps, use opencl compiler for qcom
* qcom: we do not really need all these constants input/output is enough
* qcom: perfctr for CS (do not really need all the rest)
* qcom: HALFREGFOOTPRINT and FULLREGFOOTPRINT are set to be around max
* qcom: explicitly set instruction len based on the shader size
* ops_qcom: Program init
  extracts shader from open cl binary
  sets input/output buffers
  allocates stack
  sets cs mode
  runs shader
* use data64_le from helpers
* ops_qcom: use fill_kernargs for filling i/o buffers
* ops_qcom: add QcomCopyQueue just for api & set kernargs_args_offset
* new signals & fix exec
* add QCOM to the list of supported devices
* correct QcomComputeQueue._wait using CP_WAIT_REG_MEM
* fix exec, synchronize before copyout
* correct setting num_units for ST_SHADER
* fix gpu hangs on sigs with CP_MEM_WRITE, it is uncached mem anyway
* extract offsets to kernel arguments from opencl binary
* extract constants values and offsets from opencl binary
* handle KGSL_MEMFLAGS_USE_CPU_MAP correctly
* align kernel name to 4 bytes when skipping kernel opencl struct
* skip to consts directly using an offset from opencl binary header
* fix alloc
* get halfreg and fullreg from opencl bin
* set unmultiplied global sizes as kernel group in HLSQ_CS_NDRANGE
* parse prg offset from open cl binary
* save loc with HLSQ_CS_CNTL. set this with HLSQ_CONTROL_2_REG
* support for vals in _fill_kernargs
* support 16-bit constants
* use KGSL_CONTEXT_NO_FAULT_TOLERANCE for contexts
  this helps to not fall down when executing big kernels
  /* Don't time out if the context has disabled it */
  if (drawobj->context->flags & KGSL_CONTEXT_NO_FAULT_TOLERANCE) return;
* minor changes of _exec
* QCOMRenderer
* disable HCQGraph for demo. TODO: support HCQ update api
* support HCQ
  - remove copy queue
  - add updates
  - add strides for buffs and vars for QCOM
* bufs_stride
* clean ups
* linter
* call super().__init__(value) in QcomSignal
* disable=unused-import
* mypy
* type ignore when queue is on the device
* fix
* query gpu_id. Will be useful for selecting commands e.g. CP_EVENT_WRITE vs CP_EVENT_WRITE7
* working timestamps
* free context after device is done
* move gpu stack to the device
* reserve some space with lib_gpu for gpu to write to
  this fixes test_interpolate_bilinear
* exclude tests that fail with GPU=1 on qualcomm
* lint
* unmap mem in _gpu_free
* ctxt priority and preemption policy
* remove old qcom
* pass size to self.device.allocator.free
* skip tests only on qcom
* use kgsl and adreno defines instead of numeric vals
* use allocator for allocating lib_gpu
* update to QcomArgsState from master
* intermediate commit while conquering images
* enable image tests on qcom
* fix shader disasm size, dump textures stuff
* working images
* allow signals to be 0
* set branchstack from OpenCL binary
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* set shared memory size from OpenCL binary
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* update images in QcomArgsState & less loc for images
* set stack sizes from OpenCL binary
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* stack allocation based on OpenCL binary
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* better autogen for kgsl and adreno. no more bitshifts
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* cleanup commit for parse cl lib
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* dont forget actual generated files
* refactor + less loc
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* device.py back
* lint
* ruff
* timestamp divisor
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* fix tex fmt & round global size
  Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>
* dtypes
* 19.2MHz
* -1 loc in _update_exec
* remove noqa

---------

Co-authored-by: George Hotz <72895+geohot@users.noreply.github.com>
Co-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>

2024-09-02 19:35:47 +03:00

accel	move things, clean up extra (#2292)	2023-11-13 20:18:40 -08:00
assembly	lowerer is kernel [run_process_replay] (#5437)	2024-07-12 18:50:55 -07:00
backends	move GraphException to jit.py (#5744)	2024-07-26 19:01:12 -04:00
datasets	init REDUCE_AXIS with BinaryOps (#6256)	2024-08-24 11:28:41 +03:00
disassemblers/adreno	move disassemblers and openpilot (#4592)	2024-05-14 19:30:02 -07:00
gemm	extra/gemm/triton_nv_matmul: fix Program arguments (#6212)	2024-08-20 14:05:38 -07:00
hip_gpu_driver	feat: autogen from kernel register offset headers (#6056)	2024-08-12 14:08:35 -07:00
hiprtc	use comgr to compile (#3248)	2024-01-26 18:27:49 -08:00
junk	coder.py can write and run code (#2439)	2023-11-25 12:27:54 -08:00
mockgpu	autogen cleanup (#6064)	2024-08-14 20:20:35 +03:00
models	sdxl batched inference fixes (#6293)	2024-08-28 07:44:58 -04:00
nv_gpu_driver	update pylint path to check indent/space for all (#6022)	2024-08-10 14:41:09 -04:00
optimization	update CI tests in extra with UOp AST (#6290)	2024-08-28 22:26:50 +03:00
qcom_gpu_driver	add qcom runtime (#5213)	2024-09-02 19:35:47 +03:00
archprobe.py	move dtypes to dtype.py (#2964)	2024-01-01 14:58:48 -08:00
augment.py	[ready] Replacing os with pathlib (#1708)	2023-08-30 10:41:08 -07:00
disk_read_speed.py	io_uring for copies from disk (#5035)	2024-06-21 11:36:51 +03:00
dump_cache.py	wow how did i think that was okay (#2339)	2023-11-16 21:21:11 -08:00
export_model.py	all realize 2 (#4527)	2024-05-10 22:43:09 -07:00
f16_w_uint32.py	fix various examples (#4691)	2024-05-22 20:43:21 -04:00
gradcheck.py	Fix: Jacobian tests [WIP] (#1126)	2023-07-05 15:36:22 -07:00
hip_events.py	move autogen to runtime/autogen (#3254)	2024-01-26 12:44:19 -08:00
introspection.py	bring buffer back to device (#4517)	2024-05-10 11:22:31 -07:00
lr_scheduler.py	use at least float32 for optim.lr (#4297)	2024-04-25 14:42:28 -04:00
mcts_search.py	AST is UOp (#6030)	2024-08-16 22:09:00 +03:00
multitensor.py	multitensor start (#2676)	2023-12-07 17:07:05 -08:00
onnx.py	fix some type error in onnx [run_process_replay] (#6153)	2024-08-17 19:54:20 -04:00
onnx_ops.py	Tensor.prod (#6250)	2024-08-23 10:06:32 -04:00
ops.py	init REDUCE_AXIS with BinaryOps (#6256)	2024-08-24 11:28:41 +03:00
ring_copy.py	ring copy example (#3185)	2024-01-19 23:34:30 -05:00
thneed.py	new style device (#2530)	2023-11-30 17:07:16 -08:00
threefry.py	feat: example and extra tweaks (#6310)	2024-08-28 19:26:11 -07:00
to_movement_ops.py	update CI tests in extra with UOp AST (#6290)	2024-08-28 22:26:50 +03:00
training.py	tinytqdm.set_description and tinytrange (#5101)	2024-06-22 14:45:06 -04:00
transfer_speed.py	hotfix: copy size is in bytes	2024-01-17 16:44:15 +00:00