tinygrad

mirror of https://github.com/tinygrad/tinygrad.git synced 2026-06-24 02:14:17 +00:00

divinity76 bec4f59ce8 workaround f16 cast ambiguity (#8935)	2025-02-11 09:38:56 +08:00

For unknown reasons, without this change, trying to execute "Llama 3.2 1B" fails with the error below. Fwiw, I do not know the performance impact of this change. I can't even get exo running, but this change allows me to /get further/ (before running into a separate issue with VRAM allocation? Story for another day, I suppose).

error:
```
Failed to fetch completions: Error processing prompt (see logs with DEBUG>=2): Nvrtc Error 6, NVRTC_ERROR_COMPILATION
<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
    function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
  ((half4)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
  ^
<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
    function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
  ((half4)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
  ^
<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
    function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
  ((half4)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
  ^
<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
    function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
    function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
  ((half4)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
  ^
4 errors detected in the compilation of "<null>".
```
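
The ambiguity reported in the error above can be reproduced in plain C++ without CUDA. The sketch below is an illustration only, not the change made in #8935: `Half` and `BFloat16` are hypothetical stand-ins for `__half` and `nv_bfloat16`, and the set of conversion operators on `BFloat16` is an assumption chosen to mirror the constructor list in the log. It shows why a direct `(Half)(value)` cast is rejected and how routing the cast through `float` selects a single conversion path.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for nv_bfloat16: stores the upper 16 bits of a float32.
struct BFloat16 {
    std::uint16_t bits;

    explicit BFloat16(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        bits = static_cast<std::uint16_t>(u >> 16);  // truncate the mantissa, for brevity
    }

    // Several implicit conversion operators (assumed, to mirror the real header).
    operator float() const {
        std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
        float f;
        std::memcpy(&f, &u, sizeof f);
        return f;
    }
    operator short() const { return static_cast<short>(static_cast<float>(*this)); }
    operator int() const { return static_cast<int>(static_cast<float>(*this)); }
};

// Hypothetical stand-in for __half: several converting constructors, as in the log.
struct Half {
    float value;  // stored as float here; only the overload set matters
    Half(float f) : value(f) {}
    Half(short s) : value(static_cast<float>(s)) {}
    Half(int i) : value(static_cast<float>(i)) {}
};

// Ambiguous: each Half constructor is reachable through a different BFloat16
// conversion operator, and the compiler cannot rank them against each other.
// Half bad(BFloat16 v) { return (Half)(v); }  // does not compile

// Workaround sketch: convert to float explicitly first, so exactly one
// conversion operator and exactly one constructor are the best match.
Half bf16_to_half(BFloat16 v) {
    return Half(static_cast<float>(v));
}
```

Calling `bf16_to_half(BFloat16(1.5f))` yields a `Half` whose `value` is `1.5f`, because 1.5 is exactly representable in the truncated 16-bit form.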
..
accel	move things, clean up extra (#2292)	2023-11-13 20:18:40 -08:00
amdpci	am_smi: print power state (#9013)	2025-02-10 23:07:39 +03:00
assembly	s/UOps/Ops (#7500)	2024-11-03 11:26:10 +08:00
backends	bring back the DSP runtime	2024-12-31 12:01:42 -05:00
datasets	do not construct unmasked VALID (#8759)	2025-01-28 20:51:21 +02:00
disassemblers/adreno	qcom fix disasm (#6703)	2024-09-24 15:23:43 +08:00
dsp	dsp simulator (#8869)	2025-02-04 09:45:04 +08:00
gemm	speed docs + upgrades [pr] (#8964)	2025-02-08 17:28:52 +08:00
hip_gpu_driver	create_schedule([x.lazydata]) -> x.schedule() in tests (#8449)	2024-12-31 03:15:52 +08:00
hiprtc	use comgr to compile (#3248)	2024-01-26 18:27:49 -08:00
junk	coder.py can write and run code (#2439)	2023-11-25 12:27:54 -08:00
models	workaround f16 cast ambiguity (#8935)	2025-02-11 09:38:56 +08:00
nv_gpu_driver	nv fix shared_memory_size (#7239)	2024-10-23 21:59:47 +03:00
optimization	Tuple -> tuple, List -> list [pr] (#8936)	2025-02-06 14:21:19 -05:00
qcom_gpu_driver	qcom match texture/sampler descriptors to OpenCL (#7622)	2024-11-11 21:56:51 +03:00
resnet18	beat mlx at resnet 18 (#6611)	2024-09-20 11:28:01 +08:00
webgpu	Autogen webgpu dawn, removing wgpu-py dependency (f16 support part 1) (#8646)	2025-02-07 15:16:59 +08:00
archprobe.py	move dtypes to dtype.py (#2964)	2024-01-01 14:58:48 -08:00
augment.py	[ready] Replacing os with pathlib (#1708)	2023-08-30 10:41:08 -07:00
disk_read_speed.py	io_uring for copies from disk (#5035)	2024-06-21 11:36:51 +03:00
dump_cache.py	wow how did i think that was okay (#2339)	2023-11-16 21:21:11 -08:00
export_model.py	encapsulate the exported webgpu model (#8203)	2024-12-13 10:55:37 +01:00
f16_decompress.py	u32 to f16 in tinygrad (#8074)	2024-12-06 12:00:13 +01:00
gradcheck.py	tests from grad uop path [pr] (#8313)	2024-12-18 09:25:05 -08:00
hip_events.py	move autogen to runtime/autogen (#3254)	2024-01-26 12:44:19 -08:00
introspection.py	rename LazyBuffer -> UOp [pr] (#8169)	2024-12-11 16:15:52 -08:00
lr_scheduler.py	use at least float32 for optim.lr (#4297)	2024-04-25 14:42:28 -04:00
mcts_search.py	[TIP-9] rename Opt's amt to arg 2 (#8770)	2025-01-27 14:19:04 -05:00
multitensor.py	multitensor start (#2676)	2023-12-07 17:07:05 -08:00
onnx.py	copy onnx_ops into onnx (#8876)	2025-02-03 12:15:07 -05:00
onnx_helpers.py	add onnx_helpers to extra and add ort validate to benchmark_onnx (#8890)	2025-02-04 16:36:01 -05:00
reduce_speed.py	reduce speed example [pr] (#8978)	2025-02-09 14:13:59 +08:00
ring_copy.py	ring copy example (#3185)	2024-01-19 23:34:30 -05:00
setup_mock_amd_osx.sh	add script to install amd mockgpu on macOS (#8536)	2025-01-09 01:29:25 +03:00
thneed.py	new style device (#2530)	2023-11-30 17:07:16 -08:00
threefry.py	feat: make buffer (#6745)	2024-09-25 18:31:03 +08:00
to_movement_ops.py	s/UOps/Ops (#7500)	2024-11-03 11:26:10 +08:00
training.py	tinytqdm.set_description and tinytrange (#5101)	2024-06-22 14:45:06 -04:00
transfer_speed.py	hotfix: copy size is in bytes	2024-01-17 16:44:15 +00:00