mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-06-24 02:14:17 +00:00
For unknown reasons, without this change, running "Llama 3.2 1B" fails with the error below. FWIW, I do not know the performance impact of this change. I still can't get exo fully running, but this change lets me get further before hitting what looks like a separate VRAM allocation issue (a story for another day, I suppose).
error:
```
Failed to fetch completions: Error processing prompt (see logs with DEBUG>=2): Nvrtc Error 6, NVRTC_ERROR_COMPILATION <null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
*((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
^
<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
*((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
^
<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
*((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
^
<null>(18): error: more than one user-defined conversion from "nv_bfloat16" to "half" applies:
function "__half::__half(float)" (declared at line 214 of /usr/include/cuda_fp16.hpp)
function "__half::__half(short)" (declared at line 227 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned short)" (declared at line 228 of /usr/include/cuda_fp16.hpp)
function "__half::__half(int)" (declared at line 229 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned int)" (declared at line 230 of /usr/include/cuda_fp16.hpp)
function "__half::__half(long long)" (declared at line 231 of /usr/include/cuda_fp16.hpp)
function "__half::__half(unsigned long long)" (declared at line 232 of /usr/include/cuda_fp16.hpp)
*((half4*)((data0+(alu0+(gidx1<<14)+(lidx0<<11)+alu1)))) = make_half4(((half)(val0)),((half)(val1)),((half)(val2)),((half)(val3)));
^
4 errors detected in the compilation of "<null>".
```
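The log shows NVRTC rejecting a direct `(half)` cast of an `nv_bfloat16` value because several `__half` constructors match. One common way around this kind of ambiguity is to cast through `float` first. The sketch below illustrates that idea as a small string-rendering helper. It is not tinygrad's renderer API and not necessarily what this commit does: `render_cast` and `render_half4_store` are hypothetical names, and it assumes that `nv_bfloat16` converts to `float` unambiguously.

```python
# Hypothetical sketch, not tinygrad's actual renderer code.
# Assumption: (float)(x) is unambiguous for an nv_bfloat16 x, and
# __half(float) exists (it is listed among the candidates in the log).

def render_cast(expr: str, src: str, dst: str) -> str:
    """Render a C-style cast of `expr` from type `src` to type `dst`."""
    if src == "nv_bfloat16" and dst == "half":
        # Go through float so only __half::__half(float) can apply.
        return f"(({dst})((float)({expr})))"
    # Every other pair keeps the plain cast seen in the generated kernel.
    return f"(({dst})({expr}))"


def render_half4_store(vals: list[str], src: str) -> str:
    """Render the make_half4(...) argument list for values of type `src`."""
    args = ",".join(render_cast(v, src, "half") for v in vals)
    return f"make_half4({args})"


if __name__ == "__main__":
    print(render_half4_store(["val0", "val1", "val2", "val3"], "nv_bfloat16"))
```

The extra `float` hop may cost a conversion instruction per element, so the performance impact would still need measuring.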
| Name |
|---|
| accel |
| amdpci |
| assembly |
| backends |
| datasets |
| disassemblers/adreno |
| dsp |
| gemm |
| hip_gpu_driver |
| hiprtc |
| junk |
| models |
| nv_gpu_driver |
| optimization |
| qcom_gpu_driver |
| resnet18 |
| webgpu |
| archprobe.py |
| augment.py |
| disk_read_speed.py |
| dump_cache.py |
| export_model.py |
| f16_decompress.py |
| gradcheck.py |
| hip_events.py |
| introspection.py |
| lr_scheduler.py |
| mcts_search.py |
| multitensor.py |
| onnx.py |
| onnx_helpers.py |
| reduce_speed.py |
| ring_copy.py |
| setup_mock_amd_osx.sh |
| thneed.py |
| threefry.py |
| to_movement_ops.py |
| training.py |
| transfer_speed.py |