mirror of
https://github.com/tinygrad/tinygrad.git
synced 2026-06-24 02:14:17 +00:00
* viz bytepack format

Training a 1B llama yields ~20M profiler events. With JSON serialization, the browser tries to load 6GB to memory. This OOMs since each tab is limited to <3-4GB memory usage. Using a packed format, we only need ~600MB.

**Design decisions:**

- Timestamps are in microseconds relative to start time. They're stored in u32, which can express up to ~1 hr of trace events.
- Strings (kernel names, metadata, etc) are deduped.
- Buffer sizes are in u64 nbytes.

More optimization possible:

- The string lookup is a JSON dumped array, we can compress this.
- Can store less for memory by moving the layout to client.

**Results**

| Workload | Events | JSON | bytepack |
|----------------|---------|-------------|-------------|
| DP=8 llama 1B train (`command: [1]`) | 24M | 5.8GB | 640MB |
| examples/beautiful_mnist.py | 16K | 3.7MB | 745KB |
| examples/gpt2.py | 55K | 12.54MB | 1.40MB |

`[1]`: `VIZ=1 FAKEDATA=1 OFFLOAD_OPTIM=1 DP=8 BS=8 GRADIENT_ACC_STEPS=2 BLOCK_REORDER=0 LR=3e-4 TRAIN_ON_VAL=1 DEFAULT_FLOAT=bfloat16 OPTIM_DTYPE=bfloat16 LLAMA3_SIZE=1B WARMUP_STEPS=36 DECAY_STEPS=360 SEQLEN=8192 PYTHONPATH=. AMD=1 AMD_LLVM=0 MODEL=llama3 python3 examples/mlperf/model_train.py`

* python reference decoder
* 27 bytes / event, 1hr hard limit

| File | | |
|---|---|---|
| __init__.py | ||
| test_allreduce.py | ||
| test_attention.py | ||
| test_block_reorder.py | ||
| test_conv.py | ||
| test_device.py | ||
| test_disk_cache.py | ||
| test_disk_tensor.py | ||
| test_dtype.py | ||
| test_dtype_spec.py | ||
| test_elf.py | ||
| test_gguf.py | ||
| test_gradient.py | ||
| test_graph_rewrite.py | ||
| test_hashing.py | ||
| test_helpers.py | ||
| test_indexing.py | ||
| test_kernelize.py | ||
| test_linalg.py | ||
| test_linearizer_rewrite.py | ||
| test_llm_tokenizer.py | ||
| test_masked_st.py | ||
| test_microbenchmarks.py | ||
| test_mnist_dataset.py | ||
| test_pattern_matcher.py | ||
| test_qcom.py | ||
| test_rearrange_einops.py | ||
| test_rewrite_map.py | ||
| test_rewrite_not_ready.py | ||
| test_rewrite_tracked_childen.py | ||
| test_search.py | ||
| test_shapetracker.py | ||
| test_shapetracker_math.py | ||
| test_shm_tensor.py | ||
| test_simple_schedule.py | ||
| test_simplify_valid_idx.py | ||
| test_symbolic_failures.py | ||
| test_symbolic_shapetracker.py | ||
| test_tar.py | ||
| test_tensor_io.py | ||
| test_tensor_uop_representation.py | ||
| test_tqdm.py | ||
| test_transcendental_helpers.py | ||
| test_uop_resolve.py | ||
| test_uop_spec.py | ||
| test_uop_symbolic.py | ||
| test_uop_vmin_vmax.py | ||
| test_upat_compile.py | ||
| test_view.py | ||
| test_viz.py | ||
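
The design decisions listed in the commit message (u32 microsecond timestamps relative to the trace start, deduplicated strings, u64 buffer sizes) can be illustrated with a minimal sketch. The record layout, field order, and the names `EVENT`, `pack_events`, and `unpack_events` are assumptions made for illustration; this is not the tinygrad format itself, which the commit message describes as 27 bytes per event.

```python
import struct

# Hypothetical fixed-size record, for illustration only (not the real tinygrad layout):
#   u32 string-table index, u32 start time (us, relative), u32 duration (us), u64 nbytes
# "<" selects little-endian with no padding, so each record is 4+4+4+8 = 20 bytes.
EVENT = struct.Struct("<IIIQ")

# A u32 of microseconds covers 2**32 us, roughly 4295 s or about 71.6 minutes.
MAX_TRACE_SECONDS = 2**32 / 1e6

def pack_events(events, start_us):
    """Pack (name, ts_us, dur_us, nbytes) tuples into (string_table, payload)."""
    strings, index, out = [], {}, bytearray()
    for name, ts_us, dur_us, nbytes in events:
        # Deduplicate strings: store each distinct name once and refer to it by index.
        if name not in index:
            index[name] = len(strings)
            strings.append(name)
        rel = ts_us - start_us
        if not 0 <= rel < 2**32:
            raise OverflowError("timestamp does not fit in u32 microseconds")
        out += EVENT.pack(index[name], rel, dur_us, nbytes)
    return strings, bytes(out)

def unpack_events(strings, payload, start_us):
    """Inverse of pack_events: rebuild the original tuples from the packed bytes."""
    return [(strings[i], start_us + rel, dur, nbytes)
            for i, rel, dur, nbytes in EVENT.iter_unpack(payload)]

events = [("matmul", 1_000_100, 50, 4096),
          ("matmul", 1_000_200, 60, 8192),
          ("copy",   1_000_300, 10, 2**40)]
strings, payload = pack_events(events, start_us=1_000_000)
decoded = unpack_events(strings, payload, start_us=1_000_000)
```

In this sketch the repeated kernel name is stored once in the string table, and the u32 timestamp range check reflects the roughly one-hour trace limit noted in the commit message.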