tinygrad/examples
Francis Lata b7ce9a1530
UNet3D MLPerf (#3470)
* add training set transforms

* add DICE cross entropy loss

* convert pred and label to Tensor when calculating DICE score

* cleanups and allow train dataset batching

* fix DICE CE loss calculation

* jitted training step

* clean up DICE CE loss calculation

* initial support for sharding

* Revert "initial support for sharding"

This reverts commit e3670813b8.

* minor updates

* cleanup imports

* add support for sharding

* apply temp patch to try to avoid OOM

* revert cstyle changes

* add gradient acc

* hotfix

* add FP16 support

* add ability to train on smaller image sizes

* add support for saving and loading checkpoints + cleanup some various modes

* fix issue with using smaller patch size + update W&B logging

* disable LR_WARMUP_EPOCHS

* updates

* minor cleanups

* cleanup

* update order of transformations

* more cleanups

* realize loss

* cleanup

* more cleanup

* some cleanups

* add RAM usage

* minor cleanups

* add support for gradient accumulation

* cleanup imports

* minor updates to not use GA_STEPS

* remove FP16 option since it's available now globally

* update multi-GPU setup

* add timing logs for training loop

* go back to using existing dataloader and add ability to preprocess data to save time

* clean up optimization and re-enable JIT and multi-GPU support for training and evaluation

* free train and eval steps memory

* cleanups and scale batch size based on the number of GPUs

* fix GlobalCounters import

* fix seed

* fix W&B setup

* update batch size default size

* add back metric divergence check

* put back JIT on UNet3d eval

* move dataset preprocessing inside training code

* add test for dice_loss

* add config logging support to W&B and other cleanups

* change how default float is getting retrieved

* remove TinyJit import duplicate

* update config logging to W&B and remove JIT on eval_step

* no need for caching preprocessed data anymore

* fix how evaluation is ran and how often

* add support for LR scaling

* fix issue with gaussian being moved to scipy.signal.windows

* remove DICE loss unit test

* fix issue where loss isn't compatible with multiGPU

* add individual BEAM control for train and eval steps

* fix ndimage scipy import

* add BENCHMARK

* cleanups on BENCHMARK + fix on rand_flip augmentation during training

* cleanup train and eval BEAM envs

* add checkpointing support after every eval

* cleanup model_eval

* disable grad during eval

* use new preprocessing dataset mechanism

* remove unused import

* use training and inference_mode contexts

* start eval after benchmarking

* add data fetching time

* cleanup decorators

* more cleanups on training script

* add message during benchmarking mode

* realize when reassigning LR on scheduler and update default number of epochs

* add JIT on eval step

* remove JIT on eval_step

* add train dataloader for unet3d

* move checkpointing to be done after every epoch

* revert removal of JIT on unet3d inference

* save checkpoint if metric is not successful

* Revert "add train dataloader for unet3d"

This reverts commit c166d129df.

* Revert "Revert "add train dataloader for unet3d""

This reverts commit 36366c65d2.

* hotfix: seed was defaulting to a value of 0

* fix SEED value

* remove the usage of context managers for setting BEAM and going from training to inference

* support new stack API for calculating eval loss and metric

* Revert "remove the usage of context managers for setting BEAM and going from training to inference"

This reverts commit 2c0ba8d322.

* check training and test preprocessed folders separately

* clean up imports and log FUSE_CONV_BW

* use train and val preprocessing constants

* add kits19 dataset setup script

* update to use the new test decorator for disabling grad

* update kits19 dataset setup script

* add docs on how to train the model

* set default value for BASEDIR

* add detailed instruction about BASEDIR usage

---------

Co-authored-by: chenyu <chenyu@fastmail.com>
2024-09-10 04:37:28 -04:00
conversation_data Whisper + LLAMA + VITS (#2332) 2023-12-02 15:03:46 -08:00
llm.c delete seen from the scheduler api [run_process_replay] (#6427) 2024-09-09 16:26:34 +08:00
mlperf UNet3D MLPerf (#3470) 2024-09-10 04:37:28 -04:00
openpilot merge uops with ops (#6111) 2024-08-16 18:17:57 -04:00
other_mnist beautiful_mnist in torch 2024-07-14 11:09:58 -07:00
rl more beautiful_cartpole with exposed hparams 2024-01-07 17:41:09 -08:00
sovits_helpers move dtypes to dtype.py (#2964) 2024-01-01 14:58:48 -08:00
tinychat bring tinychat more inline with tinyos' version (#5358) 2024-07-10 13:13:52 -07:00
vgg7_helpers move to new cached fetch (#2493) 2023-11-28 17:36:55 -08:00
webgl/yolov8 webgl backend in extra (#3041) 2024-01-08 09:29:13 -08:00
webgpu/stable_diffusion s/lazydata.realized/lazydata.base.realized/g (#2914) 2023-12-22 14:45:13 -05:00
__init__.py failing llama test 2023-03-11 16:28:10 -08:00
beautiful_cartpole.py tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
beautiful_cifar.py CIFAR trainer + various bugfixes / improvements (#6146) 2024-08-20 16:58:46 -07:00
beautiful_mnist.py tensor inference (#6156) 2024-08-18 00:19:28 -07:00
beautiful_mnist_multigpu.py feat: example and extra tweaks (#6310) 2024-08-28 19:26:11 -07:00
coder.py apply the same fix_bf16 in llama and coder (#3789) 2024-03-17 21:25:24 -04:00
compile_efficientnet.py webgl backend in extra (#3041) 2024-01-08 09:29:13 -08:00
compile_tensorflow.py fix various examples (#4691) 2024-05-22 20:43:21 -04:00
conversation.py fix conversation.py quantize (#4663) 2024-05-20 17:36:37 -04:00
efficientnet.py remove clang program header (#4422) 2024-05-04 08:38:01 -07:00
gpt2.py use helpers.JIT in llama and gpt2 examples (#5350) 2024-07-09 15:04:43 -04:00
handcode_opt.py delete seen from the scheduler api [run_process_replay] (#6427) 2024-09-09 16:26:34 +08:00
hlb_cifar10.py add cifar to datasets.py (#6210) 2024-08-20 11:42:49 -07:00
index.html Enable Multi-Output Export (#2179) 2023-10-30 18:42:26 -07:00
llama.py shard kvcache (#5830) 2024-07-30 20:29:54 -07:00
llama3.py faster tinychat (#5993) 2024-08-08 19:16:26 -07:00
mamba.py prev speed improvements (#5252) 2024-07-03 09:06:01 -07:00
mask_rcnn.py change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
mixtral.py tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
mnist_gan.py tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
openelm.py nn.RMSNorm (#5272) 2024-07-02 21:39:01 -04:00
sdv2.py Stable Diffusion v2 Inference (#5283) 2024-07-03 22:47:10 -04:00
sdxl.py sdxl batched inference fixes (#6293) 2024-08-28 07:44:58 -04:00
sdxl_seed0.png revert stable diffusion validation with threefry (#5248) 2024-07-01 14:43:47 -04:00
serious_mnist.py move state to nn/state (#1619) 2023-08-22 07:36:24 -07:00
simple_conv_bn.py fix various examples (#4691) 2024-05-22 20:43:21 -04:00
so_vits_svc.py change Tensor.stack to method (#4719) 2024-05-24 17:04:19 -04:00
stable_diffusion.py no UnaryOps.NEG in generated UOp patterns (#6209) 2024-08-21 11:08:22 -04:00
stable_diffusion_seed0.png revert stable diffusion validation with threefry (#5248) 2024-07-01 14:43:47 -04:00
train_efficientnet.py tinytqdm.set_description and tinytrange (#5101) 2024-06-22 14:45:06 -04:00
train_resnet.py move things, clean up extra (#2292) 2023-11-13 20:18:40 -08:00
transformer.py fix onehot and jit in examples/transformer (#3073) 2024-01-10 02:22:41 -05:00
vgg7.py waifu2x vgg7: testcase, auto-RGBA->RGB, function to grab pretrained models, training "fix" (#2117) 2023-10-19 22:07:15 -07:00
vit.py move to new cached fetch (#2493) 2023-11-28 17:36:55 -08:00
vits.py docs: showcase remove mnist_gan and add conversation.py (#4757) 2024-05-28 11:09:26 -04:00
whisper.py whisper long batch (#6335) 2024-09-09 21:03:59 -04:00
yolov3.py Update yolov3.py (#2680) 2023-12-08 12:59:38 -08:00
yolov8-onnx.py [ready] Replacing os with pathlib (#1708) 2023-08-30 10:41:08 -07:00
yolov8.py fix yolov8 example (#5003) 2024-06-16 20:47:29 -04:00