tinygrad/examples/mlperf
chenyu ff05bff221
put bert data shard inside jit (#9160)
python time 45ms -> 9ms, it was spending time to schedule the shard

also init bert data on CLANG since it's from numpy, so we don't create the tensor on default device then shard into GPUS
2025-02-18 10:36:54 -05:00
scripts UNet3D MLPerf (#3470) 2024-09-10 04:37:28 -04:00
training_submission_v4.0/tinycorp copy mlperf 4.0 to mlperf 4.1 (#5614) 2024-07-20 16:12:00 -04:00
training_submission_v4.1/tinycorp update mlperf systems and copy 4.1 to 5.0 (#7004) 2024-10-11 16:20:34 -04:00
training_submission_v5.0/tinycorp free_intermediates in bert (#9040) 2025-02-12 10:00:39 -05:00
dataloader.py put bert data shard inside jit (#9160) 2025-02-18 10:36:54 -05:00
helpers.py put bert data shard inside jit (#9160) 2025-02-18 10:36:54 -05:00
initializers.py Tuple -> tuple, List -> list [pr] (#8936) 2025-02-06 14:21:19 -05:00
losses.py [MLPerf][UNet3D] Add DICE loss + metrics (#4204) 2024-04-17 20:09:33 -04:00
lr_schedulers.py fp16 resnet (without expand backwards sum in float, doesn't work) (#3816) 2024-03-28 01:25:37 -04:00
metrics.py [MLPerf][UNet3D] Add DICE loss + metrics (#4204) 2024-04-17 20:09:33 -04:00
model_eval.py [MLPerf] Prepare openimages dataset script (#6747) 2024-09-27 11:13:56 -04:00
model_spec.py move globalcounters to ops (#2960) 2024-01-01 14:21:02 -08:00
model_train.py put bert data shard inside jit (#9160) 2025-02-18 10:36:54 -05:00
README start on mlperf models 2023-05-10 16:30:49 -07:00

Each model should be a clean single file.
They are imported from the top-level `models` directory.

Each model should be capable of loading weights from the reference implementation.
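
As a rough illustration of what loading reference weights can involve, the sketch below renames a reference checkpoint's parameter keys to a model's own names and checks that the shapes line up. Everything in it is an assumption made for illustration: `remap_state_dict`, the `module.` prefix, and the toy parameters are hypothetical and are not taken from tinygrad or from any reference implementation. Nested lists stand in for arrays so the example runs without dependencies.

```python
# Hypothetical sketch: names and key layouts are illustrative, not tinygrad's API.
def remap_state_dict(ref_weights, rename_rules, expected_shapes):
  """Rename reference checkpoint keys and verify they match the model's layout.

  ref_weights: dict of name -> nested list (stand-in for an array)
  rename_rules: list of (old_prefix, new_prefix) pairs, first match wins
  expected_shapes: dict of model parameter name -> shape tuple
  """
  def shape_of(x):
    # Shape of a rectangular nested list.
    dims = []
    while isinstance(x, list):
      dims.append(len(x))
      x = x[0] if x else None
    return tuple(dims)

  out = {}
  for name, value in ref_weights.items():
    new_name = name
    for old, new in rename_rules:
      if name.startswith(old):
        new_name = new + name[len(old):]
        break
    if new_name not in expected_shapes:
      raise KeyError(f"unexpected parameter: {name} -> {new_name}")
    if shape_of(value) != expected_shapes[new_name]:
      raise ValueError(f"shape mismatch for {new_name}")
    out[new_name] = value

  # Fail loudly if the checkpoint does not cover every model parameter.
  missing = set(expected_shapes) - set(out)
  if missing:
    raise KeyError(f"missing parameters: {sorted(missing)}")
  return out

# Toy example: strip a "module." prefix from a two-parameter checkpoint.
ref = {"module.fc.weight": [[0.0, 1.0], [2.0, 3.0]], "module.fc.bias": [0.0, 0.0]}
loaded = remap_state_dict(ref, [("module.", "")], {"fc.weight": (2, 2), "fc.bias": (2,)})
```

Raising on unexpected, mismatched, or missing keys is a deliberate choice in this sketch: a silently skipped parameter would leave part of the model at its initial values and be hard to notice later.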

We will focus on these 5 models:

# Resnet50-v1.5 (classic) -- 8.2 GOPS/input
# Retinanet
# 3D UNET (upconvs)
# RNNT
# BERT-large (transformer)

They are used in both the training and inference benchmarks:
https://mlcommons.org/en/training-normal-21/
https://mlcommons.org/en/inference-edge-30/
And we will submit to both.

NOTE: we are in the Edge category since we don't have ECC RAM.