mirror of https://github.com/huggingface/open-r1.git synced 2026-06-24 01:54:06 +00:00

Lewis Tunstall 3bcc4fc86e Add codeforces		2025-05-28 19:21:15 +00:00
accelerate_configs	Add OlympicCoder recipes (#505)	2025-03-13 19:08:34 +01:00
dataset_filtering	Add dataset filtering script (#637)	2025-05-16 10:26:49 +02:00
DeepSeek-R1-Distill-Qwen-1.5B/grpo	Bump TRL and vLLM (#595)	2025-04-11 16:32:33 +02:00
OlympicCoder-7B/sft	Bump TRL and vLLM (#595)	2025-04-11 16:32:33 +02:00
OlympicCoder-32B/sft	Bump TRL and vLLM (#595)	2025-04-11 16:32:33 +02:00
OpenR1-Distill-7B/sft	Align EOS token ID between tokenizer and generation config (#663)	2025-05-27 17:20:13 +02:00
Qwen2.5-1.5B-Instruct/grpo	Align EOS token ID between tokenizer and generation config (#663)	2025-05-27 17:20:13 +02:00
Qwen2.5-Coder-7B-Instruct/grpo	GRPO with codeforces problems (#627)	2025-05-25 11:55:27 +02:00
R1-Zero-Qwen-Math-7B-Code/grpo	Add codeforces	2025-05-28 19:21:15 +00:00
R1-Zero-Qwen-Math-7B-Math/grpo	Fix benchmarks!	2025-05-27 20:44:35 +00:00
README.md	Add OpenR1-Distill recipe (#661)	2025-05-26 17:57:44 +02:00

Post-training recipes

OpenR1 Distill 7B

To train the OpenR1 Distill 7B model, run:

sbatch --nodes=1 slurm/train.slurm --model OpenR1-Distill-7B --task sft --config distill --accelerator zero3

OlympicCoder

To train the OlympicCoder models, run:

# 7B
sbatch --nodes=1 slurm/train.slurm --model OlympicCoder-7B --task sft --config v00.00 --accelerator zero3

# 32B
sbatch --nodes=16 slurm/train.slurm --model OlympicCoder-32B --task sft --config v00.00 --accelerator fsdp

Note: for the 32B model, we had to switch to FSDP1 and the paged 8-bit AdamW optimizer in order to fit the largest possible context size.
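
The three commands above share one shape: a node count, followed by the model, task, config, and accelerator arguments passed to slurm/train.slurm. As a minimal sketch, the helper below prints the command for a recipe instead of submitting it, which can be handy for checking the arguments before queueing a job. The function name `recipe_cmd` is hypothetical and not part of the repository; the argument values are copied from the commands above.

```shell
# Hypothetical helper (an assumption, not part of the repository):
# prints the sbatch command for a recipe without submitting it.
# Arguments: <nodes> <model> <task> <config> <accelerator>
recipe_cmd() {
  printf 'sbatch --nodes=%s slurm/train.slurm --model %s --task %s --config %s --accelerator %s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Dry run for the recipes listed above.
recipe_cmd 1  OpenR1-Distill-7B sft distill zero3
recipe_cmd 1  OlympicCoder-7B   sft v00.00  zero3
recipe_cmd 16 OlympicCoder-32B  sft v00.00  fsdp
```

Once the printed line looks right, it can be pasted into the shell as-is to submit the job.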