mirrors/open-r1

mirror of https://github.com/huggingface/open-r1.git synced 2026-06-24 01:54:06 +00:00

Author	SHA1	Message	Date
Lewis Tunstall	3bcc4fc86e	Add codeforces	2025-05-28 19:21:15 +00:00
lewtun	a6b4f668fb	Fix Weka refresh (#666 ) * Fix Weka refresh * Update evaluate.slurm	2025-05-28 13:45:48 +02:00
lewtun	722f144d21	Refresh Weka on Slurm (#662 ) * Refresh Weka on Slurm * Include current working dir	2025-05-27 19:21:15 +02:00
Guilherme Penedo	c1e1192294	GRPO with codeforces problems (#627 ) * add * update * updates * updates #2 * weighted_sum and python fixes * bugfix * merging ioi/cf setups * integrating the morph changes * move morph_client * run style * small changes for mixed languages training * revert grpo.py changes * piston readme * local test fetching * bug fixes * updated readme * style fixes * style fixes 2 * deps changes * import sorting * fix tests * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-05-25 11:55:27 +02:00
Edward Beeching	ea5b7edf22	Add dataset filtering script (#637 ) * add dataset filtering script * remove subset selection * save wip * save wip * update filter script * refactor to run on chunks * rename script * cleanup * update dapo filtering * fixes * dapo filt config * udpate compute pass rate * clean * update readme and config * add merging snippet	2025-05-16 10:26:49 +02:00
lewtun	4fc2a3ff82	Add time to Slurm (#639 )	2025-05-09 19:19:51 +02:00
Andrei	af81114044	Code Execution using Morph Cloud (#614 ) * initial commit for morphcloud sandbox support * initial * fixed prints in morph client for ioi * updated import * context manager * removed unnecessary comments * more intelligent instance/snapshot management * update * Add documentation for Morph integration * Delete MORPH_INTEGRATION.md * added retry and modularity to morph client * updates to kwargs and setup.py * Update setup.py * added languages codepath + fixed slurm + added morph tests * make quality formatting fixes * conditional imports for morph --------- Co-authored-by: arb8020 <arbeightytwenty@gmail.com>	2025-05-08 08:59:54 +02:00
lewtun	75c3999180	Bump LightEval to enable DP>1 (#629 ) * Bump LightEval to enable DP>1 * Remove redundant arg * Update eval scores * Fix slurm	2025-04-30 22:02:20 +02:00
lewtun	50590a41b9	Enable data and tensor parallelism for GRPO (#626 ) * Bump deps * Fix SLurm * Fix	2025-04-26 11:50:08 +02:00
Edward Beeching	715c8787fb	add back grad accumulations steps (#612 )	2025-04-17 16:41:39 +02:00
Edward Beeching	3a0e89678c	Fix eval system prompt (#591 ) * fix eval system prompt * style	2025-04-11 11:23:06 +02:00
lewtun	bf08f56849	[WIP] Bump lighteval with proper pass@1 (#584 ) * Bump lighteval with proper pass@1 * Bump lighteval * Update AIME24	2025-04-08 20:53:34 +02:00
Edward Beeching	1b3bf043dc	Adds a E2B router server that executes batches of scripts (#561 ) * adds a dedicated e2b server to handle batches of requests * fix reward tests * update slow reward * style * updates e2b router to be more generic * refactor * refactoring * licence, cleanup * update tests * style * fix import when e2b not present * style * rename sandbox file * rename to RoutedSandbox * update readme * nits * nits2 * unlimited max time * update logs path	2025-04-07 21:01:06 +02:00
lewtun	8000dd2384	[WIP] RL goes brrr (#533 ) * Fix vLLM recipes * Add vllm server to Slurm * Add overlap across srun * Fix NUM_NODES * Refactor TP to script * fix train script to work withnew GRPO * lewis nits * bump trl, transformers --------- Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-24 15:15:02 +01:00
Guilherme Penedo	7835979801	adds support for running GRPO on IOI problems (#495 ) * adds support for running GRPO on IOI problems * nit * bugfixes + recipe * added piston info and readme changes * readme updates * run isort to fix checks * Update src/open_r1/rewards.py Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> * adding ioi test * fix merge issues with python slow tests * style * generalize piston workers * generalize readme * fix extract code * finalize slow tests --------- Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-21 08:48:00 +01:00
lewtun	44cb13d4ba	Fix vLLM (#464 )	2025-03-03 17:25:30 +01:00
lewtun	eeca246b07	Update prompt template and sampling parameters for evaluation (#392 ) * Pin t * Pin t * Set top p * C * Tune math prompt * Improve math prompt * Update tables	2025-02-22 15:21:01 +01:00
lewtun	9fb45bede6	Fix LightEval commands and dependencies (#386 ) * Fix lighteval cmd * Fix typo * Pin lighteval * Hacks to the max * Fix slurm * Fix * Pin lighteval * Pin l --------- Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>	2025-02-21 14:52:45 +01:00
Edward Beeching	80e7e7b23c	move details script and fix wandb logging (#314 )	2025-02-13 11:13:00 +01:00
Anton Lozhkov	440ae0b24e	Add the actual async generation script (#273 ) * sglang inference server * add vllm * readme * add a generation script * ruff	2025-02-10 16:52:23 +01:00
Anton Lozhkov	baec330ef5	Add SGLang inference scripts (#268 ) * sglang inference server * add vllm * readme	2025-02-10 14:37:58 +01:00
Edward Beeching	cabf27560b	hardcodes num_processes to 7 when using vllm (#264 ) * hardcodes num_processes to 7 when using vllm * nits	2025-02-10 11:43:16 +01:00
lewtun	9be2e9a859	Add retry mechanism for pushing eval results (#252 ) The Hub throws 403 errors if there are too many concurrent pushes to the same repo, so we need a retry mechanism when that happens.	2025-02-09 09:44:35 +01:00
lewtun	0da0f7cce2	Refactor training configs and unify Slurm for training SFT & GRPO (#231 ) * Refactor Slurm * Fix * FML * Nuke * Clean * Fix config * Fix deps * Fix logging	2025-02-07 15:56:43 +01:00
lewtun	a60b175aeb	Update CUDA (#209 ) * Update CUDA * Fix * Remove module * Restore CUDA * Move cuda import	2025-02-06 16:31:13 +01:00
lewtun	3fbdeac96c	Fix slurm eval (#208 )	2025-02-06 15:46:33 +01:00
lewtun	cec57f3a55	Add GPQA Diamond and fix evaluation deps (#196 ) * Add GPQA Diamond * Add table * Fix README * Up * Fixes * Ignore logs * Fix * Pin deps * Fix GRPO * Add Llama 70B tabels * Restore dp * Pin lighteval * Use bfloat16 * Tune table * Add note	2025-02-06 15:24:52 +01:00
Edward Beeching	3fd56dc7b4	fix uv env path + details (#188 ) * fix uv env path + details * Update slurm/grpo.slurm --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-02-05 23:59:25 +01:00
Jingze Shi	e450a6fbc4	Recipes for optimzing training scripts (#120 ) * Add recipe configs to optimize scripts (#73) * remove small models * Add README for recipes * Add README for recipes * Attempt to resolve conflicts * Optimize src scripts * Update recipe of DeepSeek-R1-Distill-Qwen-7B * Update recipe of Qwen2.5-1.5B * Updated recipe readme for qwen * Update training command for recipes * Update README.md Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> * Update preprocessing_num_workers from 36 to 8 * Add small language model recipes for quickly verify R1 * Fix src code quality * Add back the Slurm job command * Remove recipe of doge * Fix torch_dtype is not used * fix grpo yaml * fix grpo yaml * fix deprecation warning * fix config folder location * Remove duplicate variables in grpo.py * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-31 12:41:53 +01:00
Kashif Rasul	c0b53fae29	Grpo slurm scripts (#112 ) * initial grpo.slurm script * initial zero3 yaml using 1 less gpu * add completion and promp length * initial doc * use main * fix typo * remove num_processes * use vllm 0.7.0 * remove double module load * update math-verify * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * overwrite num_procs in the slurm script * add vllm args to readme * update readme --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-30 10:22:45 +01:00
Edward Beeching	972e47eff0	Adds auto eval callbacks (#115 ) * adds auto eval callbacks * updates training scripts with callbacks * style * date * update gitignore with logs, eval results, etc * remove unused imports * nits	2025-01-30 09:39:47 +01:00
Hynek Kydlíček	e2235cf978	Improve repoduction of r1 reported score (#92 ) * bump up deps, fix aime24 evals, make grpo more strict * minor fixes * 🤨 fmt * bump lighteval + set boxed to match first * remove dead code * bump lighteval * add ed's tp branch swtich --------- Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>	2025-01-29 11:29:05 +01:00
Gabriel Martín Blázquez	c5941ed5e4	Add `--timeout`, `--retries`, `--prompt-template` and TP and PP by slurm variables (#94 ) * Set TP and PP using slurm variables * Add `--timeout` argument * add `--prompt-template` argument * Group generations * Add `--retries` argument	2025-01-28 18:30:04 +01:00
Gabriel Martín Blázquez	b03480d868	Add `--input-batch-size`, `--client-replicas` args and download Ray logs (#71 ) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-27 14:58:13 +01:00
Edward Beeching	feb59d2b42	Update evaluate.slurm typo in eval slurm	2025-01-27 09:43:02 +01:00
Hynek Kydlíček	90b0947382	Reward verification and evaluation fixes (#55 ) * bump up deps, fix aime24 evals, make grpo more strict * minor fixes * 🤨 fmt * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-26 18:35:48 +01:00
Anton Lozhkov	15df4fb134	vllm speed tweaks (#43 )	2025-01-26 01:59:50 +01:00
elie	64c0ed2254	fix evaluate.slurm (#27 )	2025-01-25 15:28:49 +01:00
elie	f169d2cd8e	add evaluate.slurm (#26 )	2025-01-25 15:23:16 +01:00
Gabriel Martín Blázquez	a90b99686a	Fix passing `vLLM` server URL (#21 ) * Use head node ip as vLLM server url * Pass correct server url * Add num_generations argument * Fix style * Remove `select` --------- Co-authored-by: plaguss <agustin@argilla.io>	2025-01-25 15:01:15 +01:00
elie	43cb6a0e0f	fix sft.slurm	2025-01-25 14:48:45 +01:00
lewtun	2580fd8c1b	Fix Slurm SFT and gather Slurm scripts (#19 ) * Fix slurm * Fix generate * Fix install * Fix c	2025-01-25 13:47:52 +01:00

42 commits