mirrors/open-r1

mirror of https://github.com/huggingface/open-r1.git synced 2026-06-24 01:54:06 +00:00

Author	SHA1	Message	Date
Lewis Tunstall	f6a07648e2	Bump vLLM and TRL	2025-05-28 06:48:01 +00:00
lewtun	9eef995b4d	Bump deps (#656)	2025-05-27 15:38:21 +02:00
lewtun	57e85b522f	Add better logging defaults for GRPO (#657)	2025-05-25 13:24:52 +02:00
Guilherme Penedo	c1e1192294	GRPO with codeforces problems (#627) * add * update * updates * updates #2 * weighted_sum and python fixes * bugfix * merging ioi/cf setups * integrating the morph changes * move morph_client * run style * small changes for mixed languages training * revert grpo.py changes * piston readme * local test fetching * bug fixes * updated readme * style fixes * style fixes 2 * deps changes * import sorting * fix tests * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-05-25 11:55:27 +02:00
lewtun	db2d9b011a	Bump lower bound on liger-kernel (#654) Related to https://github.com/huggingface/open-r1/pull/653 (I forgot to include this in that PR)	2025-05-22 08:44:13 +02:00
lewtun	8067149e90	Bump DeepSpeed to 0.16.8 to fix OOM on Qwen3 (#653)	2025-05-21 22:25:57 +02:00
lewtun	ebd5913a85	Bump LightEval (#643)	2025-05-16 10:52:05 +02:00
lewtun	c802f00512	Use pass@1 for all evals (#633) * Use pass@1 for all evals * Update scores	2025-05-09 17:42:36 +02:00
Andrei	af81114044	Code Execution using Morph Cloud (#614) * initial commit for morphcloud sandbox support * initial * fixed prints in morph client for ioi * updated import * context manager * removed unnecessary comments * more intelligent instance/snapshot management * update * Add documentation for Morph integration * Delete MORPH_INTEGRATION.md * added retry and modularity to morph client * updates to kwargs and setup.py * Update setup.py * added languages codepath + fixed slurm + added m orph tests * make quality formatting fixes * conditional imports for morph --------- Co-authored-by: arb8020 <arbeightytwenty@gmail.com>	2025-05-08 08:59:54 +02:00
lewtun	52520a6713	Fix style (#631) * Fix style * Fix * Add jieba	2025-05-05 15:49:10 +02:00
lewtun	75c3999180	Bump LightEval to enable DP>1 (#629) * Bump LightEval to enable DP>1 * Remove redundant arg * Update eval scores * Fix slurm	2025-04-30 22:02:20 +02:00
lewtun	50590a41b9	Enable data and tensor parallelism for GRPO (#626) * Bump deps * Fix SLurm * Fix	2025-04-26 11:50:08 +02:00
lewtun	5112bfc401	Fix SFT for base models (#604) * Fix pad token bug in SFT * Add ChatML default * Clean up * Refactor grpo model load * Add doc * Bump deepspeed	2025-04-16 11:45:50 +02:00
lewtun	04dbf21989	Bump TRL and vLLM (#595) * Bump TRL and vLLM * Fix style * Bump liger * Add liger	2025-04-11 16:32:33 +02:00
lewtun	bf08f56849	[WIP] Bump lighteval with proper pass@1 (#584) * Bump lighteval with proper pass@1 * Bump lighteval * Update AIME24	2025-04-08 20:53:34 +02:00
Edward Beeching	9915e06f1e	Async code reward fixes (#546) * expose num parallel code executions * add e2b benchmarking script * adds new parallel code execution with better execption handling * style * update default * increase sandbox timeout * Add pretty table and Sandbox IDs * Add Sandbox ID * fix merge --------- Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>	2025-03-28 14:08:15 +01:00
lewtun	8000dd2384	[WIP] RL goes brrr (#533) * Fix vLLM recipes * Add vllm server to Slurm * Add overlap across srun * Fix NUM_NODES * Refactor TP to script * fix train script to work withnew GRPO * lewis nits * bump trl, transformers --------- Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-24 15:15:02 +01:00
Edward Beeching	8782fa6e90	bump lighteval, expose the lcb_v4 benchmark (#441)	2025-02-26 17:59:44 +01:00
Edward Beeching	a20666d5b5	Bumps TRL (#437)	2025-02-26 10:35:50 +01:00
lewtun	3f9d75a595	Bump Liger kernel (#399) Needed to enable SFT training via https://github.com/huggingface/trl/pull/2874	2025-02-23 17:44:03 +01:00
lewtun	49d9b741a5	Pin dependencies (#393)	2025-02-22 14:46:09 +01:00
lewtun	9fb45bede6	Fix LightEval commands and dependencies (#386) * Fix lighteval cmd * Fix typo * Pin lighteval * Hacks to the max * Fix slurm * Fix * Pin lighteval * Pin l --------- Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>	2025-02-21 14:52:45 +01:00
lewtun	d76ecc12a2	Add E2B code interpreter reward function (#364) * Add stuff * Make it kind of work * Add more stuff * Add fix for parse * Fix * Refactor * Clean up * Fix config * Fix sys * Add SFT config * Use min rate * Fix eval * Add base model * Add s1k * Disable eval * Fix * Add import checker * Fix importer * Fix * Tune config * Tune * Fix * Fix save * Tuen beta * Remove configs * Fix vLLM * Fix * Add note * Add doc * doc * Fix * Tune lr * Add command	2025-02-19 11:26:46 +01:00
Edward Beeching	7041fbc9d6	Update setup.py (#315) adds peft as a temp dep due to https://github.com/huggingface/trl/issues/2849	2025-02-13 15:04:03 +01:00
Almaz Zinollayev	517adddae3	[Testing Github workflow] Updating workflows and makefile (#214) * [Testing Github workflow] Updating workflows and makefile * [Testing Github workflow] - Refactoring workflow, fixing tests erorr, easier debugging * [Testing Github workflow] Converting docstring into raw string * [Testing Github workflow] - Fixing test_zero_max_penalty_returns_zero() test * [Testing Github workflow] Removing redundant test	2025-02-10 18:28:35 +01:00
lewtun	3519a7fa3d	Remove duplicate math-verify (#234)	2025-02-07 20:01:54 +01:00
lewtun	0da0f7cce2	Refactor training configs and unify Slurm for training SFT & GRPO (#231) * Refactor Slurm * Fix * FML * Nuke * Clean * Fix config * Fix deps * Fix logging	2025-02-07 15:56:43 +01:00
Kashif Rasul	250ab46ea1	[GRPO] add cosine reward (#206) * add cosine reward * fix merge * fix typo * fix check	2025-02-07 08:10:48 +01:00
lewtun	cec57f3a55	Add GPQA Diamond and fix evaluation deps (#196) * Add GPQA Diamond * Add table * Fix README * Up * Fixes * Ignore logs * Fix * Pin deps * Fix GRPO * Add Llama 70B tabels * Restore dp * Pin lighteval * Use bfloat16 * Tune table * Add note	2025-02-06 15:24:52 +01:00
Lewis	138df0ca44	chore(setup.py): bump vllm>=0.7.1 (#181) See https://github.com/huggingface/trl/pull/2766.	2025-02-05 09:53:31 +01:00
Kashif Rasul	a0d61ccece	use ruff (#137) * use ruff * reformat * re-run * update deps * undo * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix help strings * fix ruff version * fix formatting --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-31 13:36:08 +01:00
Kashif Rasul	c0b53fae29	Grpo slurm scripts (#112) * initial grpo.slurm script * initial zero3 yaml using 1 less gpu * add completion and promp length * initial doc * use main * fix typo * remove num_processes * use vllm 0.7.0 * remove double module load * update math-verify * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * overwrite num_procs in the slurm script * add vllm args to readme * update readme --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-30 10:22:45 +01:00
Hynek Kydlíček	e2235cf978	Improve repoduction of r1 reported score (#92) * bump up deps, fix aime24 evals, make grpo more strict * minor fixes * 🤨 fmt * bump lighteval + set boxed to match first * remove dead code * bump lighteval * add ed's tp branch swtich --------- Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>	2025-01-29 11:29:05 +01:00
Hynek Kydlíček	90b0947382	Reward verification and evaluation fixes (#55) * bump up deps, fix aime24 evals, make grpo more strict * minor fixes * 🤨 fmt * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-26 18:35:48 +01:00
lewtun	2580fd8c1b	Fix Slurm SFT and gather Slurm scripts (#19) * Fix slurm * Fix generate * Fix install * Fix c	2025-01-25 13:47:52 +01:00
Quentin Gallouédec	742cc008b2	Pin main for transformers and trl	2025-01-25 11:07:17 +01:00
Agus	33795e1b5a	Add math-verify to check accuracy of completions on GRPO (#14) * Add math-verify to check accuracy of completions on GRPO * Handle make_conversation * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/grpo.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix quality * Remove unnecesary item access in parsed answer --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-25 11:03:58 +01:00
Gabriel Martín Blázquez	02bed5308c	Add synthetic data generation script (#9) * Add synthetic data generation script Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com> * Fix format * Fix imports sorting --------- Co-authored-by: Anton <anton-l@users.noreply.github.com> Co-authored-by: Agustin <plaguss@users.noreply.github.com>	2025-01-25 01:42:24 +01:00
lewtun	26184f71ae	Refactor evaluation (#6)	2025-01-24 23:46:34 +01:00
Edward Beeching	9c398973e8	Adds Math-500 and AIME24 evals (#4) * adds evals * up max model len --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-01-24 23:09:07 +01:00
lewtun	6acc9a0aa0	Add configs and stuff (#2)	2025-01-24 20:05:18 +01:00
Quentin Gallouédec	a4bf90465f	Update setup.py (#1)	2025-01-24 19:13:04 +01:00
Lewis Tunstall	2ff66e6cde	Add skeleton	2025-01-24 16:50:13 +00:00

43 commits