mirrors/open-r1

mirror of https://github.com/huggingface/open-r1.git synced 2026-06-24 01:54:06 +00:00

Author	SHA1	Message	Date
Lewis Tunstall	97b1c22e55	Merge branch 'bump-deps-0' into zero-math-code	2025-05-28 10:11:06 +02:00
Lewis Tunstall	f6a07648e2	Bump vLLM and TRL	2025-05-28 06:48:01 +00:00
lewtun	33f84def0d	Align EOS token ID between tokenizer and generation config (#663 ) * Align EOS token ID between tokenizer and generation config * Fix	2025-05-27 17:20:13 +02:00
Lewis Tunstall	82fb385fa5	Refine tests	2025-05-27 13:39:00 +00:00
lewtun	5ac5971ea5	Add OpenR1-Distill recipe (#661 )	2025-05-26 17:57:44 +02:00
Guilherme Penedo	c1e1192294	GRPO with codeforces problems (#627 ) * add * update * updates * updates #2 * weighted_sum and python fixes * bugfix * merging ioi/cf setups * integrating the morph changes * move morph_client * run style * small changes for mixed languages training * revert grpo.py changes * piston readme * local test fetching * bug fixes * updated readme * style fixes * style fixes 2 * deps changes * import sorting * fix tests * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-05-25 11:55:27 +02:00
lewtun	9366aa2df3	Add dataset mixer (#647 ) * Prototype * Clean up * Refactor * Add tests * Add doc and make scripts work * Tune doc * Up * Tune * Add column verification * Fix types * Fix YAML * Fix types * Fix doc * f * f	2025-05-20 11:40:42 +02:00
Quentin Gallouédec	5e0c210f9c	use hf papers (#646 )	2025-05-19 13:48:14 +02:00
Edward Beeching	ea5b7edf22	Add dataset filtering script (#637 ) * add dataset filtering script * remove subset selection * save wip * save wip * update filter script * refactor to run on chunks * rename script * cleanup * update dapo filtering * fixes * dapo filt config * udpate compute pass rate * clean * update readme and config * add merging snippet	2025-05-16 10:26:49 +02:00
lewtun	c802f00512	Use pass@1 for all evals (#633 ) * Use pass@1 for all evals * Update scores	2025-05-09 17:42:36 +02:00
Andrei	af81114044	Code Execution using Morph Cloud (#614 ) * initial commit for morphcloud sandbox support * initial * fixed prints in morph client for ioi * updated import * context manager * removed unnecessary comments * more intelligent instance/snapshot management * update * Add documentation for Morph integration * Delete MORPH_INTEGRATION.md * added retry and modularity to morph client * updates to kwargs and setup.py * Update setup.py * added languages codepath + fixed slurm + added morph tests * make quality formatting fixes * conditional imports for morph --------- Co-authored-by: arb8020 <arbeightytwenty@gmail.com>	2025-05-08 08:59:54 +02:00
lewtun	9373ad3055	Update README.md	2025-04-30 22:16:18 +02:00
lewtun	75c3999180	Bump LightEval to enable DP>1 (#629 ) * Bump LightEval to enable DP>1 * Remove redundant arg * Update eval scores * Fix slurm	2025-04-30 22:02:20 +02:00
lewtun	50590a41b9	Enable data and tensor parallelism for GRPO (#626 ) * Bump deps * Fix SLurm * Fix	2025-04-26 11:50:08 +02:00
lewtun	5112bfc401	Fix SFT for base models (#604 ) * Fix pad token bug in SFT * Add ChatML default * Clean up * Refactor grpo model load * Add doc * Bump deepspeed	2025-04-16 11:45:50 +02:00
lewtun	04dbf21989	Bump TRL and vLLM (#595 ) * Bump TRL and vLLM * Fix style * Bump liger * Add liger	2025-04-11 16:32:33 +02:00
Shenghang Tsai	2a7bb45f05	Update README.md (#590 )	2025-04-10 13:11:35 +02:00
lewtun	bf08f56849	[WIP] Bump lighteval with proper pass@1 (#584 ) * Bump lighteval with proper pass@1 * Bump lighteval * Update AIME24	2025-04-08 20:53:34 +02:00
Edward Beeching	1b3bf043dc	Adds a E2B router server that executes batches of scripts (#561 ) * adds a dedicated e2b server to handle batches of requests * fix reward tests * update slow reward * style * updates e2b router to be more generic * refactor * refactoring * licence, cleanup * update tests * style * fix import when e2b not present * style * rename sandbox file * rename to RoutedSandbox * update readme * nits * nits2 * unlimited max time * update logs path	2025-04-07 21:01:06 +02:00
lewtun	4ec555b0c8	Restore single-node instructions to run GRPO (#549 )	2025-03-27 10:29:07 +01:00
lewtun	8000dd2384	[WIP] RL goes brrr (#533 ) * Fix vLLM recipes * Add vllm server to Slurm * Add overlap across srun * Fix NUM_NODES * Refactor TP to script * fix train script to work withnew GRPO * lewis nits * bump trl, transformers --------- Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-24 15:15:02 +01:00
Guilherme Penedo	7835979801	adds support for running GRPO on IOI problems (#495 ) * adds support for running GRPO on IOI problems * nit * bugfixes + recipe * added piston info and readme changes * readme updates * run isort to fix checks * Update src/open_r1/rewards.py Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> * adding ioi test * fix merge issues with python slow tests * style * generalize piston workers * generalize readme * fix extract code * finalize slow tests --------- Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-21 08:48:00 +01:00
koskotheim	d436b7b9c0	fix typo (#507 )	2025-03-15 20:56:14 +01:00
lewtun	d5922af8ce	Add OlympicCoder recipes (#505 ) * Add OlympicCoder recipes * Fix configs * Add FSDP config	2025-03-13 19:08:34 +01:00
lewtun	3b5d6603bf	Add citation and acknowledgements (#481 ) * Update README.md * Update README.md * Update README.md	2025-03-05 20:23:57 +01:00
lewtun	44cb13d4ba	Fix vLLM (#464 )	2025-03-03 17:25:30 +01:00
Marco Z	c7733d3fa4	update makefile and readme (#449 ) Co-authored-by: Marco Zocca <marco.zocca@unfoldml.com>	2025-03-01 15:08:30 +01:00
Agus	7188001281	Add script to decontaminate datasets against benchmark datasets (#416 ) * Add script to decontaminate datasets against benchmark datasets * Add docs for the decontamination script * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Add license header and attribution to the authors --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-02-24 19:54:44 +01:00
lewtun	eeca246b07	Update prompt template and sampling parameters for evaluation (#392 ) * Pin t * Pin t * Set top p * C * Tune math prompt * Improve math prompt * Update tables	2025-02-22 15:21:01 +01:00
lewtun	9fb45bede6	Fix LightEval commands and dependencies (#386 ) * Fix lighteval cmd * Fix typo * Pin lighteval * Hacks to the max * Fix slurm * Fix * Pin lighteval * Pin l --------- Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>	2025-02-21 14:52:45 +01:00
lewtun	d76ecc12a2	Add E2B code interpreter reward function (#364 ) * Add stuff * Make it kind of work * Add more stuff * Add fix for parse * Fix * Refactor * Clean up * Fix config * Fix sys * Add SFT config * Use min rate * Fix eval * Add base model * Add s1k * Disable eval * Fix * Add import checker * Fix importer * Fix * Tune config * Tune * Fix * Fix save * Tuen beta * Remove configs * Fix vLLM * Fix * Add note * Add doc * doc * Fix * Tune lr * Add command	2025-02-19 11:26:46 +01:00
Agus	740a7a4305	Add LiveCodeBench's codegeneration task from lighteval (#346 ) * Add lcb:codegeneration task from ligtheval * Add results from R1 Qwen 32B	2025-02-19 08:32:33 +01:00
lewtun	78c197df51	Enable chat template and system prompt to be configured during training (#349 ) * Enable chat template to be configured * Add notes to README * Handle None * Remove default system prompt * Fix ST * Tune hparams * Fix * Tune * Fix	2025-02-18 14:46:43 +01:00
Edward Beeching	f987b3c877	bump vllm to version to 0.7.2 (#311 ) VLLM has made a number of throughput improvements in version 0.7.2, so it's worth bumping the version, particularly for GRPO training runs.	2025-02-13 10:48:11 +01:00
lewtun	96a6b0fa33	Enable Weights & Biases defaults to be overridden in training (#294 ) * Enable WandB defaults to be set * Fix	2025-02-12 13:01:07 +01:00
Lewis	db19392bef	chore(README): fix link, consistent formatting for CUDA warning (#248 ) low priority & cosmetic	2025-02-09 09:45:38 +01:00
Ty Feng	90c1bfe829	Fix README: Correct recipes path and missing --config option (#247 ) * Fix incorrect recipes path in README * Fix missing --config option and incorrect recipes path * Fix missing --config option and incorrect recipes path	2025-02-09 08:21:35 +01:00
Xu Song	f5f0b55dc4	Fix typo (#241 )	2025-02-08 10:28:11 +01:00
lewtun	0da0f7cce2	Refactor training configs and unify Slurm for training SFT & GRPO (#231 ) * Refactor Slurm * Fix * FML * Nuke * Clean * Fix config * Fix deps * Fix logging	2025-02-07 15:56:43 +01:00
Quentin Gallouédec	dba152a494	fix config name (#222 )	2025-02-07 14:34:46 +01:00
lewtun	c4227d6220	Update README.md (#211 )	2025-02-06 16:40:09 +01:00
lewtun	a60b175aeb	Update CUDA (#209 ) * Update CUDA * Fix * Remove module * Restore CUDA * Move cuda import	2025-02-06 16:31:13 +01:00
lewtun	cec57f3a55	Add GPQA Diamond and fix evaluation deps (#196 ) * Add GPQA Diamond * Add table * Fix README * Up * Fixes * Ignore logs * Fix * Pin deps * Fix GRPO * Add Llama 70B tabels * Restore dp * Pin lighteval * Use bfloat16 * Tune table * Add note	2025-02-06 15:24:52 +01:00
Dongwei Jiang	571661a1e4	Provide a minimal reproducible experiment using GRPO for mathematical reasoning on base model, referencing the approach from SimpleRL-Reason (#197 ) * Create config_base_math_smalllr.yaml * Update README.md * Update README.md	2025-02-06 11:43:42 +01:00
Jingze Shi	e450a6fbc4	Recipes for optimzing training scripts (#120 ) * Add recipe configs to optimize scripts (#73) * remove small models * Add README for recipes * Add README for recipes * Attempt to resolve conflicts * Optimize src scripts * Update recipe of DeepSeek-R1-Distill-Qwen-7B * Update recipe of Qwen2.5-1.5B * Updated recipe readme for qwen * Update training command for recipes * Update README.md Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> * Update preprocessing_num_workers from 36 to 8 * Add small language model recipes for quickly verify R1 * Fix src code quality * Add back the Slurm job command * Remove recipe of doge * Fix torch_dtype is not used * fix grpo yaml * fix grpo yaml * fix deprecation warning * fix config folder location * Remove duplicate variables in grpo.py * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-31 12:41:53 +01:00
Dongwei Jiang	22512e62bc	Update README.md (#132 )	2025-01-31 11:27:17 +01:00
Sam Schorb	356f6a5c4f	Add Table of Contents to README for easier navigation (#125 ) * Update README.md * Update README.md	2025-01-30 16:32:13 +01:00
Kashif Rasul	c0b53fae29	Grpo slurm scripts (#112 ) * initial grpo.slurm script * initial zero3 yaml using 1 less gpu * add completion and promp length * initial doc * use main * fix typo * remove num_processes * use vllm 0.7.0 * remove double module load * update math-verify * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * overwrite num_procs in the slurm script * add vllm args to readme * update readme --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-30 10:22:45 +01:00
Lewis	fb1b4c4e3f	docs(README): note about CUDA 12.1 (#121 ) will segfault for CUDA 14.1 under certain conditions; instructions are specific to 12.1 - fixes #106 - fixes #117	2025-01-30 08:42:43 +01:00
Edward Beeching	bd0e15bfb5	Update README.md (#93 )	2025-01-30 00:42:29 +01:00

79 commits