mirrors/open-r1

mirror of https://github.com/huggingface/open-r1.git synced 2026-06-24 01:54:06 +00:00

Author	SHA1	Message	Date
Lewis Tunstall	3bcc4fc86e	Add codeforces	2025-05-28 19:21:15 +00:00
Lewis Tunstall	b369e428f8	Merge branch 'main' into zero-math-code	2025-05-28 09:22:22 +02:00
Lewis Tunstall	b6b1643c2d	Fix benchmarks!	2025-05-27 20:44:35 +00:00
lewtun	33f84def0d	Align EOS token ID between tokenizer and generation config (#663) * Align EOS token ID between tokenizer and generation config * Fix	2025-05-27 17:20:13 +02:00
Lewis Tunstall	82fb385fa5	Refine tests	2025-05-27 13:39:00 +00:00
Lewis Tunstall	296aa66e1e	Tweak format reward	2025-05-27 08:16:49 +00:00
lewtun	5ac5971ea5	Add OpenR1-Distill recipe (#661)	2025-05-26 17:57:44 +02:00
Lewis Tunstall	bc06504df5	Add better baseline defaults	2025-05-26 09:06:09 +00:00
Lewis Tunstall	9862bfec41	Relax reward	2025-05-26 08:09:03 +00:00
Lewis Tunstall	1f56bab96c	Tune baseline	2025-05-25 17:22:06 +00:00
Lewis Tunstall	965d451d61	Restore baseline	2025-05-25 17:00:33 +00:00
Lewis Tunstall	31eacc4b9a	Use GAS instead of generation	2025-05-25 16:57:33 +00:00
Lewis Tunstall	0b933a2aa4	Restore gas	2025-05-25 16:54:18 +00:00
Lewis Tunstall	cf765df201	Tune baseline	2025-05-25 13:21:01 +00:00
Lewis Tunstall	da0e9ae28d	Add overlong punishment	2025-05-25 12:46:45 +00:00
Lewis Tunstall	7f777c0583	Add new DAPO recipe	2025-05-25 12:40:32 +00:00
Guilherme Penedo	c1e1192294	GRPO with codeforces problems (#627) * add * update * updates * updates #2 * weighted_sum and python fixes * bugfix * merging ioi/cf setups * integrating the morph changes * move morph_client * run style * small changes for mixed languages training * revert grpo.py changes * piston readme * local test fetching * bug fixes * updated readme * style fixes * style fixes 2 * deps changes * import sorting * fix tests * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-05-25 11:55:27 +02:00
Edward Beeching	ea5b7edf22	Add dataset filtering script (#637) * add dataset filtering script * remove subset selection * save wip * save wip * update filter script * refactor to run on chunks * rename script * cleanup * update dapo filtering * fixes * dapo filt config * udpate compute pass rate * clean * update readme and config * add merging snippet	2025-05-16 10:26:49 +02:00
lewtun	50590a41b9	Enable data and tensor parallelism for GRPO (#626) * Bump deps * Fix SLurm * Fix	2025-04-26 11:50:08 +02:00
lewtun	5112bfc401	Fix SFT for base models (#604) * Fix pad token bug in SFT * Add ChatML default * Clean up * Refactor grpo model load * Add doc * Bump deepspeed	2025-04-16 11:45:50 +02:00
lewtun	8cf42663fd	Clean up recipes (#596)	2025-04-11 20:09:15 +02:00
lewtun	04dbf21989	Bump TRL and vLLM (#595) * Bump TRL and vLLM * Fix style * Bump liger * Add liger	2025-04-11 16:32:33 +02:00
lewtun	ca8664df1c	Fix missing prompt columns in recipes (#574)	2025-04-02 15:48:48 +02:00
lewtun	8000dd2384	[WIP] RL goes brrr (#533) * Fix vLLM recipes * Add vllm server to Slurm * Add overlap across srun * Fix NUM_NODES * Refactor TP to script * fix train script to work withnew GRPO * lewis nits * bump trl, transformers --------- Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-24 15:15:02 +01:00
Guilherme Penedo	7835979801	adds support for running GRPO on IOI problems (#495) * adds support for running GRPO on IOI problems * nit * bugfixes + recipe * added piston info and readme changes * readme updates * run isort to fix checks * Update src/open_r1/rewards.py Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> * adding ioi test * fix merge issues with python slow tests * style * generalize piston workers * generalize readme * fix extract code * finalize slow tests --------- Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-21 08:48:00 +01:00
lewtun	d5922af8ce	Add OlympicCoder recipes (#505) * Add OlympicCoder recipes * Fix configs * Add FSDP config	2025-03-13 19:08:34 +01:00
lewtun	45ccf60109	Remove dataset_configs from YAML recipes (#461)	2025-03-03 13:54:58 +01:00
elie	3ba56c1c3d	Add config sft smollm (#425) * add sft recipe * add smollm sft * max_length modif 1 * max_length modif 2	2025-02-25 21:45:59 +01:00
Edward Beeching	0c3ef8372e	updates max_seq_length to max length due to a bug in trl (#419)	2025-02-24 17:27:56 +01:00
lewtun	566cfd1a44	Align format reward with R1 traces and add reward function to count think / answer tags (#418) * Fix tests * Tune * Add reward * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-02-24 17:16:40 +01:00
elie	5355687e6c	add sft recipe (#415)	2025-02-24 15:43:12 +01:00
lewtun	d76ecc12a2	Add E2B code interpreter reward function (#364) * Add stuff * Make it kind of work * Add more stuff * Add fix for parse * Fix * Refactor * Clean up * Fix config * Fix sys * Add SFT config * Use min rate * Fix eval * Add base model * Add s1k * Disable eval * Fix * Add import checker * Fix importer * Fix * Tune config * Tune * Fix * Fix save * Tuen beta * Remove configs * Fix vLLM * Fix * Add note * Add doc * doc * Fix * Tune lr * Add command	2025-02-19 11:26:46 +01:00
lewtun	78c197df51	Enable chat template and system prompt to be configured during training (#349) * Enable chat template to be configured * Add notes to README * Handle None * Remove default system prompt * Fix ST * Tune hparams * Fix * Tune * Fix	2025-02-18 14:46:43 +01:00
Almaz Zinollayev	698530484c	Adding grpo reward args into yaml files (#337)	2025-02-18 13:10:03 +01:00
Yen-Ting Lin	d5b67f4fe5	Add SFT configuration for Mistral-Small-24B-Instruct-2501 model (#348) * Add SFT configuration for Mistral-Small-24B-Instruct-2501 model * Rename config_numina.yaml to config_openr1_math.yaml --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-02-18 08:52:45 +01:00
Kashif Rasul	90a6de94c7	Revert "Weighted reward functions (#213)" (#317) This reverts commit `fbea53267b`.	2025-02-13 15:00:05 +01:00
Almaz Zinollayev	fbea53267b	Weighted reward functions (#213) * [Weighted reward functions] Adding functionality to weigh rewards. Tests. * [Weighted reward functions] Adding @wraps decorator to preserve reward function metadata * style * Changing grpo.py tests to run if cuda is available * style * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-02-13 14:08:27 +01:00
Quentin Gallouédec	52aa8759a2	new grpo logic (#274)	2025-02-11 09:35:06 +01:00
Jinfeng Sun	82b2a6525f	fix(sft recipes): remove duplicate packing option from config (#280)	2025-02-11 09:34:19 +01:00
lewtun	3519a7fa3d	Remove duplicate math-verify (#234)	2025-02-07 20:01:54 +01:00
lewtun	0da0f7cce2	Refactor training configs and unify Slurm for training SFT & GRPO (#231) * Refactor Slurm * Fix * FML * Nuke * Clean * Fix config * Fix deps * Fix logging	2025-02-07 15:56:43 +01:00
Quentin Gallouédec	dba152a494	fix config name (#222)	2025-02-07 14:34:46 +01:00
Dongwei Jiang	571661a1e4	Provide a minimal reproducible experiment using GRPO for mathematical reasoning on base model, referencing the approach from SimpleRL-Reason (#197) * Create config_base_math_smalllr.yaml * Update README.md * Update README.md	2025-02-06 11:43:42 +01:00
Jingze Shi	e450a6fbc4	Recipes for optimzing training scripts (#120) * Add recipe configs to optimize scripts (#73) * remove small models * Add README for recipes * Add README for recipes * Attempt to resolve conflicts * Optimize src scripts * Update recipe of DeepSeek-R1-Distill-Qwen-7B * Update recipe of Qwen2.5-1.5B * Updated recipe readme for qwen * Update training command for recipes * Update README.md Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> * Update preprocessing_num_workers from 36 to 8 * Add small language model recipes for quickly verify R1 * Fix src code quality * Add back the Slurm job command * Remove recipe of doge * Fix torch_dtype is not used * fix grpo yaml * fix grpo yaml * fix deprecation warning * fix config folder location * Remove duplicate variables in grpo.py * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-01-31 12:41:53 +01:00
lewtun	ca8f35c143	REFACTOR TO THE MAX (#7)	2025-01-25 00:12:25 +01:00
elie	c421bc893b	Improve sft (#5) * first commit * working training * change model_id * Update scripts/training/sft.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-24 22:23:49 +01:00
lewtun	6acc9a0aa0	Add configs and stuff (#2)	2025-01-24 20:05:18 +01:00

47 commits