mirrors/open-r1

mirror of https://github.com/huggingface/open-r1.git synced 2026-06-24 01:54:06 +00:00

Author	SHA1	Message	Date
Lewis Tunstall	82fb385fa5	Refine tests	2025-05-27 13:39:00 +00:00
Lewis Tunstall	296aa66e1e	Tweak format reward	2025-05-27 08:16:49 +00:00
Lewis Tunstall	9f6abc8ed1	Relax format reward	2025-05-26 11:15:56 +00:00
Lewis Tunstall	9862bfec41	Relax reward	2025-05-26 08:09:03 +00:00
Lewis Tunstall	b575444fe8	Add think format and accuracy rewards	2025-05-25 12:24:43 +00:00
lewtun	9366aa2df3	Add dataset mixer (#647) * Prototype * Clean up * Refactor * Add tests * Add doc and make scripts work * Tune doc * Up * Tune * Add column verification * Fix types * Fix YAML * Fix types * Fix doc * f * f	2025-05-20 11:40:42 +02:00
Edward Beeching	21b48fbe46	soft_overlong_punishment from DAPO paper (#638) * soft_overlong_punishment_reward * tests * doc string updated * style * non-sensical import removed * Update src/open_r1/rewards.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * max_completion_length set to 3.6 * style * quality * test case added for <max_com_len * style * max_len +cache len updated based on num chars * max_len_completion docstring added in cofig * Update configs.py * refactor soft overlong penalty to use completion ids * change decription to be tokens --------- Co-authored-by: shirinyamani <yamani.shirin@ucalgary.ca> Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-05-09 17:26:34 +02:00
lewtun	6a0cd5c8ad	Fix style again :) (#636)	2025-05-08 16:29:01 +02:00
Andrei	af81114044	Code Execution using Morph Cloud (#614) * initial commit for morphcloud sandbox support * initial * fixed prints in morph client for ioi * updated import * context manager * removed unnecessary comments * more intelligent instance/snapshot management * update * Add documentation for Morph integration * Delete MORPH_INTEGRATION.md * added retry and modularity to morph client * updates to kwargs and setup.py * Update setup.py * added languages codepath + fixed slurm + added m orph tests * make quality formatting fixes * conditional imports for morph --------- Co-authored-by: arb8020 <arbeightytwenty@gmail.com>	2025-05-08 08:59:54 +02:00
lewtun	52520a6713	Fix style (#631) * Fix style * Fix * Add jieba	2025-05-05 15:49:10 +02:00
Lewis Tunstall	c8b989109d	Fix style	2025-05-02 14:45:17 +00:00
binary-husky	65211f4824	🦜Enhance repetition penalty reward for language that cannot be split by whitespace (#516) * Update rewards.py * add test for repetition reward with language * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-04-30 22:02:59 +02:00
Edward Beeching	c1eadaa097	E2B Router bug fixes (#592) * fix eval system prompt * style * fix a rare issue where the execution is None * fixes a bug in the e2b router	2025-04-11 14:04:59 +02:00
Edward Beeching	1b3bf043dc	Adds a E2B router server that executes batches of scripts (#561) * adds a dedicated e2b server to handle batches of requests * fix reward tests * update slow reward * style * updates e2b router to be more generic * refactor * refactoring * licence, cleanup * update tests * style * fix import when e2b not present * style * rename sandbox file * rename to RoutedSandbox * update readme * nits * nits2 * unlimited max time * update logs path	2025-04-07 21:01:06 +02:00
lewtun	4f5b21e21d	Fix accuracy reward for math (#566) * Fix accuracy reward for math * Add typing * Add unit test * Return None for invalid samples * Fix order of answers * Fix type * Use None for non-verifiable answers	2025-04-01 12:04:26 +02:00
Edward Beeching	af487204ca	Adds binary code reward (#528) * adds binary code reward, refactors grpo with get_reward_funcs * adds return type to the function * add get_reward_funcs test * remote type hint * move script args to another file * update test	2025-03-21 12:53:38 +01:00
Guilherme Penedo	7835979801	adds support for running GRPO on IOI problems (#495) * adds support for running GRPO on IOI problems * nit * bugfixes + recipe * added piston info and readme changes * readme updates * run isort to fix checks * Update src/open_r1/rewards.py Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> * adding ioi test * fix merge issues with python slow tests * style * generalize piston workers * generalize readme * fix extract code * finalize slow tests --------- Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com> Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-21 08:48:00 +01:00
Edward Beeching	5dcfae8979	Fixes bug with async code reward (#504) * adds slow test for code reward * fixes bug in setting language and the output parsing * style * removed redundant comment * removed exeception as e * remove rewards * removed whitespace * more whitespace * remove need for loop with asyncio.run * nits * fix type error with e2n AsyncSandbox	2025-03-13 22:54:15 +01:00
lewtun	566cfd1a44	Align format reward with R1 traces and add reward function to count think / answer tags (#418) * Fix tests * Tune * Add reward * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-02-24 17:16:40 +01:00
Almaz Zinollayev	8322b3173f	Language specific code format reward (#377)	2025-02-21 15:41:34 +01:00
Kashif Rasul	90a6de94c7	Revert "Weighted reward functions (#213)" (#317) This reverts commit `fbea53267b`.	2025-02-13 15:00:05 +01:00
Almaz Zinollayev	fbea53267b	Weighted reward functions (#213) * [Weighted reward functions] Adding functionality to weigh rewards. Tests. * [Weighted reward functions] Adding @wraps decorator to preserve reward function metadata * style * Changing grpo.py tests to run if cuda is available * style * Apply suggestions from code review Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co> Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-02-13 14:08:27 +01:00
Kashif Rasul	7832290687	[Rewards] add kimi len_reward (#292) * add kimi len_reward * add to REWARD_FUNCS_REGISTRY * fix formatting * Update src/open_r1/grpo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/grpo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/grpo.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * missing import --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-02-13 11:51:09 +01:00
Almaz Zinollayev	517adddae3	[Testing Github workflow] Updating workflows and makefile (#214) * [Testing Github workflow] Updating workflows and makefile * [Testing Github workflow] - Refactoring workflow, fixing tests erorr, easier debugging * [Testing Github workflow] Converting docstring into raw string * [Testing Github workflow] - Fixing test_zero_max_penalty_returns_zero() test * [Testing Github workflow] Removing redundant test	2025-02-10 18:28:35 +01:00
Edward Beeching	e4ac3ae070	Fix repetition reward + tests (#272) * fix rep penalty * fix tests * clean up, style	2025-02-10 15:50:47 +01:00
Edward Beeching	88c51fe05d	Repetition penalty hotfix (#266) * Adds a Repetition Penalty Reward * style * adds option to configue in grpo * style * improve desciptions * fix final changes * fix docstring * style	2025-02-10 12:35:21 +01:00
Edward Beeching	486f7d48f5	Revert "Adds repetition penalty reward (#263)" (#267) This reverts commit `d57f2edbd4`.	2025-02-10 12:28:55 +01:00
Edward Beeching	d57f2edbd4	Adds repetition penalty reward (#263) * Adds a Repetition Penalty Reward * style * adds option to configue in grpo * style * improve desciptions	2025-02-10 12:21:08 +01:00
JamesHujy	d12886da7f	fix format reward (#238) * fix format reward * failing test * add \s* between </think> and <answer> tag to handle multilines --------- Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>	2025-02-08 15:46:44 +01:00
Quentin Gallouédec	dd915f8483	Fix `cosine_scaled_reward` compatibility with GRPO (#229) * Drop partial * Update src/open_r1/grpo.py * style	2025-02-07 15:21:40 +01:00
Kashif Rasul	250ab46ea1	[GRPO] add cosine reward (#206) * add cosine reward * fix merge * fix typo * fix check	2025-02-07 08:10:48 +01:00
Almaz Zinollayev	e8c2673a15	Refactoring reward functions. Adding step by step reasoning reward. Adding test coverage for reward functions (#144) * Refactoring reward functions. Adding step by step reasoning reward. Adding test coverage for reward functions * [Refactoring reward functions] - Ruff error fix * [Refactoring reward functions] - Linting error fix	2025-02-06 20:10:05 +01:00

32 commits