Lewis Tunstall
82fb385fa5
Refine tests
2025-05-27 13:39:00 +00:00
Lewis Tunstall
296aa66e1e
Tweak format reward
2025-05-27 08:16:49 +00:00
Lewis Tunstall
9f6abc8ed1
Relax format reward
2025-05-26 11:15:56 +00:00
Lewis Tunstall
9862bfec41
Relax reward
2025-05-26 08:09:03 +00:00
Lewis Tunstall
b575444fe8
Add think format and accuracy rewards
2025-05-25 12:24:43 +00:00
lewtun
9366aa2df3
Add dataset mixer ( #647 )
...
* Prototype
* Clean up
* Refactor
* Add tests
* Add doc and make scripts work
* Tune doc
* Up
* Tune
* Add column verification
* Fix types
* Fix YAML
* Fix types
* Fix doc
* f
* f
2025-05-20 11:40:42 +02:00
Edward Beeching
21b48fbe46
soft_overlong_punishment from DAPO paper ( #638 )
...
* soft_overlong_punishment_reward
* tests
* doc string updated
* style
* nonsensical import removed
* Update src/open_r1/rewards.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* max_completion_length set to 3.6
* style
* quality
* test case added for <max_com_len
* style
* max_len +cache len updated based on num chars
* max_len_completion docstring added in config
* Update configs.py
* refactor soft overlong penalty to use completion ids
* change description to be tokens
---------
Co-authored-by: shirinyamani <yamani.shirin@ucalgary.ca>
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-09 17:26:34 +02:00
lewtun
6a0cd5c8ad
Fix style again :) ( #636 )
2025-05-08 16:29:01 +02:00
Andrei
af81114044
Code Execution using Morph Cloud ( #614 )
...
* initial commit for morphcloud sandbox support
* initial
* fixed prints in morph client for ioi
* updated import
* context manager
* removed unnecessary comments
* more intelligent instance/snapshot management
* update
* Add documentation for Morph integration
* Delete MORPH_INTEGRATION.md
* added retry and modularity to morph client
* updates to kwargs and setup.py
* Update setup.py
* added languages codepath + fixed slurm + added morph tests
* make quality formatting fixes
* conditional imports for morph
---------
Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
2025-05-08 08:59:54 +02:00
lewtun
52520a6713
Fix style ( #631 )
...
* Fix style
* Fix
* Add jieba
2025-05-05 15:49:10 +02:00
Lewis Tunstall
c8b989109d
Fix style
2025-05-02 14:45:17 +00:00
binary-husky
65211f4824
🦜 Enhance repetition penalty reward for language that cannot be split by whitespace ( #516 )
...
* Update rewards.py
* add test for repetition reward with language
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-04-30 22:02:59 +02:00
Edward Beeching
c1eadaa097
E2B Router bug fixes ( #592 )
...
* fix eval system prompt
* style
* fix a rare issue where the execution is None
* fixes a bug in the e2b router
2025-04-11 14:04:59 +02:00
Edward Beeching
1b3bf043dc
Adds an E2B router server that executes batches of scripts ( #561 )
...
* adds a dedicated e2b server to handle batches of requests
* fix reward tests
* update slow reward
* style
* updates e2b router to be more generic
* refactor
* refactoring
* licence, cleanup
* update tests
* style
* fix import when e2b not present
* style
* rename sandbox file
* rename to RoutedSandbox
* update readme
* nits
* nits2
* unlimited max time
* update logs path
2025-04-07 21:01:06 +02:00
lewtun
4f5b21e21d
Fix accuracy reward for math ( #566 )
...
* Fix accuracy reward for math
* Add typing
* Add unit test
* Return None for invalid samples
* Fix order of answers
* Fix type
* Use None for non-verifiable answers
2025-04-01 12:04:26 +02:00
Edward Beeching
af487204ca
Adds binary code reward ( #528 )
...
* adds binary code reward, refactors grpo with get_reward_funcs
* adds return type to the function
* add get_reward_funcs test
* remote type hint
* move script args to another file
* update test
2025-03-21 12:53:38 +01:00
Guilherme Penedo
7835979801
adds support for running GRPO on IOI problems ( #495 )
...
* adds support for running GRPO on IOI problems
* nit
* bugfixes + recipe
* added piston info and readme changes
* readme updates
* run isort to fix checks
* Update src/open_r1/rewards.py
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
* adding ioi test
* fix merge issues with python slow tests
* style
* generalize piston workers
* generalize readme
* fix extract code
* finalize slow tests
---------
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-21 08:48:00 +01:00
Edward Beeching
5dcfae8979
Fixes bug with async code reward ( #504 )
...
* adds slow test for code reward
* fixes bug in setting language and the output parsing
* style
* removed redundant comment
* removed exception as e
* remove rewards
* removed whitespace
* more whitespace
* remove need for loop with asyncio.run
* nits
* fix type error with e2b AsyncSandbox
2025-03-13 22:54:15 +01:00
lewtun
566cfd1a44
Align format reward with R1 traces and add reward function to count think / answer tags ( #418 )
...
* Fix tests
* Tune
* Add reward
* Apply suggestions from code review
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-02-24 17:16:40 +01:00
Almaz Zinollayev
8322b3173f
Language specific code format reward ( #377 )
2025-02-21 15:41:34 +01:00
Kashif Rasul
90a6de94c7
Revert "Weighted reward functions ( #213 )" ( #317 )
...
This reverts commit fbea53267b.
2025-02-13 15:00:05 +01:00
Almaz Zinollayev
fbea53267b
Weighted reward functions ( #213 )
...
* [Weighted reward functions] Adding functionality to weigh rewards. Tests.
* [Weighted reward functions] Adding @wraps decorator to preserve reward function metadata
* style
* Changing grpo.py tests to run if cuda is available
* style
* Apply suggestions from code review
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-02-13 14:08:27 +01:00
Kashif Rasul
7832290687
[Rewards] add kimi len_reward ( #292 )
...
* add kimi len_reward
* add to REWARD_FUNCS_REGISTRY
* fix formatting
* Update src/open_r1/grpo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/grpo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/grpo.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* missing import
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-02-13 11:51:09 +01:00
Almaz Zinollayev
517adddae3
[Testing Github workflow] Updating workflows and makefile ( #214 )
...
* [Testing Github workflow] Updating workflows and makefile
* [Testing Github workflow] - Refactoring workflow, fixing tests error, easier debugging
* [Testing Github workflow] Converting docstring into raw string
* [Testing Github workflow] - Fixing test_zero_max_penalty_returns_zero() test
* [Testing Github workflow] Removing redundant test
2025-02-10 18:28:35 +01:00
Edward Beeching
e4ac3ae070
Fix repetition reward + tests ( #272 )
...
* fix rep penalty
* fix tests
* clean up, style
2025-02-10 15:50:47 +01:00
Edward Beeching
88c51fe05d
Repetition penalty hotfix ( #266 )
...
* Adds a Repetition Penalty Reward
* style
* adds option to configure in grpo
* style
* improve descriptions
* fix final changes
* fix docstring
* style
2025-02-10 12:35:21 +01:00
Edward Beeching
486f7d48f5
Revert "Adds repetition penalty reward ( #263 )" ( #267 )
...
This reverts commit d57f2edbd4.
2025-02-10 12:28:55 +01:00
Edward Beeching
d57f2edbd4
Adds repetition penalty reward ( #263 )
...
* Adds a Repetition Penalty Reward
* style
* adds option to configure in grpo
* style
* improve descriptions
2025-02-10 12:21:08 +01:00
JamesHujy
d12886da7f
fix format reward ( #238 )
...
* fix format reward
* failing test
* add \s* between </think> and <answer> tag to handle multilines
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-02-08 15:46:44 +01:00
Quentin Gallouédec
dd915f8483
Fix cosine_scaled_reward compatibility with GRPO ( #229 )
...
* Drop partial
* Update src/open_r1/grpo.py
* style
2025-02-07 15:21:40 +01:00
Kashif Rasul
250ab46ea1
[GRPO] add cosine reward ( #206 )
...
* add cosine reward
* fix merge
* fix typo
* fix check
2025-02-07 08:10:48 +01:00
Almaz Zinollayev
e8c2673a15
Refactoring reward functions. Adding step by step reasoning reward. Adding test coverage for reward functions ( #144 )
...
* Refactoring reward functions. Adding step by step reasoning reward. Adding test coverage for reward functions
* [Refactoring reward functions] - Ruff error fix
* [Refactoring reward functions] - Linting error fix
2025-02-06 20:10:05 +01:00