Commit graph

220 commits

Author SHA1 Message Date
Lewis Tunstall
3bcc4fc86e Add codeforces 2025-05-28 19:21:15 +00:00
Lewis Tunstall
43375fa7b9 Merge branch 'main' into zero-math-code 2025-05-28 13:52:20 +02:00
lewtun
b806e1092a
Bump vLLM and TRL (#665)
* Bump vLLM and TRL

* Fix Makefile
2025-05-28 13:47:25 +02:00
lewtun
a6b4f668fb
Fix Weka refresh (#666)
* Fix Weka refresh

* Update evaluate.slurm
2025-05-28 13:45:48 +02:00
Lewis Tunstall
97b1c22e55 Merge branch 'bump-deps-0' into zero-math-code 2025-05-28 10:11:06 +02:00
Lewis Tunstall
cada407cd6 Merge branch 'main' into zero-math-code 2025-05-28 09:24:12 +02:00
lewtun
01b4351c45
Set DP=2 for smol model evals (#664)
* Set DP=2 for smol model evals

Temporary hack while the HF cluster is at max capacity :)

* Style
2025-05-28 09:23:12 +02:00
Lewis Tunstall
b369e428f8 Merge branch 'main' into zero-math-code 2025-05-28 09:22:22 +02:00
Lewis Tunstall
f6a07648e2 Bump vLLM and TRL 2025-05-28 06:48:01 +00:00
Lewis Tunstall
898406d85f Fix DP=2 for evals 2025-05-27 21:20:52 +00:00
Lewis Tunstall
b6b1643c2d Fix benchmarks! 2025-05-27 20:44:35 +00:00
lewtun
722f144d21
Refresh Weka on Slurm (#662)
* Refresh Weka on Slurm

* Include current working dir
2025-05-27 19:21:15 +02:00
lewtun
33f84def0d
Align EOS token ID between tokenizer and generation config (#663)
* Align EOS token ID between tokenizer and generation config

* Fix
2025-05-27 17:20:13 +02:00
Lewis Tunstall
82fb385fa5 Refine tests 2025-05-27 13:39:00 +00:00
lewtun
9eef995b4d
Bump deps (#656) 2025-05-27 15:38:21 +02:00
Lewis Tunstall
296aa66e1e Tweak format reward 2025-05-27 08:16:49 +00:00
lewtun
5ac5971ea5
Add OpenR1-Distill recipe (#661) 2025-05-26 17:57:44 +02:00
Lewis Tunstall
9f6abc8ed1 Relax format reward 2025-05-26 11:15:56 +00:00
Lewis Tunstall
bc06504df5 Add better baseline defaults 2025-05-26 09:06:09 +00:00
Lewis Tunstall
9862bfec41 Relax reward 2025-05-26 08:09:03 +00:00
Lewis Tunstall
1f56bab96c Tune baseline 2025-05-25 17:22:06 +00:00
Lewis Tunstall
965d451d61 Restore baseline 2025-05-25 17:00:33 +00:00
Lewis Tunstall
31eacc4b9a Use GAS instead of generation 2025-05-25 16:57:33 +00:00
Lewis Tunstall
0b933a2aa4 Restore gas 2025-05-25 16:54:18 +00:00
Lewis Tunstall
cf765df201 Tune baseline 2025-05-25 13:21:01 +00:00
Lewis Tunstall
da0e9ae28d Add overlong punishment 2025-05-25 12:46:45 +00:00
Lewis Tunstall
7f777c0583 Add new DAPO recipe 2025-05-25 12:40:32 +00:00
Lewis Tunstall
b575444fe8 Add think format and accuracy rewards 2025-05-25 12:24:43 +00:00
Lewis Tunstall
6c7c102755 Merge remote-tracking branch 'origin/bump-deps-0' into zero-math-code 2025-05-25 14:05:42 +02:00
lewtun
57e85b522f
Add better logging defaults for GRPO (#657) 2025-05-25 13:24:52 +02:00
lewtun
5374bc2bef
Merge branch 'main' into bump-deps-0 2025-05-25 12:02:52 +02:00
Lewis Tunstall
3258282733 Bump deps 2025-05-25 09:59:57 +00:00
Guilherme Penedo
c1e1192294
GRPO with codeforces problems (#627)
* add

* update

* updates

* updates #2

* weighted_sum and python fixes

* bugfix

* merging ioi/cf setups

* integrating the morph changes

* move morph_client

* run style

* small changes for mixed languages training

* revert grpo.py changes

* piston readme

* local test fetching

* bug fixes

* updated readme

* style fixes

* style fixes 2

* deps changes

* import sorting

* fix tests

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-25 11:55:27 +02:00
lewtun
db2d9b011a
Bump lower bound on liger-kernel (#654)
Related to https://github.com/huggingface/open-r1/pull/653

(I forgot to include this in that PR)
2025-05-22 08:44:13 +02:00
lewtun
8067149e90
Bump DeepSpeed to 0.16.8 to fix OOM on Qwen3 (#653) 2025-05-21 22:25:57 +02:00
lewtun
9366aa2df3
Add dataset mixer (#647)
* Prototype

* Clean up

* Refactor

* Add tests

* Add doc and make scripts work

* Tune doc

* Up

* Tune

* Add column verification

* Fix types

* Fix YAML

* Fix types

* Fix doc

* f

* f
2025-05-20 11:40:42 +02:00
Quentin Gallouédec
5e0c210f9c
use hf papers (#646) 2025-05-19 13:48:14 +02:00
lewtun
ebd5913a85
Bump LightEval (#643) 2025-05-16 10:52:05 +02:00
Edward Beeching
ea5b7edf22
Add dataset filtering script (#637)
* add dataset filtering script

* remove subset selection

* save wip

* save wip

* update filter script

* refactor to run on chunks

* rename script

* cleanup

* update dapo filtering

* fixes

* dapo filt config

* update compute pass rate

* clean

* update readme and config

* add merging snippet
2025-05-16 10:26:49 +02:00
lewtun
4fc2a3ff82
Add time to Slurm (#639) 2025-05-09 19:19:51 +02:00
lewtun
c802f00512
Use pass@1 for all evals (#633)
* Use pass@1 for all evals

* Update scores
2025-05-09 17:42:36 +02:00
Edward Beeching
21b48fbe46
soft_overlong_punishment from DAPO paper (#638)
* soft_overlong_punishment_reward

* tests

* doc string updated

* style

* non-sensical import removed

* Update src/open_r1/rewards.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* max_completion_length set to 3.6

* style

* quality

* test case added for <max_com_len

* style

* max_len +cache len updated based on num chars

* max_len_completion docstring added in config

* Update configs.py

* refactor soft overlong penalty to use completion ids

* change description to be tokens

---------

Co-authored-by: shirinyamani <yamani.shirin@ucalgary.ca>
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-09 17:26:34 +02:00
lewtun
6a0cd5c8ad
Fix style again :) (#636) 2025-05-08 16:29:01 +02:00
Andrei
af81114044
Code Execution using Morph Cloud (#614)
* initial commit for morphcloud sandbox support

* initial

* fixed prints in morph client for ioi

* updated import

* context manager

* removed unnecessary comments

* more intelligent instance/snapshot management

* update

* Add documentation for Morph integration

* Delete MORPH_INTEGRATION.md

* added retry and modularity to morph client

* updates to kwargs and setup.py

* Update setup.py

* added languages codepath + fixed slurm + added morph tests

* make quality formatting fixes

* conditional imports for morph

---------

Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
2025-05-08 08:59:54 +02:00
lewtun
52520a6713
Fix style (#631)
* Fix style

* Fix

* Add jieba
2025-05-05 15:49:10 +02:00
Lewis Tunstall
c8b989109d Fix style 2025-05-02 14:45:17 +00:00
lewtun
9373ad3055
Update README.md 2025-04-30 22:16:18 +02:00
binary-husky
65211f4824
🦜Enhance repetition penalty reward for language that cannot be split by whitespace (#516)
* Update rewards.py

* add test for repetition reward with language

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-04-30 22:02:59 +02:00
lewtun
75c3999180
Bump LightEval to enable DP>1 (#629)
* Bump LightEval to enable DP>1

* Remove redundant arg

* Update eval scores

* Fix slurm
2025-04-30 22:02:20 +02:00
lewtun
50590a41b9
Enable data and tensor parallelism for GRPO (#626)
* Bump deps

* Fix Slurm

* Fix
2025-04-26 11:50:08 +02:00