Lewis Tunstall
|
3bcc4fc86e
|
Add codeforces
|
2025-05-28 19:21:15 +00:00 |
|
Lewis Tunstall
|
43375fa7b9
|
Merge branch 'main' into zero-math-code
|
2025-05-28 13:52:20 +02:00 |
|
lewtun
|
b806e1092a
|
Bump vLLM and TRL (#665)
* Bump vLLM and TRL
* Fix Makefile
|
2025-05-28 13:47:25 +02:00 |
|
lewtun
|
a6b4f668fb
|
Fix Weka refresh (#666)
* Fix Weka refresh
* Update evaluate.slurm
|
2025-05-28 13:45:48 +02:00 |
|
Lewis Tunstall
|
97b1c22e55
|
Merge branch 'bump-deps-0' into zero-math-code
|
2025-05-28 10:11:06 +02:00 |
|
Lewis Tunstall
|
cada407cd6
|
Merge branch 'main' into zero-math-code
|
2025-05-28 09:24:12 +02:00 |
|
lewtun
|
01b4351c45
|
Set DP=2 for smol model evals (#664)
* Set DP=2 for smol model evals
Temporary hack while the HF cluster is at max capacity :)
* Style
|
2025-05-28 09:23:12 +02:00 |
|
Lewis Tunstall
|
b369e428f8
|
Merge branch 'main' into zero-math-code
|
2025-05-28 09:22:22 +02:00 |
|
Lewis Tunstall
|
f6a07648e2
|
Bump vLLM and TRL
|
2025-05-28 06:48:01 +00:00 |
|
Lewis Tunstall
|
898406d85f
|
Fix DP=2 for evals
|
2025-05-27 21:20:52 +00:00 |
|
Lewis Tunstall
|
b6b1643c2d
|
Fix benchmarks!
|
2025-05-27 20:44:35 +00:00 |
|
lewtun
|
722f144d21
|
Refresh Weka on Slurm (#662)
* Refresh Weka on Slurm
* Include current working dir
|
2025-05-27 19:21:15 +02:00 |
|
lewtun
|
33f84def0d
|
Align EOS token ID between tokenizer and generation config (#663)
* Align EOS token ID between tokenizer and generation config
* Fix
|
2025-05-27 17:20:13 +02:00 |
|
Lewis Tunstall
|
82fb385fa5
|
Refine tests
|
2025-05-27 13:39:00 +00:00 |
|
lewtun
|
9eef995b4d
|
Bump deps (#656)
|
2025-05-27 15:38:21 +02:00 |
|
Lewis Tunstall
|
296aa66e1e
|
Tweak format reward
|
2025-05-27 08:16:49 +00:00 |
|
lewtun
|
5ac5971ea5
|
Add OpenR1-Distill recipe (#661)
|
2025-05-26 17:57:44 +02:00 |
|
Lewis Tunstall
|
9f6abc8ed1
|
Relax format reward
|
2025-05-26 11:15:56 +00:00 |
|
Lewis Tunstall
|
bc06504df5
|
Add better baseline defaults
|
2025-05-26 09:06:09 +00:00 |
|
Lewis Tunstall
|
9862bfec41
|
Relax reward
|
2025-05-26 08:09:03 +00:00 |
|
Lewis Tunstall
|
1f56bab96c
|
Tune baseline
|
2025-05-25 17:22:06 +00:00 |
|
Lewis Tunstall
|
965d451d61
|
Restore baseline
|
2025-05-25 17:00:33 +00:00 |
|
Lewis Tunstall
|
31eacc4b9a
|
Use GAS instead of generation
|
2025-05-25 16:57:33 +00:00 |
|
Lewis Tunstall
|
0b933a2aa4
|
Restore gas
|
2025-05-25 16:54:18 +00:00 |
|
Lewis Tunstall
|
cf765df201
|
Tune baseline
|
2025-05-25 13:21:01 +00:00 |
|
Lewis Tunstall
|
da0e9ae28d
|
Add overlong punishment
|
2025-05-25 12:46:45 +00:00 |
|
Lewis Tunstall
|
7f777c0583
|
Add new DAPO recipe
|
2025-05-25 12:40:32 +00:00 |
|
Lewis Tunstall
|
b575444fe8
|
Add think format and accuracy rewards
|
2025-05-25 12:24:43 +00:00 |
|
Lewis Tunstall
|
6c7c102755
|
Merge remote-tracking branch 'origin/bump-deps-0' into zero-math-code
|
2025-05-25 14:05:42 +02:00 |
|
lewtun
|
57e85b522f
|
Add better logging defaults for GRPO (#657)
|
2025-05-25 13:24:52 +02:00 |
|
lewtun
|
5374bc2bef
|
Merge branch 'main' into bump-deps-0
|
2025-05-25 12:02:52 +02:00 |
|
Lewis Tunstall
|
3258282733
|
Bump deps
|
2025-05-25 09:59:57 +00:00 |
|
Guilherme Penedo
|
c1e1192294
|
GRPO with codeforces problems (#627)
* add
* update
* updates
* updates #2
* weighted_sum and python fixes
* bugfix
* merging ioi/cf setups
* integrating the morph changes
* move morph_client
* run style
* small changes for mixed languages training
* revert grpo.py changes
* piston readme
* local test fetching
* bug fixes
* updated readme
* style fixes
* style fixes 2
* deps changes
* import sorting
* fix tests
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-05-25 11:55:27 +02:00 |
|
lewtun
|
db2d9b011a
|
Bump lower bound on liger-kernel (#654)
Related to https://github.com/huggingface/open-r1/pull/653
(I forgot to include this in that PR)
|
2025-05-22 08:44:13 +02:00 |
|
lewtun
|
8067149e90
|
Bump DeepSpeed to 0.16.8 to fix OOM on Qwen3 (#653)
|
2025-05-21 22:25:57 +02:00 |
|
lewtun
|
9366aa2df3
|
Add dataset mixer (#647)
* Prototype
* Clean up
* Refactor
* Add tests
* Add doc and make scripts work
* Tune doc
* Up
* Tune
* Add column verification
* Fix types
* Fix YAML
* Fix types
* Fix doc
* f
* f
|
2025-05-20 11:40:42 +02:00 |
|
Quentin Gallouédec
|
5e0c210f9c
|
use hf papers (#646)
|
2025-05-19 13:48:14 +02:00 |
|
lewtun
|
ebd5913a85
|
Bump LightEval (#643)
|
2025-05-16 10:52:05 +02:00 |
|
Edward Beeching
|
ea5b7edf22
|
Add dataset filtering script (#637)
* add dataset filtering script
* remove subset selection
* save wip
* save wip
* update filter script
* refactor to run on chunks
* rename script
* cleanup
* update dapo filtering
* fixes
* dapo filt config
* update compute pass rate
* clean
* update readme and config
* add merging snippet
|
2025-05-16 10:26:49 +02:00 |
|
lewtun
|
4fc2a3ff82
|
Add time to Slurm (#639)
|
2025-05-09 19:19:51 +02:00 |
|
lewtun
|
c802f00512
|
Use pass@1 for all evals (#633)
* Use pass@1 for all evals
* Update scores
|
2025-05-09 17:42:36 +02:00 |
|
Edward Beeching
|
21b48fbe46
|
soft_overlong_punishment from DAPO paper (#638)
* soft_overlong_punishment_reward
* tests
* doc string updated
* style
* non-sensical import removed
* Update src/open_r1/rewards.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* max_completion_length set to 3.6
* style
* quality
* test case added for <max_com_len
* style
* max_len +cache len updated based on num chars
* max_len_completion docstring added in config
* Update configs.py
* refactor soft overlong penalty to use completion ids
* change description to be tokens
---------
Co-authored-by: shirinyamani <yamani.shirin@ucalgary.ca>
Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-05-09 17:26:34 +02:00 |
|
lewtun
|
6a0cd5c8ad
|
Fix style again :) (#636)
|
2025-05-08 16:29:01 +02:00 |
|
Andrei
|
af81114044
|
Code Execution using Morph Cloud (#614)
* initial commit for morphcloud sandbox support
* initial
* fixed prints in morph client for ioi
* updated import
* context manager
* removed unnecessary comments
* more intelligent instance/snapshot management
* update
* Add documentation for Morph integration
* Delete MORPH_INTEGRATION.md
* added retry and modularity to morph client
* updates to kwargs and setup.py
* Update setup.py
* added languages codepath + fixed slurm + added morph tests
* make quality formatting fixes
* conditional imports for morph
---------
Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
|
2025-05-08 08:59:54 +02:00 |
|
lewtun
|
52520a6713
|
Fix style (#631)
* Fix style
* Fix
* Add jieba
|
2025-05-05 15:49:10 +02:00 |
|
Lewis Tunstall
|
c8b989109d
|
Fix style
|
2025-05-02 14:45:17 +00:00 |
|
lewtun
|
9373ad3055
|
Update README.md
|
2025-04-30 22:16:18 +02:00 |
|
binary-husky
|
65211f4824
|
🦜Enhance repetition penalty reward for language that cannot be split by whitespace (#516)
* Update rewards.py
* add test for repetition reward with language
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-04-30 22:02:59 +02:00 |
|
lewtun
|
75c3999180
|
Bump LightEval to enable DP>1 (#629)
* Bump LightEval to enable DP>1
* Remove redundant arg
* Update eval scores
* Fix slurm
|
2025-04-30 22:02:20 +02:00 |
|
lewtun
|
50590a41b9
|
Enable data and tensor parallelism for GRPO (#626)
* Bump deps
* Fix Slurm
* Fix
|
2025-04-26 11:50:08 +02:00 |
|