Lewis Tunstall
f6a07648e2
Bump vLLM and TRL
2025-05-28 06:48:01 +00:00
lewtun
9eef995b4d
Bump deps ( #656 )
2025-05-27 15:38:21 +02:00
lewtun
57e85b522f
Add better logging defaults for GRPO ( #657 )
2025-05-25 13:24:52 +02:00
Guilherme Penedo
c1e1192294
GRPO with codeforces problems ( #627 )
...
* add
* update
* updates
* updates #2
* weighted_sum and python fixes
* bugfix
* merging ioi/cf setups
* integrating the morph changes
* move morph_client
* run style
* small changes for mixed languages training
* revert grpo.py changes
* piston readme
* local test fetching
* bug fixes
* updated readme
* style fixes
* style fixes 2
* deps changes
* import sorting
* fix tests
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-25 11:55:27 +02:00
lewtun
db2d9b011a
Bump lower bound on liger-kernel ( #654 )
...
Related to https://github.com/huggingface/open-r1/pull/653
(I forgot to include this in that PR)
2025-05-22 08:44:13 +02:00
lewtun
8067149e90
Bump DeepSpeed to 0.16.8 to fix OOM on Qwen3 ( #653 )
2025-05-21 22:25:57 +02:00
lewtun
ebd5913a85
Bump LightEval ( #643 )
2025-05-16 10:52:05 +02:00
lewtun
c802f00512
Use pass@1 for all evals ( #633 )
...
* Use pass@1 for all evals
* Update scores
2025-05-09 17:42:36 +02:00
Andrei
af81114044
Code Execution using Morph Cloud ( #614 )
...
* initial commit for morphcloud sandbox support
* initial
* fixed prints in morph client for ioi
* updated import
* context manager
* removed unnecessary comments
* more intelligent instance/snapshot management
* update
* Add documentation for Morph integration
* Delete MORPH_INTEGRATION.md
* added retry and modularity to morph client
* updates to kwargs and setup.py
* Update setup.py
* added languages codepath + fixed slurm + added morph tests
* make quality formatting fixes
* conditional imports for morph
---------
Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
2025-05-08 08:59:54 +02:00
lewtun
52520a6713
Fix style ( #631 )
...
* Fix style
* Fix
* Add jieba
2025-05-05 15:49:10 +02:00
lewtun
75c3999180
Bump LightEval to enable DP>1 ( #629 )
...
* Bump LightEval to enable DP>1
* Remove redundant arg
* Update eval scores
* Fix slurm
2025-04-30 22:02:20 +02:00
lewtun
50590a41b9
Enable data and tensor parallelism for GRPO ( #626 )
...
* Bump deps
* Fix Slurm
* Fix
2025-04-26 11:50:08 +02:00
lewtun
5112bfc401
Fix SFT for base models ( #604 )
...
* Fix pad token bug in SFT
* Add ChatML default
* Clean up
* Refactor grpo model load
* Add doc
* Bump deepspeed
2025-04-16 11:45:50 +02:00
lewtun
04dbf21989
Bump TRL and vLLM ( #595 )
...
* Bump TRL and vLLM
* Fix style
* Bump liger
* Add liger
2025-04-11 16:32:33 +02:00
lewtun
bf08f56849
[WIP] Bump lighteval with proper pass@1 ( #584 )
...
* Bump lighteval with proper pass@1
* Bump lighteval
* Update AIME24
2025-04-08 20:53:34 +02:00
Edward Beeching
9915e06f1e
Async code reward fixes ( #546 )
...
* expose num parallel code executions
* add e2b benchmarking script
* adds new parallel code execution with better exception handling
* style
* update default
* increase sandbox timeout
* Add pretty table and Sandbox IDs
* Add Sandbox ID
* fix merge
---------
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
2025-03-28 14:08:15 +01:00
lewtun
8000dd2384
[WIP] RL goes brrr ( #533 )
...
* Fix vLLM recipes
* Add vllm server to Slurm
* Add overlap across srun
* Fix NUM_NODES
* Refactor TP to script
* fix train script to work with new GRPO
* lewis nits
* bump trl, transformers
---------
Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-24 15:15:02 +01:00
Edward Beeching
8782fa6e90
bump lighteval, expose the lcb_v4 benchmark ( #441 )
2025-02-26 17:59:44 +01:00
Edward Beeching
a20666d5b5
Bumps TRL ( #437 )
2025-02-26 10:35:50 +01:00
lewtun
3f9d75a595
Bump Liger kernel ( #399 )
...
Needed to enable SFT training via https://github.com/huggingface/trl/pull/2874
2025-02-23 17:44:03 +01:00
lewtun
49d9b741a5
Pin dependencies ( #393 )
2025-02-22 14:46:09 +01:00
lewtun
9fb45bede6
Fix LightEval commands and dependencies ( #386 )
...
* Fix lighteval cmd
* Fix typo
* Pin lighteval
* Hacks to the max
* Fix slurm
* Fix
* Pin lighteval
* Pin l
---------
Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>
2025-02-21 14:52:45 +01:00
lewtun
d76ecc12a2
Add E2B code interpreter reward function ( #364 )
...
* Add stuff
* Make it kind of work
* Add more stuff
* Add fix for parse
* Fix
* Refactor
* Clean up
* Fix config
* Fix sys
* Add SFT config
* Use min rate
* Fix eval
* Add base model
* Add s1k
* Disable eval
* Fix
* Add import checker
* Fix importer
* Fix
* Tune config
* Tune
* Fix
* Fix save
* Tune beta
* Remove configs
* Fix vLLM
* Fix
* Add note
* Add doc
* doc
* Fix
* Tune lr
* Add command
2025-02-19 11:26:46 +01:00
Edward Beeching
7041fbc9d6
Update setup.py ( #315 )
...
adds peft as a temp dep due to https://github.com/huggingface/trl/issues/2849
2025-02-13 15:04:03 +01:00
Almaz Zinollayev
517adddae3
[Testing Github workflow] Updating workflows and makefile ( #214 )
...
* [Testing Github workflow] Updating workflows and makefile
* [Testing Github workflow] - Refactoring workflow, fixing tests error, easier debugging
* [Testing Github workflow] Converting docstring into raw string
* [Testing Github workflow] - Fixing test_zero_max_penalty_returns_zero() test
* [Testing Github workflow] Removing redundant test
2025-02-10 18:28:35 +01:00
lewtun
3519a7fa3d
Remove duplicate math-verify ( #234 )
2025-02-07 20:01:54 +01:00
lewtun
0da0f7cce2
Refactor training configs and unify Slurm for training SFT & GRPO ( #231 )
...
* Refactor Slurm
* Fix
* FML
* Nuke
* Clean
* Fix config
* Fix deps
* Fix logging
2025-02-07 15:56:43 +01:00
Kashif Rasul
250ab46ea1
[GRPO] add cosine reward ( #206 )
...
* add cosine reward
* fix merge
* fix typo
* fix check
2025-02-07 08:10:48 +01:00
lewtun
cec57f3a55
Add GPQA Diamond and fix evaluation deps ( #196 )
...
* Add GPQA Diamond
* Add table
* Fix README
* Up
* Fixes
* Ignore logs
* Fix
* Pin deps
* Fix GRPO
* Add Llama 70B tables
* Restore dp
* Pin lighteval
* Use bfloat16
* Tune table
* Add note
2025-02-06 15:24:52 +01:00
Lewis
138df0ca44
chore(setup.py): bump vllm>=0.7.1 ( #181 )
...
See https://github.com/huggingface/trl/pull/2766.
2025-02-05 09:53:31 +01:00
Kashif Rasul
a0d61ccece
use ruff ( #137 )
...
* use ruff
* reformat
* re-run
* update deps
* undo
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* fix help strings
* fix ruff version
* fix formatting
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-31 13:36:08 +01:00
Kashif Rasul
c0b53fae29
Grpo slurm scripts ( #112 )
...
* initial grpo.slurm script
* initial zero3 yaml using 1 less gpu
* add completion and promp length
* initial doc
* use main
* fix typo
* remove num_processes
* use vllm 0.7.0
* remove double module load
* update math-verify
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* overwrite num_procs in the slurm script
* add vllm args to readme
* update readme
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-30 10:22:45 +01:00
Hynek Kydlíček
e2235cf978
Improve reproduction of r1 reported score ( #92 )
...
* bump up deps, fix aime24 evals, make grpo more strict
* minor fixes
* 🤨 fmt
* bump lighteval + set boxed to match first
* remove dead code
* bump lighteval
* add ed's tp branch switch
---------
Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
2025-01-29 11:29:05 +01:00
Hynek Kydlíček
90b0947382
Reward verification and evaluation fixes ( #55 )
...
* bump up deps, fix aime24 evals, make grpo more strict
* minor fixes
* 🤨 fmt
* Update src/open_r1/grpo.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-26 18:35:48 +01:00
lewtun
2580fd8c1b
Fix Slurm SFT and gather Slurm scripts ( #19 )
...
* Fix slurm
* Fix generate
* Fix install
* Fix c
2025-01-25 13:47:52 +01:00
Quentin Gallouédec
742cc008b2
Pin main for transformers and trl
2025-01-25 11:07:17 +01:00
Agus
33795e1b5a
Add math-verify to check accuracy of completions on GRPO ( #14 )
...
* Add math-verify to check accuracy of completions on GRPO
* Handle make_conversation
* Update src/open_r1/grpo.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/grpo.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/grpo.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* fix quality
* Remove unnecessary item access in parsed answer
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-25 11:03:58 +01:00
Gabriel Martín Blázquez
02bed5308c
Add synthetic data generation script ( #9 )
...
* Add synthetic data generation script
Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>
* Fix format
* Fix imports sorting
---------
Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>
2025-01-25 01:42:24 +01:00
lewtun
26184f71ae
Refactor evaluation ( #6 )
2025-01-24 23:46:34 +01:00
Edward Beeching
9c398973e8
Adds Math-500 and AIME24 evals ( #4 )
...
* adds evals
* up max model len
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-01-24 23:09:07 +01:00
lewtun
6acc9a0aa0
Add configs and stuff ( #2 )
2025-01-24 20:05:18 +01:00
Quentin Gallouédec
a4bf90465f
Update setup.py ( #1 )
2025-01-24 19:13:04 +01:00
Lewis Tunstall
2ff66e6cde
Add skeleton
2025-01-24 16:50:13 +00:00