43 commits

Author SHA1 Message Date
Lewis Tunstall
f6a07648e2 Bump vLLM and TRL 2025-05-28 06:48:01 +00:00
lewtun
9eef995b4d
Bump deps (#656) 2025-05-27 15:38:21 +02:00
lewtun
57e85b522f
Add better logging defaults for GRPO (#657) 2025-05-25 13:24:52 +02:00
Guilherme Penedo
c1e1192294
GRPO with codeforces problems (#627)
* add

* update

* updates

* updates #2

* weighted_sum and python fixes

* bugfix

* merging ioi/cf setups

* integrating the morph changes

* move morph_client

* run style

* small changes for mixed languages training

* revert grpo.py changes

* piston readme

* local test fetching

* bug fixes

* updated readme

* style fixes

* style fixes 2

* deps changes

* import sorting

* fix tests

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-25 11:55:27 +02:00
lewtun
db2d9b011a
Bump lower bound on liger-kernel (#654)
Related to https://github.com/huggingface/open-r1/pull/653

(I forgot to include this in that PR)
2025-05-22 08:44:13 +02:00
lewtun
8067149e90
Bump DeepSpeed to 0.16.8 to fix OOM on Qwen3 (#653) 2025-05-21 22:25:57 +02:00
lewtun
ebd5913a85
Bump LightEval (#643) 2025-05-16 10:52:05 +02:00
lewtun
c802f00512
Use pass@1 for all evals (#633)
* Use pass@1 for all evals

* Update scores
2025-05-09 17:42:36 +02:00
Andrei
af81114044
Code Execution using Morph Cloud (#614)
* initial commit for morphcloud sandbox support

* initial

* fixed prints in morph client for ioi

* updated import

* context manager

* removed unnecessary comments

* more intelligent instance/snapshot management

* update

* Add documentation for Morph integration

* Delete MORPH_INTEGRATION.md

* added retry and modularity to morph client

* updates to kwargs and setup.py

* Update setup.py

* added languages codepath + fixed slurm + added morph tests

* make quality formatting fixes

* conditional imports for morph

---------

Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
2025-05-08 08:59:54 +02:00
lewtun
52520a6713
Fix style (#631)
* Fix style

* Fix

* Add jieba
2025-05-05 15:49:10 +02:00
lewtun
75c3999180
Bump LightEval to enable DP>1 (#629)
* Bump LightEval to enable DP>1

* Remove redundant arg

* Update eval scores

* Fix slurm
2025-04-30 22:02:20 +02:00
lewtun
50590a41b9
Enable data and tensor parallelism for GRPO (#626)
* Bump deps

* Fix Slurm

* Fix
2025-04-26 11:50:08 +02:00
lewtun
5112bfc401
Fix SFT for base models (#604)
* Fix pad token bug in SFT

* Add ChatML default

* Clean up

* Refactor grpo model load

* Add doc

* Bump deepspeed
2025-04-16 11:45:50 +02:00
lewtun
04dbf21989
Bump TRL and vLLM (#595)
* Bump TRL and vLLM

* Fix style

* Bump liger

* Add liger
2025-04-11 16:32:33 +02:00
lewtun
bf08f56849
[WIP] Bump lighteval with proper pass@1 (#584)
* Bump lighteval with proper pass@1

* Bump lighteval

* Update AIME24
2025-04-08 20:53:34 +02:00
Edward Beeching
9915e06f1e
Async code reward fixes (#546)
* expose num parallel code executions

* add e2b benchmarking script

* adds new parallel code execution with better exception handling

* style

* update default

* increase sandbox timeout

* Add pretty table and Sandbox IDs

* Add Sandbox ID

* fix merge

---------

Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
2025-03-28 14:08:15 +01:00
lewtun
8000dd2384
[WIP] RL goes brrr (#533)
* Fix vLLM recipes

* Add vllm server to Slurm

* Add overlap across srun

* Fix NUM_NODES

* Refactor TP to script

* fix train script to work with new GRPO

* lewis nits

* bump trl, transformers

---------

Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-24 15:15:02 +01:00
Edward Beeching
8782fa6e90
bump lighteval, expose the lcb_v4 benchmark (#441) 2025-02-26 17:59:44 +01:00
Edward Beeching
a20666d5b5
Bumps TRL (#437) 2025-02-26 10:35:50 +01:00
lewtun
3f9d75a595
Bump Liger kernel (#399)
Needed to enable SFT training via https://github.com/huggingface/trl/pull/2874
2025-02-23 17:44:03 +01:00
lewtun
49d9b741a5
Pin dependencies (#393) 2025-02-22 14:46:09 +01:00
lewtun
9fb45bede6
Fix LightEval commands and dependencies (#386)
* Fix lighteval cmd

* Fix typo

* Pin lighteval

* Hacks to the max

* Fix slurm

* Fix

* Pin lighteval

* Pin l

---------

Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>
2025-02-21 14:52:45 +01:00
lewtun
d76ecc12a2
Add E2B code interpreter reward function (#364)
* Add stuff

* Make it kind of work

* Add more stuff

* Add fix for parse

* Fix

* Refactor

* Clean up

* Fix config

* Fix sys

* Add SFT config

* Use min rate

* Fix eval

* Add base model

* Add s1k

* Disable eval

* Fix

* Add import checker

* Fix importer

* Fix

* Tune config

* Tune

* Fix

* Fix save

* Tune beta

* Remove configs

* Fix vLLM

* Fix

* Add note

* Add doc

* doc

* Fix

* Tune lr

* Add command
2025-02-19 11:26:46 +01:00
Edward Beeching
7041fbc9d6
Update setup.py (#315)
adds peft as a temp dep due to https://github.com/huggingface/trl/issues/2849
2025-02-13 15:04:03 +01:00
Almaz Zinollayev
517adddae3
[Testing Github workflow] Updating workflows and makefile (#214)
* [Testing Github workflow] Updating workflows and makefile

* [Testing Github workflow] - Refactoring workflow, fixing tests error, easier debugging

* [Testing Github workflow] Converting docstring into raw string

* [Testing Github workflow] - Fixing test_zero_max_penalty_returns_zero() test

* [Testing Github workflow] Removing redundant test
2025-02-10 18:28:35 +01:00
lewtun
3519a7fa3d
Remove duplicate math-verify (#234) 2025-02-07 20:01:54 +01:00
lewtun
0da0f7cce2
Refactor training configs and unify Slurm for training SFT & GRPO (#231)
* Refactor Slurm

* Fix

* FML

* Nuke

* Clean

* Fix config

* Fix deps

* Fix logging
2025-02-07 15:56:43 +01:00
Kashif Rasul
250ab46ea1
[GRPO] add cosine reward (#206)
* add cosine reward

* fix merge

* fix typo

* fix check
2025-02-07 08:10:48 +01:00
lewtun
cec57f3a55
Add GPQA Diamond and fix evaluation deps (#196)
* Add GPQA Diamond

* Add table

* Fix README

* Up

* Fixes

* Ignore logs

* Fix

* Pin deps

* Fix GRPO

* Add Llama 70B tables

* Restore dp

* Pin lighteval

* Use bfloat16

* Tune table

* Add note
2025-02-06 15:24:52 +01:00
Lewis
138df0ca44
chore(setup.py): bump vllm>=0.7.1 (#181)
See https://github.com/huggingface/trl/pull/2766.
2025-02-05 09:53:31 +01:00
Kashif Rasul
a0d61ccece
use ruff (#137)
* use ruff

* reformat

* re-run

* update deps

* undo

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* fix help strings

* fix ruff version

* fix formatting

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-31 13:36:08 +01:00
Kashif Rasul
c0b53fae29
Grpo slurm scripts (#112)
* initial grpo.slurm script

* initial zero3 yaml using 1 less gpu

* add completion and prompt length

* initial doc

* use main

* fix typo

* remove num_processes

* use vllm 0.7.0

* remove double module load

* update math-verify

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* overwrite num_procs in the slurm script

* add vllm args to readme

* update readme

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-30 10:22:45 +01:00
Hynek Kydlíček
e2235cf978
Improve reproduction of r1 reported score (#92)
* bump up deps, fix aime24 evals, make grpo more strict

* minor fixes

* 🤨 fmt

* bump lighteval + set boxed to match first

* remove dead code

* bump lighteval

* add ed's tp branch switch

---------

Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
2025-01-29 11:29:05 +01:00
Hynek Kydlíček
90b0947382
Reward verification and evaluation fixes (#55)
* bump up deps, fix aime24 evals, make grpo more strict

* minor fixes

* 🤨 fmt

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-26 18:35:48 +01:00
lewtun
2580fd8c1b
Fix Slurm SFT and gather Slurm scripts (#19)
* Fix slurm

* Fix generate

* Fix install

* Fix c
2025-01-25 13:47:52 +01:00
Quentin Gallouédec
742cc008b2
Pin main for transformers and trl 2025-01-25 11:07:17 +01:00
Agus
33795e1b5a
Add math-verify to check accuracy of completions on GRPO (#14)
* Add math-verify to check accuracy of completions on GRPO

* Handle make_conversation

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* fix quality

* Remove unnecessary item access in parsed answer

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-25 11:03:58 +01:00
Gabriel Martín Blázquez
02bed5308c
Add synthetic data generation script (#9)
* Add synthetic data generation script

Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>

* Fix format

* Fix imports sorting

---------

Co-authored-by: Anton <anton-l@users.noreply.github.com>
Co-authored-by: Agustin <plaguss@users.noreply.github.com>
2025-01-25 01:42:24 +01:00
lewtun
26184f71ae
Refactor evaluation (#6) 2025-01-24 23:46:34 +01:00
Edward Beeching
9c398973e8
Adds Math-500 and AIME24 evals (#4)
* adds evals

* up max model len

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-01-24 23:09:07 +01:00
lewtun
6acc9a0aa0
Add configs and stuff (#2) 2025-01-24 20:05:18 +01:00
Quentin Gallouédec
a4bf90465f
Update setup.py (#1) 2025-01-24 19:13:04 +01:00
Lewis Tunstall
2ff66e6cde Add skeleton 2025-01-24 16:50:13 +00:00