Commit graph

42 commits

Author SHA1 Message Date
Lewis Tunstall
3bcc4fc86e Add codeforces 2025-05-28 19:21:15 +00:00
lewtun
a6b4f668fb
Fix Weka refresh (#666)
* Fix Weka refresh

* Update evaluate.slurm
2025-05-28 13:45:48 +02:00
lewtun
722f144d21
Refresh Weka on Slurm (#662)
* Refresh Weka on Slurm

* Include current working dir
2025-05-27 19:21:15 +02:00
Guilherme Penedo
c1e1192294
GRPO with codeforces problems (#627)
* add

* update

* updates

* updates #2

* weighted_sum and python fixes

* bugfix

* merging ioi/cf setups

* integrating the morph changes

* move morph_client

* run style

* small changes for mixed languages training

* revert grpo.py changes

* piston readme

* local test fetching

* bug fixes

* updated readme

* style fixes

* style fixes 2

* deps changes

* import sorting

* fix tests

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-25 11:55:27 +02:00
Edward Beeching
ea5b7edf22
Add dataset filtering script (#637)
* add dataset filtering script

* remove subset selection

* save wip

* save wip

* update filter script

* refactor to run on chunks

* rename script

* cleanup

* update dapo filtering

* fixes

* dapo filt config

* update compute pass rate

* clean

* update readme and config

* add merging snippet
2025-05-16 10:26:49 +02:00
lewtun
4fc2a3ff82
Add time to Slurm (#639) 2025-05-09 19:19:51 +02:00
Andrei
af81114044
Code Execution using Morph Cloud (#614)
* initial commit for morphcloud sandbox support

* initial

* fixed prints in morph client for ioi

* updated import

* context manager

* removed unnecessary comments

* more intelligent instance/snapshot management

* update

* Add documentation for Morph integration

* Delete MORPH_INTEGRATION.md

* added retry and modularity to morph client

* updates to kwargs and setup.py

* Update setup.py

* added languages codepath + fixed slurm + added morph tests

* make quality formatting fixes

* conditional imports for morph

---------

Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
2025-05-08 08:59:54 +02:00
lewtun
75c3999180
Bump LightEval to enable DP>1 (#629)
* Bump LightEval to enable DP>1

* Remove redundant arg

* Update eval scores

* Fix slurm
2025-04-30 22:02:20 +02:00
lewtun
50590a41b9
Enable data and tensor parallelism for GRPO (#626)
* Bump deps

* Fix Slurm

* Fix
2025-04-26 11:50:08 +02:00
Edward Beeching
715c8787fb
add back grad accumulations steps (#612) 2025-04-17 16:41:39 +02:00
Edward Beeching
3a0e89678c
Fix eval system prompt (#591)
* fix eval system prompt

* style
2025-04-11 11:23:06 +02:00
lewtun
bf08f56849
[WIP] Bump lighteval with proper pass@1 (#584)
* Bump lighteval with proper pass@1

* Bump lighteval

* Update AIME24
2025-04-08 20:53:34 +02:00
Edward Beeching
1b3bf043dc
Adds a E2B router server that executes batches of scripts (#561)
* adds a dedicated e2b server to handle batches of requests

* fix reward tests

* update slow reward

* style

* updates e2b router to be more generic

* refactor

* refactoring

* licence, cleanup

* update tests

* style

* fix import when e2b not present

* style

* rename sandbox file

* rename to RoutedSandbox

* update readme

* nits

* nits2

* unlimited max time

* update logs path
2025-04-07 21:01:06 +02:00
lewtun
8000dd2384
[WIP] RL goes brrr (#533)
* Fix vLLM recipes

* Add vllm server to Slurm

* Add overlap across srun

* Fix NUM_NODES

* Refactor TP to script

* fix train script to work with new GRPO

* lewis nits

* bump trl, transformers

---------

Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-24 15:15:02 +01:00
Guilherme Penedo
7835979801
adds support for running GRPO on IOI problems (#495)
* adds support for running GRPO on IOI problems

* nit

* bugfixes + recipe

* added piston info and readme changes

* readme updates

* run isort to fix checks

* Update src/open_r1/rewards.py

Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>

* adding ioi test

* fix merge issues with python slow tests

* style

* generalize piston workers

* generalize readme

* fix extract code

* finalize slow tests

---------

Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-21 08:48:00 +01:00
lewtun
44cb13d4ba
Fix vLLM (#464) 2025-03-03 17:25:30 +01:00
lewtun
eeca246b07
Update prompt template and sampling parameters for evaluation (#392)
* Pin t

* Pin t

* Set top p

* C

* Tune math prompt

* Improve math prompt

* Update tables
2025-02-22 15:21:01 +01:00
lewtun
9fb45bede6
Fix LightEval commands and dependencies (#386)
* Fix lighteval cmd

* Fix typo

* Pin lighteval

* Hacks to the max

* Fix slurm

* Fix

* Pin lighteval

* Pin l

---------

Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>
2025-02-21 14:52:45 +01:00
Edward Beeching
80e7e7b23c
move details script and fix wandb logging (#314) 2025-02-13 11:13:00 +01:00
Anton Lozhkov
440ae0b24e
Add the actual async generation script (#273)
* sglang inference server

* add vllm

* readme

* add a generation script

* ruff
2025-02-10 16:52:23 +01:00
Anton Lozhkov
baec330ef5
Add SGLang inference scripts (#268)
* sglang inference server

* add vllm

* readme
2025-02-10 14:37:58 +01:00
Edward Beeching
cabf27560b
hardcodes num_processes to 7 when using vllm (#264)
* hardcodes num_processes to 7 when using vllm

* nits
2025-02-10 11:43:16 +01:00
lewtun
9be2e9a859
Add retry mechanism for pushing eval results (#252)
The Hub throws 403 errors if there are too many concurrent pushes to the same repo, so we need a retry mechanism when that happens.
2025-02-09 09:44:35 +01:00
lewtun
0da0f7cce2
Refactor training configs and unify Slurm for training SFT & GRPO (#231)
* Refactor Slurm

* Fix

* FML

* Nuke

* Clean

* Fix config

* Fix deps

* Fix logging
2025-02-07 15:56:43 +01:00
lewtun
a60b175aeb
Update CUDA (#209)
* Update CUDA

* Fix

* Remove module

* Restore CUDA

* Move cuda import
2025-02-06 16:31:13 +01:00
lewtun
3fbdeac96c
Fix slurm eval (#208) 2025-02-06 15:46:33 +01:00
lewtun
cec57f3a55
Add GPQA Diamond and fix evaluation deps (#196)
* Add GPQA Diamond

* Add table

* Fix README

* Up

* Fixes

* Ignore logs

* Fix

* Pin deps

* Fix GRPO

* Add Llama 70B tables

* Restore dp

* Pin lighteval

* Use bfloat16

* Tune table

* Add note
2025-02-06 15:24:52 +01:00
Edward Beeching
3fd56dc7b4
fix uv env path + details (#188)
* fix uv env path + details

* Update slurm/grpo.slurm

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-02-05 23:59:25 +01:00
Jingze Shi
e450a6fbc4
Recipes for optimizing training scripts (#120)
* Add recipe configs to optimize scripts (#73)

* remove small models

* Add README for recipes

* Add README for recipes

* Attempt to resolve conflicts

* Optimize src scripts

* Update recipe of DeepSeek-R1-Distill-Qwen-7B

* Update recipe of Qwen2.5-1.5B

* Updated recipe readme for qwen

* Update training command for recipes

* Update README.md

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* Update preprocessing_num_workers from 36 to 8

* Add small language model recipes for quickly verifying R1

* Fix src code quality

* Add back the Slurm job command

* Remove recipe of doge

* Fix torch_dtype is not used

* fix grpo yaml

* fix grpo yaml

* fix deprecation warning

* fix config folder location

* Remove duplicate variables in grpo.py

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-31 12:41:53 +01:00
Kashif Rasul
c0b53fae29
Grpo slurm scripts (#112)
* initial grpo.slurm script

* initial zero3 yaml using 1 less gpu

* add completion and prompt length

* initial doc

* use main

* fix typo

* remove num_processes

* use vllm 0.7.0

* remove double module load

* update math-verify

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* overwrite num_procs in the slurm script

* add vllm args to readme

* update readme

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-30 10:22:45 +01:00
Edward Beeching
972e47eff0
Adds auto eval callbacks (#115)
* adds auto eval callbacks

* updates training scripts with callbacks

* style

* date

* update gitignore with logs, eval results, etc

* remove unused imports

* nits
2025-01-30 09:39:47 +01:00
Hynek Kydlíček
e2235cf978
Improve reproduction of r1 reported score (#92)
* bump up deps, fix aime24 evals, make grpo more strict

* minor fixes

* 🤨 fmt

* bump lighteval + set boxed to match first

* remove dead code

* bump lighteval

* add ed's tp branch switch

---------

Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
2025-01-29 11:29:05 +01:00
Gabriel Martín Blázquez
c5941ed5e4
Add --timeout, --retries, --prompt-template and TP and PP by slurm variables (#94)
* Set TP and PP using slurm variables

* Add `--timeout` argument

* add `--prompt-template` argument

* Group generations

* Add `--retries` argument
2025-01-28 18:30:04 +01:00
Gabriel Martín Blázquez
b03480d868
Add --input-batch-size, --client-replicas args and download Ray logs (#71)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-27 14:58:13 +01:00
Edward Beeching
feb59d2b42
Update evaluate.slurm
typo in eval slurm
2025-01-27 09:43:02 +01:00
Hynek Kydlíček
90b0947382
Reward verification and evaluation fixes (#55)
* bump up deps, fix aime24 evals, make grpo more strict

* minor fixes

* 🤨 fmt

* Update src/open_r1/grpo.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-26 18:35:48 +01:00
Anton Lozhkov
15df4fb134
vllm speed tweaks (#43) 2025-01-26 01:59:50 +01:00
elie
64c0ed2254
fix evaluate.slurm (#27) 2025-01-25 15:28:49 +01:00
elie
f169d2cd8e
add evaluate.slurm (#26) 2025-01-25 15:23:16 +01:00
Gabriel Martín Blázquez
a90b99686a
Fix passing vLLM server URL (#21)
* Use head node ip as vLLM server url

* Pass correct server url

* Add num_generations argument

* Fix style

* Remove `select`

---------

Co-authored-by: plaguss <agustin@argilla.io>
2025-01-25 15:01:15 +01:00
elie
43cb6a0e0f
fix sft.slurm 2025-01-25 14:48:45 +01:00
lewtun
2580fd8c1b
Fix Slurm SFT and gather Slurm scripts (#19)
* Fix slurm

* Fix generate

* Fix install

* Fix c
2025-01-25 13:47:52 +01:00