Commit graph

79 commits

Author SHA1 Message Date
Lewis Tunstall
97b1c22e55 Merge branch 'bump-deps-0' into zero-math-code 2025-05-28 10:11:06 +02:00
Lewis Tunstall
f6a07648e2 Bump vLLM and TRL 2025-05-28 06:48:01 +00:00
lewtun
33f84def0d
Align EOS token ID between tokenizer and generation config (#663)
* Align EOS token ID between tokenizer and generation config

* Fix
2025-05-27 17:20:13 +02:00
Lewis Tunstall
82fb385fa5 Refine tests 2025-05-27 13:39:00 +00:00
lewtun
5ac5971ea5
Add OpenR1-Distill recipe (#661) 2025-05-26 17:57:44 +02:00
Guilherme Penedo
c1e1192294
GRPO with codeforces problems (#627)
* add

* update

* updates

* updates #2

* weighted_sum and python fixes

* bugfix

* merging ioi/cf setups

* integrating the morph changes

* move morph_client

* run style

* small changes for mixed languages training

* revert grpo.py changes

* piston readme

* local test fetching

* bug fixes

* updated readme

* style fixes

* style fixes 2

* deps changes

* import sorting

* fix tests

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-25 11:55:27 +02:00
lewtun
9366aa2df3
Add dataset mixer (#647)
* Prototype

* Clean up

* Refactor

* Add tests

* Add doc and make scripts work

* Tune doc

* Up

* Tune

* Add column verification

* Fix types

* Fix YAML

* Fix types

* Fix doc

* f

* f
2025-05-20 11:40:42 +02:00
Quentin Gallouédec
5e0c210f9c
use hf papers (#646) 2025-05-19 13:48:14 +02:00
Edward Beeching
ea5b7edf22
Add dataset filtering script (#637)
* add dataset filtering script

* remove subset selection

* save wip

* save wip

* update filter script

* refactor to run on chunks

* rename script

* cleanup

* update dapo filtering

* fixes

* dapo filt config

* update compute pass rate

* clean

* update readme and config

* add merging snippet
2025-05-16 10:26:49 +02:00
lewtun
c802f00512
Use pass@1 for all evals (#633)
* Use pass@1 for all evals

* Update scores
2025-05-09 17:42:36 +02:00
Andrei
af81114044
Code Execution using Morph Cloud (#614)
* initial commit for morphcloud sandbox support

* initial

* fixed prints in morph client for ioi

* updated import

* context manager

* removed unnecessary comments

* more intelligent instance/snapshot management

* update

* Add documentation for Morph integration

* Delete MORPH_INTEGRATION.md

* added retry and modularity to morph client

* updates to kwargs and setup.py

* Update setup.py

* added languages codepath + fixed slurm + added morph tests

* make quality formatting fixes

* conditional imports for morph

---------

Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
2025-05-08 08:59:54 +02:00
lewtun
9373ad3055
Update README.md 2025-04-30 22:16:18 +02:00
lewtun
75c3999180
Bump LightEval to enable DP>1 (#629)
* Bump LightEval to enable DP>1

* Remove redundant arg

* Update eval scores

* Fix slurm
2025-04-30 22:02:20 +02:00
lewtun
50590a41b9
Enable data and tensor parallelism for GRPO (#626)
* Bump deps

* Fix SLurm

* Fix
2025-04-26 11:50:08 +02:00
lewtun
5112bfc401
Fix SFT for base models (#604)
* Fix pad token bug in SFT

* Add ChatML default

* Clean up

* Refactor grpo model load

* Add doc

* Bump deepspeed
2025-04-16 11:45:50 +02:00
lewtun
04dbf21989
Bump TRL and vLLM (#595)
* Bump TRL and vLLM

* Fix style

* Bump liger

* Add liger
2025-04-11 16:32:33 +02:00
Shenghang Tsai
2a7bb45f05
Update README.md (#590) 2025-04-10 13:11:35 +02:00
lewtun
bf08f56849
[WIP] Bump lighteval with proper pass@1 (#584)
* Bump lighteval with proper pass@1

* Bump lighteval

* Update AIME24
2025-04-08 20:53:34 +02:00
Edward Beeching
1b3bf043dc
Adds a E2B router server that executes batches of scripts (#561)
* adds a dedicated e2b server to handle batches of requests

* fix reward tests

* update slow reward

* style

* updates e2b router to be more generic

* refactor

* refactoring

* licence, cleanup

* update tests

* style

* fix import when e2b not present

* style

* rename sandbox file

* rename to RoutedSandbox

* update readme

* nits

* nits2

* unlimited max time

* update logs path
2025-04-07 21:01:06 +02:00
lewtun
4ec555b0c8
Restore single-node instructions to run GRPO (#549) 2025-03-27 10:29:07 +01:00
lewtun
8000dd2384
[WIP] RL goes brrr (#533)
* Fix vLLM recipes

* Add vllm server to Slurm

* Add overlap across srun

* Fix NUM_NODES

* Refactor TP to script

* fix train script to work with new GRPO

* lewis nits

* bump trl, transformers

---------

Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-24 15:15:02 +01:00
Guilherme Penedo
7835979801
adds support for running GRPO on IOI problems (#495)
* adds support for running GRPO on IOI problems

* nit

* bugfixes + recipe

* added piston info and readme changes

* readme updates

* run isort to fix checks

* Update src/open_r1/rewards.py

Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>

* adding ioi test

* fix merge issues with python slow tests

* style

* generalize piston workers

* generalize readme

* fix extract code

* finalize slow tests

---------

Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-21 08:48:00 +01:00
koskotheim
d436b7b9c0
fix typo (#507) 2025-03-15 20:56:14 +01:00
lewtun
d5922af8ce
Add OlympicCoder recipes (#505)
* Add OlympicCoder recipes

* Fix configs

* Add FSDP config
2025-03-13 19:08:34 +01:00
lewtun
3b5d6603bf
Add citation and acknowledgements (#481)
* Update README.md

* Update README.md

* Update README.md
2025-03-05 20:23:57 +01:00
lewtun
44cb13d4ba
Fix vLLM (#464) 2025-03-03 17:25:30 +01:00
Marco Z
c7733d3fa4
update makefile and readme (#449)
Co-authored-by: Marco Zocca <marco.zocca@unfoldml.com>
2025-03-01 15:08:30 +01:00
Agus
7188001281
Add script to decontaminate datasets against benchmark datasets (#416)
* Add script to decontaminate datasets against benchmark datasets

* Add docs for the decontamination script

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Add license header and attribution to the authors

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-02-24 19:54:44 +01:00
lewtun
eeca246b07
Update prompt template and sampling parameters for evaluation (#392)
* Pin t

* Pin t

* Set top p

* C

* Tune math prompt

* Improve math prompt

* Update tables
2025-02-22 15:21:01 +01:00
lewtun
9fb45bede6
Fix LightEval commands and dependencies (#386)
* Fix lighteval cmd

* Fix typo

* Pin lighteval

* Hacks to the max

* Fix slurm

* Fix

* Pin lighteval

* Pin l

---------

Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>
2025-02-21 14:52:45 +01:00
lewtun
d76ecc12a2
Add E2B code interpreter reward function (#364)
* Add stuff

* Make it kind of work

* Add more stuff

* Add fix for parse

* Fix

* Refactor

* Clean up

* Fix config

* Fix sys

* Add SFT config

* Use min rate

* Fix eval

* Add base model

* Add s1k

* Disable eval

* Fix

* Add import checker

* Fix importer

* Fix

* Tune config

* Tune

* Fix

* Fix save

* Tune beta

* Remove configs

* Fix vLLM

* Fix

* Add note

* Add doc

* doc

* Fix

* Tune lr

* Add command
2025-02-19 11:26:46 +01:00
Agus
740a7a4305
Add LiveCodeBench's codegeneration task from lighteval (#346)
* Add lcb:codegeneration task from lighteval

* Add results from R1 Qwen 32B
2025-02-19 08:32:33 +01:00
lewtun
78c197df51
Enable chat template and system prompt to be configured during training (#349)
* Enable chat template to be configured

* Add notes to README

* Handle None

* Remove default system prompt

* Fix ST

* Tune hparams

* Fix

* Tune

* Fix
2025-02-18 14:46:43 +01:00
Edward Beeching
f987b3c877
bump vllm to version to 0.7.2 (#311)
VLLM has made a number of throughput improvements in version 0.7.2, so it's worth bumping the version, particularly for GRPO training runs.
2025-02-13 10:48:11 +01:00
lewtun
96a6b0fa33
Enable Weights & Biases defaults to be overridden in training (#294)
* Enable WandB defaults to be set

* Fix
2025-02-12 13:01:07 +01:00
Lewis
db19392bef
chore(README): fix link, consistent formatting for CUDA warning (#248)
low priority & cosmetic
2025-02-09 09:45:38 +01:00
Ty Feng
90c1bfe829
Fix README: Correct recipes path and missing --config option (#247)
* Fix incorrect recipes path in README

* Fix missing --config option and incorrect recipes path

* Fix missing --config option and incorrect recipes path
2025-02-09 08:21:35 +01:00
Xu Song
f5f0b55dc4
Fix typo (#241) 2025-02-08 10:28:11 +01:00
lewtun
0da0f7cce2
Refactor training configs and unify Slurm for training SFT & GRPO (#231)
* Refactor Slurm

* Fix

* FML

* Nuke

* Clean

* Fix config

* Fix deps

* Fix logging
2025-02-07 15:56:43 +01:00
Quentin Gallouédec
dba152a494
fix config name (#222) 2025-02-07 14:34:46 +01:00
lewtun
c4227d6220
Update README.md (#211) 2025-02-06 16:40:09 +01:00
lewtun
a60b175aeb
Update CUDA (#209)
* Update CUDA

* Fix

* Remove module

* Restore CUDA

* Move cuda import
2025-02-06 16:31:13 +01:00
lewtun
cec57f3a55
Add GPQA Diamond and fix evaluation deps (#196)
* Add GPQA Diamond

* Add table

* Fix README

* Up

* Fixes

* Ignore logs

* Fix

* Pin deps

* Fix GRPO

* Add Llama 70B tables

* Restore dp

* Pin lighteval

* Use bfloat16

* Tune table

* Add note
2025-02-06 15:24:52 +01:00
Dongwei Jiang
571661a1e4
Provide a minimal reproducible experiment using GRPO for mathematical reasoning on base model, referencing the approach from SimpleRL-Reason (#197)
* Create config_base_math_smalllr.yaml

* Update README.md

* Update README.md
2025-02-06 11:43:42 +01:00
Jingze Shi
e450a6fbc4
Recipes for optimizing training scripts (#120)
* Add recipe configs to optimize scripts (#73)

* remove small models

* Add README for recipes

* Add README for recipes

* Attempt to resolve conflicts

* Optimize src scripts

* Update recipe of DeepSeek-R1-Distill-Qwen-7B

* Update recipe of Qwen2.5-1.5B

* Updated recipe readme for qwen

* Update training command for recipes

* Update README.md

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* Update preprocessing_num_workers from 36 to 8

* Add small language model recipes for quickly verify R1

* Fix src code quality

* Add back the Slurm job command

* Remove recipe of doge

* Fix torch_dtype is not used

* fix grpo yaml

* fix grpo yaml

* fix deprecation warning

* fix config folder location

* Remove duplicate variables in grpo.py

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-31 12:41:53 +01:00
Dongwei Jiang
22512e62bc
Update README.md (#132) 2025-01-31 11:27:17 +01:00
Sam Schorb
356f6a5c4f
Add Table of Contents to README for easier navigation (#125)
* Update README.md

* Update README.md
2025-01-30 16:32:13 +01:00
Kashif Rasul
c0b53fae29
Grpo slurm scripts (#112)
* initial grpo.slurm script

* initial zero3 yaml using 1 less gpu

* add completion and prompt length

* initial doc

* use main

* fix typo

* remove num_processes

* use vllm 0.7.0

* remove double module load

* update math-verify

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* overwrite num_procs in the slurm script

* add vllm args to readme

* update readme

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-30 10:22:45 +01:00
Lewis
fb1b4c4e3f
docs(README): note about CUDA 12.1 (#121)
will segfault for CUDA 14.1 under certain conditions; instructions are specific to 12.1

- fixes #106 
- fixes #117
2025-01-30 08:42:43 +01:00
Edward Beeching
bd0e15bfb5
Update README.md (#93) 2025-01-30 00:42:29 +01:00