Commit graph

47 commits

Author SHA1 Message Date
Lewis Tunstall
3bcc4fc86e Add codeforces 2025-05-28 19:21:15 +00:00
Lewis Tunstall
b369e428f8 Merge branch 'main' into zero-math-code 2025-05-28 09:22:22 +02:00
Lewis Tunstall
b6b1643c2d Fix benchmarks! 2025-05-27 20:44:35 +00:00
lewtun
33f84def0d
Align EOS token ID between tokenizer and generation config (#663)
* Align EOS token ID between tokenizer and generation config

* Fix
2025-05-27 17:20:13 +02:00
Lewis Tunstall
82fb385fa5 Refine tests 2025-05-27 13:39:00 +00:00
Lewis Tunstall
296aa66e1e Tweak format reward 2025-05-27 08:16:49 +00:00
lewtun
5ac5971ea5
Add OpenR1-Distill recipe (#661) 2025-05-26 17:57:44 +02:00
Lewis Tunstall
bc06504df5 Add better baseline defaults 2025-05-26 09:06:09 +00:00
Lewis Tunstall
9862bfec41 Relax reward 2025-05-26 08:09:03 +00:00
Lewis Tunstall
1f56bab96c Tune baseline 2025-05-25 17:22:06 +00:00
Lewis Tunstall
965d451d61 Restore baseline 2025-05-25 17:00:33 +00:00
Lewis Tunstall
31eacc4b9a Use GAS instead of generation 2025-05-25 16:57:33 +00:00
Lewis Tunstall
0b933a2aa4 Restore gas 2025-05-25 16:54:18 +00:00
Lewis Tunstall
cf765df201 Tune baseline 2025-05-25 13:21:01 +00:00
Lewis Tunstall
da0e9ae28d Add overlong punishment 2025-05-25 12:46:45 +00:00
Lewis Tunstall
7f777c0583 Add new DAPO recipe 2025-05-25 12:40:32 +00:00
Guilherme Penedo
c1e1192294
GRPO with codeforces problems (#627)
* add

* update

* updates

* updates #2

* weighted_sum and python fixes

* bugfix

* merging ioi/cf setups

* integrating the morph changes

* move morph_client

* run style

* small changes for mixed languages training

* revert grpo.py changes

* piston readme

* local test fetching

* bug fixes

* updated readme

* style fixes

* style fixes 2

* deps changes

* import sorting

* fix tests

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update src/open_r1/rewards.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-25 11:55:27 +02:00
Edward Beeching
ea5b7edf22
Add dataset filtering script (#637)
* add dataset filtering script

* remove subset selection

* save wip

* save wip

* update filter script

* refactor to run on chunks

* rename script

* cleanup

* update dapo filtering

* fixes

* dapo filt config

* update compute pass rate

* clean

* update readme and config

* add merging snippet
2025-05-16 10:26:49 +02:00
lewtun
50590a41b9
Enable data and tensor parallelism for GRPO (#626)
* Bump deps

* Fix Slurm

* Fix
2025-04-26 11:50:08 +02:00
lewtun
5112bfc401
Fix SFT for base models (#604)
* Fix pad token bug in SFT

* Add ChatML default

* Clean up

* Refactor grpo model load

* Add doc

* Bump deepspeed
2025-04-16 11:45:50 +02:00
lewtun
8cf42663fd
Clean up recipes (#596) 2025-04-11 20:09:15 +02:00
lewtun
04dbf21989
Bump TRL and vLLM (#595)
* Bump TRL and vLLM

* Fix style

* Bump liger

* Add liger
2025-04-11 16:32:33 +02:00
lewtun
ca8664df1c
Fix missing prompt columns in recipes (#574) 2025-04-02 15:48:48 +02:00
lewtun
8000dd2384
[WIP] RL goes brrr (#533)
* Fix vLLM recipes

* Add vllm server to Slurm

* Add overlap across srun

* Fix NUM_NODES

* Refactor TP to script

* fix train script to work with new GRPO

* lewis nits

* bump trl, transformers

---------

Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-24 15:15:02 +01:00
Guilherme Penedo
7835979801
adds support for running GRPO on IOI problems (#495)
* adds support for running GRPO on IOI problems

* nit

* bugfixes + recipe

* added piston info and readme changes

* readme updates

* run isort to fix checks

* Update src/open_r1/rewards.py

Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>

* adding ioi test

* fix merge issues with python slow tests

* style

* generalize piston workers

* generalize readme

* fix extract code

* finalize slow tests

---------

Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-21 08:48:00 +01:00
lewtun
d5922af8ce
Add OlympicCoder recipes (#505)
* Add OlympicCoder recipes

* Fix configs

* Add FSDP config
2025-03-13 19:08:34 +01:00
lewtun
45ccf60109
Remove dataset_configs from YAML recipes (#461) 2025-03-03 13:54:58 +01:00
elie
3ba56c1c3d
Add config sft smollm (#425)
* add sft recipe

* add smollm sft

* max_length modif 1

* max_length modif 2
2025-02-25 21:45:59 +01:00
Edward Beeching
0c3ef8372e
updates max_seq_length to max length due to a bug in trl (#419) 2025-02-24 17:27:56 +01:00
lewtun
566cfd1a44
Align format reward with R1 traces and add reward function to count think / answer tags (#418)
* Fix tests

* Tune

* Add reward

* Apply suggestions from code review

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-02-24 17:16:40 +01:00
elie
5355687e6c
add sft recipe (#415) 2025-02-24 15:43:12 +01:00
lewtun
d76ecc12a2
Add E2B code interpreter reward function (#364)
* Add stuff

* Make it kind of work

* Add more stuff

* Add fix for parse

* Fix

* Refactor

* Clean up

* Fix config

* Fix sys

* Add SFT config

* Use min rate

* Fix eval

* Add base model

* Add s1k

* Disable eval

* Fix

* Add import checker

* Fix importer

* Fix

* Tune config

* Tune

* Fix

* Fix save

* Tune beta

* Remove configs

* Fix vLLM

* Fix

* Add note

* Add doc

* doc

* Fix

* Tune lr

* Add command
2025-02-19 11:26:46 +01:00
lewtun
78c197df51
Enable chat template and system prompt to be configured during training (#349)
* Enable chat template to be configured

* Add notes to README

* Handle None

* Remove default system prompt

* Fix ST

* Tune hparams

* Fix

* Tune

* Fix
2025-02-18 14:46:43 +01:00
Almaz Zinollayev
698530484c
Adding grpo reward args into yaml files (#337) 2025-02-18 13:10:03 +01:00
Yen-Ting Lin
d5b67f4fe5
Add SFT configuration for Mistral-Small-24B-Instruct-2501 model (#348)
* Add SFT configuration for Mistral-Small-24B-Instruct-2501 model

* Rename config_numina.yaml to config_openr1_math.yaml

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-02-18 08:52:45 +01:00
Kashif Rasul
90a6de94c7
Revert "Weighted reward functions (#213)" (#317)
This reverts commit fbea53267b.
2025-02-13 15:00:05 +01:00
Almaz Zinollayev
fbea53267b
Weighted reward functions (#213)
* [Weighted reward functions] Adding functionality to weigh rewards. Tests.

* [Weighted reward functions] Adding @wraps decorator to preserve reward function metadata

* style

* Changing grpo.py tests to run if cuda is available

* style

* Apply suggestions from code review

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-02-13 14:08:27 +01:00
Quentin Gallouédec
52aa8759a2
new grpo logic (#274) 2025-02-11 09:35:06 +01:00
Jinfeng Sun
82b2a6525f
fix(sft recipes): remove duplicate packing option from config (#280) 2025-02-11 09:34:19 +01:00
lewtun
3519a7fa3d
Remove duplicate math-verify (#234) 2025-02-07 20:01:54 +01:00
lewtun
0da0f7cce2
Refactor training configs and unify Slurm for training SFT & GRPO (#231)
* Refactor Slurm

* Fix

* FML

* Nuke

* Clean

* Fix config

* Fix deps

* Fix logging
2025-02-07 15:56:43 +01:00
Quentin Gallouédec
dba152a494
fix config name (#222) 2025-02-07 14:34:46 +01:00
Dongwei Jiang
571661a1e4
Provide a minimal reproducible experiment using GRPO for mathematical reasoning on base model, referencing the approach from SimpleRL-Reason (#197)
* Create config_base_math_smalllr.yaml

* Update README.md

* Update README.md
2025-02-06 11:43:42 +01:00
Jingze Shi
e450a6fbc4
Recipes for optimizing training scripts (#120)
* Add recipe configs to optimize scripts (#73)

* remove small models

* Add README for recipes

* Add README for recipes

* Attempt to resolve conflicts

* Optimize src scripts

* Update recipe of DeepSeek-R1-Distill-Qwen-7B

* Update recipe of Qwen2.5-1.5B

* Updated recipe readme for qwen

* Update training command for recipes

* Update README.md

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>

* Update preprocessing_num_workers from 36 to 8

* Add small language model recipes to quickly verify R1

* Fix src code quality

* Add back the Slurm job command

* Remove recipe of doge

* Fix torch_dtype is not used

* fix grpo yaml

* fix grpo yaml

* fix deprecation warning

* fix config folder location

* Remove duplicate variables in grpo.py

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-31 12:41:53 +01:00
lewtun
ca8f35c143
REFACTOR TO THE MAX (#7) 2025-01-25 00:12:25 +01:00
elie
c421bc893b
Improve sft (#5)
* first commit

* working training

* change model_id

* Update scripts/training/sft.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-24 22:23:49 +01:00
lewtun
6acc9a0aa0
Add configs and stuff (#2) 2025-01-24 20:05:18 +01:00