Lewis Tunstall
|
3bcc4fc86e
|
Add codeforces
|
2025-05-28 19:21:15 +00:00 |
|
Lewis Tunstall
|
b369e428f8
|
Merge branch 'main' into zero-math-code
|
2025-05-28 09:22:22 +02:00 |
|
Lewis Tunstall
|
b6b1643c2d
|
Fix benchmarks!
|
2025-05-27 20:44:35 +00:00 |
|
lewtun
|
33f84def0d
|
Align EOS token ID between tokenizer and generation config (#663)
* Align EOS token ID between tokenizer and generation config
* Fix
|
2025-05-27 17:20:13 +02:00 |
|
Lewis Tunstall
|
82fb385fa5
|
Refine tests
|
2025-05-27 13:39:00 +00:00 |
|
Lewis Tunstall
|
296aa66e1e
|
Tweak format reward
|
2025-05-27 08:16:49 +00:00 |
|
lewtun
|
5ac5971ea5
|
Add OpenR1-Distill recipe (#661)
|
2025-05-26 17:57:44 +02:00 |
|
Lewis Tunstall
|
bc06504df5
|
Add better baseline defaults
|
2025-05-26 09:06:09 +00:00 |
|
Lewis Tunstall
|
9862bfec41
|
Relax reward
|
2025-05-26 08:09:03 +00:00 |
|
Lewis Tunstall
|
1f56bab96c
|
Tune baseline
|
2025-05-25 17:22:06 +00:00 |
|
Lewis Tunstall
|
965d451d61
|
Restore baseline
|
2025-05-25 17:00:33 +00:00 |
|
Lewis Tunstall
|
31eacc4b9a
|
Use GAS instead of generation
|
2025-05-25 16:57:33 +00:00 |
|
Lewis Tunstall
|
0b933a2aa4
|
Restore gas
|
2025-05-25 16:54:18 +00:00 |
|
Lewis Tunstall
|
cf765df201
|
Tune baseline
|
2025-05-25 13:21:01 +00:00 |
|
Lewis Tunstall
|
da0e9ae28d
|
Add overlong punishment
|
2025-05-25 12:46:45 +00:00 |
|
Lewis Tunstall
|
7f777c0583
|
Add new DAPO recipe
|
2025-05-25 12:40:32 +00:00 |
|
Guilherme Penedo
|
c1e1192294
|
GRPO with codeforces problems (#627)
* add
* update
* updates
* updates #2
* weighted_sum and python fixes
* bugfix
* merging ioi/cf setups
* integrating the morph changes
* move morph_client
* run style
* small changes for mixed languages training
* revert grpo.py changes
* piston readme
* local test fetching
* bug fixes
* updated readme
* style fixes
* style fixes 2
* deps changes
* import sorting
* fix tests
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-05-25 11:55:27 +02:00 |
|
Edward Beeching
|
ea5b7edf22
|
Add dataset filtering script (#637)
* add dataset filtering script
* remove subset selection
* save wip
* save wip
* update filter script
* refactor to run on chunks
* rename script
* cleanup
* update dapo filtering
* fixes
* dapo filt config
* update compute pass rate
* clean
* update readme and config
* add merging snippet
|
2025-05-16 10:26:49 +02:00 |
|
lewtun
|
50590a41b9
|
Enable data and tensor parallelism for GRPO (#626)
* Bump deps
* Fix Slurm
* Fix
|
2025-04-26 11:50:08 +02:00 |
|
lewtun
|
5112bfc401
|
Fix SFT for base models (#604)
* Fix pad token bug in SFT
* Add ChatML default
* Clean up
* Refactor grpo model load
* Add doc
* Bump deepspeed
|
2025-04-16 11:45:50 +02:00 |
|
lewtun
|
8cf42663fd
|
Clean up recipes (#596)
|
2025-04-11 20:09:15 +02:00 |
|
lewtun
|
04dbf21989
|
Bump TRL and vLLM (#595)
* Bump TRL and vLLM
* Fix style
* Bump liger
* Add liger
|
2025-04-11 16:32:33 +02:00 |
|
lewtun
|
ca8664df1c
|
Fix missing prompt columns in recipes (#574)
|
2025-04-02 15:48:48 +02:00 |
|
lewtun
|
8000dd2384
|
[WIP] RL goes brrr (#533)
* Fix vLLM recipes
* Add vllm server to Slurm
* Add overlap across srun
* Fix NUM_NODES
* Refactor TP to script
* fix train script to work with new GRPO
* lewis nits
* bump trl, transformers
---------
Co-authored-by: edbeeching <edbeeching@gmail.com>
|
2025-03-24 15:15:02 +01:00 |
|
Guilherme Penedo
|
7835979801
|
adds support for running GRPO on IOI problems (#495)
* adds support for running GRPO on IOI problems
* nit
* bugfixes + recipe
* added piston info and readme changes
* readme updates
* run isort to fix checks
* Update src/open_r1/rewards.py
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
* adding ioi test
* fix merge issues with python slow tests
* style
* generalize piston workers
* generalize readme
* fix extract code
* finalize slow tests
---------
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: edbeeching <edbeeching@gmail.com>
|
2025-03-21 08:48:00 +01:00 |
|
lewtun
|
d5922af8ce
|
Add OlympicCoder recipes (#505)
* Add OlympicCoder recipes
* Fix configs
* Add FSDP config
|
2025-03-13 19:08:34 +01:00 |
|
lewtun
|
45ccf60109
|
Remove dataset_configs from YAML recipes (#461)
|
2025-03-03 13:54:58 +01:00 |
|
elie
|
3ba56c1c3d
|
Add config sft smollm (#425)
* add sft recipe
* add smollm sft
* max_length modif 1
* max_length modif 2
|
2025-02-25 21:45:59 +01:00 |
|
Edward Beeching
|
0c3ef8372e
|
updates max_seq_length to max length due to a bug in trl (#419)
|
2025-02-24 17:27:56 +01:00 |
|
lewtun
|
566cfd1a44
|
Align format reward with R1 traces and add reward function to count think / answer tags (#418)
* Fix tests
* Tune
* Add reward
* Apply suggestions from code review
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-02-24 17:16:40 +01:00 |
|
elie
|
5355687e6c
|
add sft recipe (#415)
|
2025-02-24 15:43:12 +01:00 |
|
lewtun
|
d76ecc12a2
|
Add E2B code interpreter reward function (#364)
* Add stuff
* Make it kind of work
* Add more stuff
* Add fix for parse
* Fix
* Refactor
* Clean up
* Fix config
* Fix sys
* Add SFT config
* Use min rate
* Fix eval
* Add base model
* Add s1k
* Disable eval
* Fix
* Add import checker
* Fix importer
* Fix
* Tune config
* Tune
* Fix
* Fix save
* Tune beta
* Remove configs
* Fix vLLM
* Fix
* Add note
* Add doc
* doc
* Fix
* Tune lr
* Add command
|
2025-02-19 11:26:46 +01:00 |
|
lewtun
|
78c197df51
|
Enable chat template and system prompt to be configured during training (#349)
* Enable chat template to be configured
* Add notes to README
* Handle None
* Remove default system prompt
* Fix ST
* Tune hparams
* Fix
* Tune
* Fix
|
2025-02-18 14:46:43 +01:00 |
|
Almaz Zinollayev
|
698530484c
|
Adding grpo reward args into yaml files (#337)
|
2025-02-18 13:10:03 +01:00 |
|
Yen-Ting Lin
|
d5b67f4fe5
|
Add SFT configuration for Mistral-Small-24B-Instruct-2501 model (#348)
* Add SFT configuration for Mistral-Small-24B-Instruct-2501 model
* Rename config_numina.yaml to config_openr1_math.yaml
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-02-18 08:52:45 +01:00 |
|
Kashif Rasul
|
90a6de94c7
|
Revert "Weighted reward functions (#213)" (#317)
This reverts commit fbea53267b.
|
2025-02-13 15:00:05 +01:00 |
|
Almaz Zinollayev
|
fbea53267b
|
Weighted reward functions (#213)
* [Weighted reward functions] Adding functionality to weigh rewards. Tests.
* [Weighted reward functions] Adding @wraps decorator to preserve reward function metadata
* style
* Changing grpo.py tests to run if cuda is available
* style
* Apply suggestions from code review
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <quentin.gallouedec@huggingface.co>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
|
2025-02-13 14:08:27 +01:00 |
|
Quentin Gallouédec
|
52aa8759a2
|
new grpo logic (#274)
|
2025-02-11 09:35:06 +01:00 |
|
Jinfeng Sun
|
82b2a6525f
|
fix(sft recipes): remove duplicate packing option from config (#280)
|
2025-02-11 09:34:19 +01:00 |
|
lewtun
|
3519a7fa3d
|
Remove duplicate math-verify (#234)
|
2025-02-07 20:01:54 +01:00 |
|
lewtun
|
0da0f7cce2
|
Refactor training configs and unify Slurm for training SFT & GRPO (#231)
* Refactor Slurm
* Fix
* FML
* Nuke
* Clean
* Fix config
* Fix deps
* Fix logging
|
2025-02-07 15:56:43 +01:00 |
|
Quentin Gallouédec
|
dba152a494
|
fix config name (#222)
|
2025-02-07 14:34:46 +01:00 |
|
Dongwei Jiang
|
571661a1e4
|
Provide a minimal reproducible experiment using GRPO for mathematical reasoning on base model, referencing the approach from SimpleRL-Reason (#197)
* Create config_base_math_smalllr.yaml
* Update README.md
* Update README.md
|
2025-02-06 11:43:42 +01:00 |
|
Jingze Shi
|
e450a6fbc4
|
Recipes for optimizing training scripts (#120)
* Add recipe configs to optimize scripts (#73)
* remove small models
* Add README for recipes
* Add README for recipes
* Attempt to resolve conflicts
* Optimize src scripts
* Update recipe of DeepSeek-R1-Distill-Qwen-7B
* Update recipe of Qwen2.5-1.5B
* Updated recipe readme for qwen
* Update training command for recipes
* Update README.md
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* Update preprocessing_num_workers from 36 to 8
* Add small language model recipes for quickly verifying R1
* Fix src code quality
* Add back the Slurm job command
* Remove recipe of doge
* Fix torch_dtype is not used
* fix grpo yaml
* fix grpo yaml
* fix deprecation warning
* fix config folder location
* Remove duplicate variables in grpo.py
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-01-31 12:41:53 +01:00 |
|
lewtun
|
ca8f35c143
|
REFACTOR TO THE MAX (#7)
|
2025-01-25 00:12:25 +01:00 |
|
elie
|
c421bc893b
|
Improve sft (#5)
* first commit
* working training
* change model_id
* Update scripts/training/sft.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-01-24 22:23:49 +01:00 |
|
lewtun
|
6acc9a0aa0
|
Add configs and stuff (#2)
|
2025-01-24 20:05:18 +01:00 |
|