Lewis Tunstall
3bcc4fc86e
Add codeforces
2025-05-28 19:21:15 +00:00
lewtun
a6b4f668fb
Fix Weka refresh ( #666 )
...
* Fix Weka refresh
* Update evaluate.slurm
2025-05-28 13:45:48 +02:00
lewtun
722f144d21
Refresh Weka on Slurm ( #662 )
...
* Refresh Weka on Slurm
* Include current working dir
2025-05-27 19:21:15 +02:00
Guilherme Penedo
c1e1192294
GRPO with codeforces problems ( #627 )
...
* add
* update
* updates
* updates #2
* weighted_sum and python fixes
* bugfix
* merging ioi/cf setups
* integrating the morph changes
* move morph_client
* run style
* small changes for mixed languages training
* revert grpo.py changes
* piston readme
* local test fetching
* bug fixes
* updated readme
* style fixes
* style fixes 2
* deps changes
* import sorting
* fix tests
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-05-25 11:55:27 +02:00
Edward Beeching
ea5b7edf22
Add dataset filtering script ( #637 )
...
* add dataset filtering script
* remove subset selection
* save wip
* save wip
* update filter script
* refactor to run on chunks
* rename script
* cleanup
* update dapo filtering
* fixes
* dapo filt config
* update compute pass rate
* clean
* update readme and config
* add merging snippet
2025-05-16 10:26:49 +02:00
lewtun
4fc2a3ff82
Add time to Slurm ( #639 )
2025-05-09 19:19:51 +02:00
Andrei
af81114044
Code Execution using Morph Cloud ( #614 )
...
* initial commit for morphcloud sandbox support
* initial
* fixed prints in morph client for ioi
* updated import
* context manager
* removed unnecessary comments
* more intelligent instance/snapshot management
* update
* Add documentation for Morph integration
* Delete MORPH_INTEGRATION.md
* added retry and modularity to morph client
* updates to kwargs and setup.py
* Update setup.py
* added languages codepath + fixed slurm + added morph tests
* make quality formatting fixes
* conditional imports for morph
---------
Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
2025-05-08 08:59:54 +02:00
lewtun
75c3999180
Bump LightEval to enable DP>1 ( #629 )
...
* Bump LightEval to enable DP>1
* Remove redundant arg
* Update eval scores
* Fix slurm
2025-04-30 22:02:20 +02:00
lewtun
50590a41b9
Enable data and tensor parallelism for GRPO ( #626 )
...
* Bump deps
* Fix Slurm
* Fix
2025-04-26 11:50:08 +02:00
Edward Beeching
715c8787fb
add back grad accumulations steps ( #612 )
2025-04-17 16:41:39 +02:00
Edward Beeching
3a0e89678c
Fix eval system prompt ( #591 )
...
* fix eval system prompt
* style
2025-04-11 11:23:06 +02:00
lewtun
bf08f56849
[WIP] Bump lighteval with proper pass@1 ( #584 )
...
* Bump lighteval with proper pass@1
* Bump lighteval
* Update AIME24
2025-04-08 20:53:34 +02:00
Edward Beeching
1b3bf043dc
Adds a E2B router server that executes batches of scripts ( #561 )
...
* adds a dedicated e2b server to handle batches of requests
* fix reward tests
* update slow reward
* style
* updates e2b router to be more generic
* refactor
* refactoring
* licence, cleanup
* update tests
* style
* fix import when e2b not present
* style
* rename sandbox file
* rename to RoutedSandbox
* update readme
* nits
* nits2
* unlimited max time
* update logs path
2025-04-07 21:01:06 +02:00
lewtun
8000dd2384
[WIP] RL goes brrr ( #533 )
...
* Fix vLLM recipes
* Add vllm server to Slurm
* Add overlap across srun
* Fix NUM_NODES
* Refactor TP to script
* fix train script to work with new GRPO
* lewis nits
* bump trl, transformers
---------
Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-24 15:15:02 +01:00
Guilherme Penedo
7835979801
adds support for running GRPO on IOI problems ( #495 )
...
* adds support for running GRPO on IOI problems
* nit
* bugfixes + recipe
* added piston info and readme changes
* readme updates
* run isort to fix checks
* Update src/open_r1/rewards.py
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
* adding ioi test
* fix merge issues with python slow tests
* style
* generalize piston workers
* generalize readme
* fix extract code
* finalize slow tests
---------
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-21 08:48:00 +01:00
lewtun
44cb13d4ba
Fix vLLM ( #464 )
2025-03-03 17:25:30 +01:00
lewtun
eeca246b07
Update prompt template and sampling parameters for evaluation ( #392 )
...
* Pin t
* Pin t
* Set top p
* C
* Tune math prompt
* Improve math prompt
* Update tables
2025-02-22 15:21:01 +01:00
lewtun
9fb45bede6
Fix LightEval commands and dependencies ( #386 )
...
* Fix lighteval cmd
* Fix typo
* Pin lighteval
* Hacks to the max
* Fix slurm
* Fix
* Pin lighteval
* Pin l
---------
Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>
2025-02-21 14:52:45 +01:00
Edward Beeching
80e7e7b23c
move details script and fix wandb logging ( #314 )
2025-02-13 11:13:00 +01:00
Anton Lozhkov
440ae0b24e
Add the actual async generation script ( #273 )
...
* sglang inference server
* add vllm
* readme
* add a generation script
* ruff
2025-02-10 16:52:23 +01:00
Anton Lozhkov
baec330ef5
Add SGLang inference scripts ( #268 )
...
* sglang inference server
* add vllm
* readme
2025-02-10 14:37:58 +01:00
Edward Beeching
cabf27560b
hardcodes num_processes to 7 when using vllm ( #264 )
...
* hardcodes num_processes to 7 when using vllm
* nits
2025-02-10 11:43:16 +01:00
lewtun
9be2e9a859
Add retry mechanism for pushing eval results ( #252 )
...
The Hub throws 403 errors if there are too many concurrent pushes to the same repo, so we need a retry mechanism when that happens.
2025-02-09 09:44:35 +01:00
lewtun
0da0f7cce2
Refactor training configs and unify Slurm for training SFT & GRPO ( #231 )
...
* Refactor Slurm
* Fix
* FML
* Nuke
* Clean
* Fix config
* Fix deps
* Fix logging
2025-02-07 15:56:43 +01:00
lewtun
a60b175aeb
Update CUDA ( #209 )
...
* Update CUDA
* Fix
* Remove module
* Restore CUDA
* Move cuda import
2025-02-06 16:31:13 +01:00
lewtun
3fbdeac96c
Fix slurm eval ( #208 )
2025-02-06 15:46:33 +01:00
lewtun
cec57f3a55
Add GPQA Diamond and fix evaluation deps ( #196 )
...
* Add GPQA Diamond
* Add table
* Fix README
* Up
* Fixes
* Ignore logs
* Fix
* Pin deps
* Fix GRPO
* Add Llama 70B tables
* Restore dp
* Pin lighteval
* Use bfloat16
* Tune table
* Add note
2025-02-06 15:24:52 +01:00
Edward Beeching
3fd56dc7b4
fix uv env path + details ( #188 )
...
* fix uv env path + details
* Update slurm/grpo.slurm
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-02-05 23:59:25 +01:00
Jingze Shi
e450a6fbc4
Recipes for optimizing training scripts ( #120 )
...
* Add recipe configs to optimize scripts (#73 )
* remove small models
* Add README for recipes
* Add README for recipes
* Attempt to resolve conflicts
* Optimize src scripts
* Update recipe of DeepSeek-R1-Distill-Qwen-7B
* Update recipe of Qwen2.5-1.5B
* Updated recipe readme for qwen
* Update training command for recipes
* Update README.md
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* Update preprocessing_num_workers from 36 to 8
* Add small language model recipes to quickly verify R1
* Fix src code quality
* Add back the Slurm job command
* Remove recipe of doge
* Fix torch_dtype is not used
* fix grpo yaml
* fix grpo yaml
* fix deprecation warning
* fix config folder location
* Remove duplicate variables in grpo.py
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-31 12:41:53 +01:00
Kashif Rasul
c0b53fae29
Grpo slurm scripts ( #112 )
...
* initial grpo.slurm script
* initial zero3 yaml using 1 less gpu
* add completion and prompt length
* initial doc
* use main
* fix typo
* remove num_processes
* use vllm 0.7.0
* remove double module load
* update math-verify
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* overwrite num_procs in the slurm script
* add vllm args to readme
* update readme
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-01-30 10:22:45 +01:00
Edward Beeching
972e47eff0
Adds auto eval callbacks ( #115 )
...
* adds auto eval callbacks
* updates training scripts with callbacks
* style
* date
* update gitignore with logs, eval results, etc
* remove unused imports
* nits
2025-01-30 09:39:47 +01:00
Hynek Kydlíček
e2235cf978
Improve reproduction of r1 reported score ( #92 )
...
* bump up deps, fix aime24 evals, make grpo more strict
* minor fixes
* 🤨 fmt
* bump lighteval + set boxed to match first
* remove dead code
* bump lighteval
* add ed's tp branch switch
---------
Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
2025-01-29 11:29:05 +01:00
Gabriel Martín Blázquez
c5941ed5e4
Add --timeout, --retries, --prompt-template and TP and PP by slurm variables ( #94 )
...
* Set TP and PP using slurm variables
* Add `--timeout` argument
* add `--prompt-template` argument
* Group generations
* Add `--retries` argument
2025-01-28 18:30:04 +01:00
Gabriel Martín Blázquez
b03480d868
Add --input-batch-size, --client-replicas args and download Ray logs ( #71 )
...
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-27 14:58:13 +01:00
Edward Beeching
feb59d2b42
Update evaluate.slurm
...
typo in eval slurm
2025-01-27 09:43:02 +01:00
Hynek Kydlíček
90b0947382
Reward verification and evaluation fixes ( #55 )
...
* bump up deps, fix aime24 evals, make grpo more strict
* minor fixes
* 🤨 fmt
* Update src/open_r1/grpo.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Hynek Kydlicek <kydlicek.hynek@huggingface.co>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-26 18:35:48 +01:00
Anton Lozhkov
15df4fb134
vllm speed tweaks ( #43 )
2025-01-26 01:59:50 +01:00
elie
64c0ed2254
fix evaluate.slurm ( #27 )
2025-01-25 15:28:49 +01:00
elie
f169d2cd8e
add evaluate.slurm ( #26 )
2025-01-25 15:23:16 +01:00
Gabriel Martín Blázquez
a90b99686a
Fix passing vLLM server URL ( #21 )
...
* Use head node ip as vLLM server url
* Pass correct server url
* Add num_generations argument
* Fix style
* Remove `select`
---------
Co-authored-by: plaguss <agustin@argilla.io>
2025-01-25 15:01:15 +01:00
elie
43cb6a0e0f
fix sft.slurm
2025-01-25 14:48:45 +01:00
lewtun
2580fd8c1b
Fix Slurm SFT and gather Slurm scripts ( #19 )
...
* Fix slurm
* Fix generate
* Fix install
* Fix c
2025-01-25 13:47:52 +01:00