Lewis Tunstall
|
97b1c22e55
|
Merge branch 'bump-deps-0' into zero-math-code
|
2025-05-28 10:11:06 +02:00 |
|
Lewis Tunstall
|
f6a07648e2
|
Bump vLLM and TRL
|
2025-05-28 06:48:01 +00:00 |
|
lewtun
|
33f84def0d
|
Align EOS token ID between tokenizer and generation config (#663)
* Align EOS token ID between tokenizer and generation config
* Fix
|
2025-05-27 17:20:13 +02:00 |
|
Lewis Tunstall
|
82fb385fa5
|
Refine tests
|
2025-05-27 13:39:00 +00:00 |
|
lewtun
|
5ac5971ea5
|
Add OpenR1-Distill recipe (#661)
|
2025-05-26 17:57:44 +02:00 |
|
Guilherme Penedo
|
c1e1192294
|
GRPO with codeforces problems (#627)
* add
* update
* updates
* updates #2
* weighted_sum and python fixes
* bugfix
* merging ioi/cf setups
* integrating the morph changes
* move morph_client
* run style
* small changes for mixed languages training
* revert grpo.py changes
* piston readme
* local test fetching
* bug fixes
* updated readme
* style fixes
* style fixes 2
* deps changes
* import sorting
* fix tests
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update src/open_r1/rewards.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-05-25 11:55:27 +02:00 |
|
lewtun
|
9366aa2df3
|
Add dataset mixer (#647)
* Prototype
* Clean up
* Refactor
* Add tests
* Add doc and make scripts work
* Tune doc
* Up
* Tune
* Add column verification
* Fix types
* Fix YAML
* Fix types
* Fix doc
* f
* f
|
2025-05-20 11:40:42 +02:00 |
|
Quentin Gallouédec
|
5e0c210f9c
|
use hf papers (#646)
|
2025-05-19 13:48:14 +02:00 |
|
Edward Beeching
|
ea5b7edf22
|
Add dataset filtering script (#637)
* add dataset filtering script
* remove subset selection
* save wip
* save wip
* update filter script
* refactor to run on chunks
* rename script
* cleanup
* update dapo filtering
* fixes
* dapo filt config
* update compute pass rate
* clean
* update readme and config
* add merging snippet
|
2025-05-16 10:26:49 +02:00 |
|
lewtun
|
c802f00512
|
Use pass@1 for all evals (#633)
* Use pass@1 for all evals
* Update scores
|
2025-05-09 17:42:36 +02:00 |
|
Andrei
|
af81114044
|
Code Execution using Morph Cloud (#614)
* initial commit for morphcloud sandbox support
* initial
* fixed prints in morph client for ioi
* updated import
* context manager
* removed unnecessary comments
* more intelligent instance/snapshot management
* update
* Add documentation for Morph integration
* Delete MORPH_INTEGRATION.md
* added retry and modularity to morph client
* updates to kwargs and setup.py
* Update setup.py
* added languages codepath + fixed slurm + added morph tests
* make quality formatting fixes
* conditional imports for morph
---------
Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
|
2025-05-08 08:59:54 +02:00 |
|
lewtun
|
9373ad3055
|
Update README.md
|
2025-04-30 22:16:18 +02:00 |
|
lewtun
|
75c3999180
|
Bump LightEval to enable DP>1 (#629)
* Bump LightEval to enable DP>1
* Remove redundant arg
* Update eval scores
* Fix slurm
|
2025-04-30 22:02:20 +02:00 |
|
lewtun
|
50590a41b9
|
Enable data and tensor parallelism for GRPO (#626)
* Bump deps
* Fix Slurm
* Fix
|
2025-04-26 11:50:08 +02:00 |
|
lewtun
|
5112bfc401
|
Fix SFT for base models (#604)
* Fix pad token bug in SFT
* Add ChatML default
* Clean up
* Refactor grpo model load
* Add doc
* Bump deepspeed
|
2025-04-16 11:45:50 +02:00 |
|
lewtun
|
04dbf21989
|
Bump TRL and vLLM (#595)
* Bump TRL and vLLM
* Fix style
* Bump liger
* Add liger
|
2025-04-11 16:32:33 +02:00 |
|
Shenghang Tsai
|
2a7bb45f05
|
Update README.md (#590)
|
2025-04-10 13:11:35 +02:00 |
|
lewtun
|
bf08f56849
|
[WIP] Bump lighteval with proper pass@1 (#584)
* Bump lighteval with proper pass@1
* Bump lighteval
* Update AIME24
|
2025-04-08 20:53:34 +02:00 |
|
Edward Beeching
|
1b3bf043dc
|
Adds a E2B router server that executes batches of scripts (#561)
* adds a dedicated e2b server to handle batches of requests
* fix reward tests
* update slow reward
* style
* updates e2b router to be more generic
* refactor
* refactoring
* licence, cleanup
* update tests
* style
* fix import when e2b not present
* style
* rename sandbox file
* rename to RoutedSandbox
* update readme
* nits
* nits2
* unlimited max time
* update logs path
|
2025-04-07 21:01:06 +02:00 |
|
lewtun
|
4ec555b0c8
|
Restore single-node instructions to run GRPO (#549)
|
2025-03-27 10:29:07 +01:00 |
|
lewtun
|
8000dd2384
|
[WIP] RL goes brrr (#533)
* Fix vLLM recipes
* Add vllm server to Slurm
* Add overlap across srun
* Fix NUM_NODES
* Refactor TP to script
* fix train script to work with new GRPO
* lewis nits
* bump trl, transformers
---------
Co-authored-by: edbeeching <edbeeching@gmail.com>
|
2025-03-24 15:15:02 +01:00 |
|
Guilherme Penedo
|
7835979801
|
adds support for running GRPO on IOI problems (#495)
* adds support for running GRPO on IOI problems
* nit
* bugfixes + recipe
* added piston info and readme changes
* readme updates
* run isort to fix checks
* Update src/open_r1/rewards.py
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
* adding ioi test
* fix merge issues with python slow tests
* style
* generalize piston workers
* generalize readme
* fix extract code
* finalize slow tests
---------
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: edbeeching <edbeeching@gmail.com>
|
2025-03-21 08:48:00 +01:00 |
|
koskotheim
|
d436b7b9c0
|
fix typo (#507)
|
2025-03-15 20:56:14 +01:00 |
|
lewtun
|
d5922af8ce
|
Add OlympicCoder recipes (#505)
* Add OlympicCoder recipes
* Fix configs
* Add FSDP config
|
2025-03-13 19:08:34 +01:00 |
|
lewtun
|
3b5d6603bf
|
Add citation and acknowledgements (#481)
* Update README.md
* Update README.md
* Update README.md
|
2025-03-05 20:23:57 +01:00 |
|
lewtun
|
44cb13d4ba
|
Fix vLLM (#464)
|
2025-03-03 17:25:30 +01:00 |
|
Marco Z
|
c7733d3fa4
|
update makefile and readme (#449)
Co-authored-by: Marco Zocca <marco.zocca@unfoldml.com>
|
2025-03-01 15:08:30 +01:00 |
|
Agus
|
7188001281
|
Add script to decontaminate datasets against benchmark datasets (#416)
* Add script to decontaminate datasets against benchmark datasets
* Add docs for the decontamination script
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Add license header and attribution to the authors
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-02-24 19:54:44 +01:00 |
|
lewtun
|
eeca246b07
|
Update prompt template and sampling parameters for evaluation (#392)
* Pin t
* Pin t
* Set top p
* C
* Tune math prompt
* Improve math prompt
* Update tables
|
2025-02-22 15:21:01 +01:00 |
|
lewtun
|
9fb45bede6
|
Fix LightEval commands and dependencies (#386)
* Fix lighteval cmd
* Fix typo
* Pin lighteval
* Hacks to the max
* Fix slurm
* Fix
* Pin lighteval
* Pin l
---------
Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>
|
2025-02-21 14:52:45 +01:00 |
|
lewtun
|
d76ecc12a2
|
Add E2B code interpreter reward function (#364)
* Add stuff
* Make it kind of work
* Add more stuff
* Add fix for parse
* Fix
* Refactor
* Clean up
* Fix config
* Fix sys
* Add SFT config
* Use min rate
* Fix eval
* Add base model
* Add s1k
* Disable eval
* Fix
* Add import checker
* Fix importer
* Fix
* Tune config
* Tune
* Fix
* Fix save
* Tune beta
* Remove configs
* Fix vLLM
* Fix
* Add note
* Add doc
* doc
* Fix
* Tune lr
* Add command
|
2025-02-19 11:26:46 +01:00 |
|
Agus
|
740a7a4305
|
Add LiveCodeBench's codegeneration task from lighteval (#346)
* Add lcb:codegeneration task from lighteval
* Add results from R1 Qwen 32B
|
2025-02-19 08:32:33 +01:00 |
|
lewtun
|
78c197df51
|
Enable chat template and system prompt to be configured during training (#349)
* Enable chat template to be configured
* Add notes to README
* Handle None
* Remove default system prompt
* Fix ST
* Tune hparams
* Fix
* Tune
* Fix
|
2025-02-18 14:46:43 +01:00 |
|
Edward Beeching
|
f987b3c877
|
bump vllm to version 0.7.2 (#311)
vLLM has made a number of throughput improvements in version 0.7.2, so it's worth bumping the version, particularly for GRPO training runs.
|
2025-02-13 10:48:11 +01:00 |
|
lewtun
|
96a6b0fa33
|
Enable Weights & Biases defaults to be overridden in training (#294)
* Enable WandB defaults to be set
* Fix
|
2025-02-12 13:01:07 +01:00 |
|
Lewis
|
db19392bef
|
chore(README): fix link, consistent formatting for CUDA warning (#248)
low priority & cosmetic
|
2025-02-09 09:45:38 +01:00 |
|
Ty Feng
|
90c1bfe829
|
Fix README: Correct recipes path and missing --config option (#247)
* Fix incorrect recipes path in README
* Fix missing --config option and incorrect recipes path
* Fix missing --config option and incorrect recipes path
|
2025-02-09 08:21:35 +01:00 |
|
Xu Song
|
f5f0b55dc4
|
Fix typo (#241)
|
2025-02-08 10:28:11 +01:00 |
|
lewtun
|
0da0f7cce2
|
Refactor training configs and unify Slurm for training SFT & GRPO (#231)
* Refactor Slurm
* Fix
* FML
* Nuke
* Clean
* Fix config
* Fix deps
* Fix logging
|
2025-02-07 15:56:43 +01:00 |
|
Quentin Gallouédec
|
dba152a494
|
fix config name (#222)
|
2025-02-07 14:34:46 +01:00 |
|
lewtun
|
c4227d6220
|
Update README.md (#211)
|
2025-02-06 16:40:09 +01:00 |
|
lewtun
|
a60b175aeb
|
Update CUDA (#209)
* Update CUDA
* Fix
* Remove module
* Restore CUDA
* Move cuda import
|
2025-02-06 16:31:13 +01:00 |
|
lewtun
|
cec57f3a55
|
Add GPQA Diamond and fix evaluation deps (#196)
* Add GPQA Diamond
* Add table
* Fix README
* Up
* Fixes
* Ignore logs
* Fix
* Pin deps
* Fix GRPO
* Add Llama 70B tables
* Restore dp
* Pin lighteval
* Use bfloat16
* Tune table
* Add note
|
2025-02-06 15:24:52 +01:00 |
|
Dongwei Jiang
|
571661a1e4
|
Provide a minimal reproducible experiment using GRPO for mathematical reasoning on base model, referencing the approach from SimpleRL-Reason (#197)
* Create config_base_math_smalllr.yaml
* Update README.md
* Update README.md
|
2025-02-06 11:43:42 +01:00 |
|
Jingze Shi
|
e450a6fbc4
|
Recipes for optimizing training scripts (#120)
* Add recipe configs to optimize scripts (#73)
* remove small models
* Add README for recipes
* Add README for recipes
* Attempt to resolve conflicts
* Optimize src scripts
* Update recipe of DeepSeek-R1-Distill-Qwen-7B
* Update recipe of Qwen2.5-1.5B
* Updated recipe readme for qwen
* Update training command for recipes
* Update README.md
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
* Update preprocessing_num_workers from 36 to 8
* Add small language model recipes to quickly verify R1
* Fix src code quality
* Add back the Slurm job command
* Remove recipe of doge
* Fix torch_dtype is not used
* fix grpo yaml
* fix grpo yaml
* fix deprecation warning
* fix config folder location
* Remove duplicate variables in grpo.py
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update recipes/qwen/Qwen2.5-1.5B-Instruct/grpo/confg_full.yaml
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-01-31 12:41:53 +01:00 |
|
Dongwei Jiang
|
22512e62bc
|
Update README.md (#132)
|
2025-01-31 11:27:17 +01:00 |
|
Sam Schorb
|
356f6a5c4f
|
Add Table of Contents to README for easier navigation (#125)
* Update README.md
* Update README.md
|
2025-01-30 16:32:13 +01:00 |
|
Kashif Rasul
|
c0b53fae29
|
Grpo slurm scripts (#112)
* initial grpo.slurm script
* initial zero3 yaml using 1 less gpu
* add completion and prompt length
* initial doc
* use main
* fix typo
* remove num_processes
* use vllm 0.7.0
* remove double module load
* update math-verify
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* overwrite num_procs in the slurm script
* add vllm args to readme
* update readme
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-01-30 10:22:45 +01:00 |
|
Lewis
|
fb1b4c4e3f
|
docs(README): note about CUDA 12.1 (#121)
will segfault for CUDA 14.1 under certain conditions; instructions are specific to 12.1
- fixes #106
- fixes #117
|
2025-01-30 08:42:43 +01:00 |
|
Edward Beeching
|
bd0e15bfb5
|
Update README.md (#93)
|
2025-01-30 00:42:29 +01:00 |
|