Quentin Gallouédec
|
5e0c210f9c
|
use hf papers (#646)
|
2025-05-19 13:48:14 +02:00 |
|
Edward Beeching
|
ea5b7edf22
|
Add dataset filtering script (#637)
* add dataset filtering script
* remove subset selection
* save wip
* save wip
* update filter script
* refactor to run on chunks
* rename script
* cleanup
* update dapo filtering
* fixes
* dapo filt config
* update compute pass rate
* clean
* update readme and config
* add merging snippet
|
2025-05-16 10:26:49 +02:00 |
|
Andrei
|
af81114044
|
Code Execution using Morph Cloud (#614)
* initial commit for morphcloud sandbox support
* initial
* fixed prints in morph client for ioi
* updated import
* context manager
* removed unnecessary comments
* more intelligent instance/snapshot management
* update
* Add documentation for Morph integration
* Delete MORPH_INTEGRATION.md
* added retry and modularity to morph client
* updates to kwargs and setup.py
* Update setup.py
* added languages codepath + fixed slurm + added morph tests
* make quality formatting fixes
* conditional imports for morph
---------
Co-authored-by: arb8020 <arbeightytwenty@gmail.com>
|
2025-05-08 08:59:54 +02:00 |
|
Edward Beeching
|
c1eadaa097
|
E2B Router bug fixes (#592)
* fix eval system prompt
* style
* fix a rare issue where the execution is None
* fixes a bug in the e2b router
|
2025-04-11 14:04:59 +02:00 |
|
Edward Beeching
|
1b3bf043dc
|
Adds an E2B router server that executes batches of scripts (#561)
* adds a dedicated e2b server to handle batches of requests
* fix reward tests
* update slow reward
* style
* updates e2b router to be more generic
* refactor
* refactoring
* licence, cleanup
* update tests
* style
* fix import when e2b not present
* style
* rename sandbox file
* rename to RoutedSandbox
* update readme
* nits
* nits2
* unlimited max time
* update logs path
|
2025-04-07 21:01:06 +02:00 |
|
Edward Beeching
|
9915e06f1e
|
Async code reward fixes (#546)
* expose num parallel code executions
* add e2b benchmarking script
* adds new parallel code execution with better exception handling
* style
* update default
* increase sandbox timeout
* Add pretty table and Sandbox IDs
* Add Sandbox ID
* fix merge
---------
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
|
2025-03-28 14:08:15 +01:00 |
|
lewtun
|
8000dd2384
|
[WIP] RL goes brrr (#533)
* Fix vLLM recipes
* Add vllm server to Slurm
* Add overlap across srun
* Fix NUM_NODES
* Refactor TP to script
* fix train script to work with new GRPO
* lewis nits
* bump trl, transformers
---------
Co-authored-by: edbeeching <edbeeching@gmail.com>
|
2025-03-24 15:15:02 +01:00 |
|
lewtun
|
299446902d
|
Enable decontamination on dataset configs (#460)
|
2025-03-04 09:22:01 +01:00 |
|
Agus
|
7188001281
|
Add script to decontaminate datasets against benchmark datasets (#416)
* Add script to decontaminate datasets against benchmark datasets
* Add docs for the decontamination script
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Update scripts/decontaminate.py
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Add license header and attribution to the authors
---------
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
|
2025-02-24 19:54:44 +01:00 |
|
Edward Beeching
|
80e7e7b23c
|
move details script and fix wandb logging (#314)
|
2025-02-13 11:13:00 +01:00 |
|
Anton Lozhkov
|
fa9b621cc9
|
Fix uuid in the data generator (#284)
* fix uuid issues
|
2025-02-11 14:08:46 +01:00 |
|
Anton Lozhkov
|
3f630aaabb
|
Rename to generate_reasoning.py (#275)
|
2025-02-10 16:53:53 +01:00 |
|
Anton Lozhkov
|
440ae0b24e
|
Add the actual async generation script (#273)
* sglang inference server
* add vllm
* readme
* add a generation script
* ruff
|
2025-02-10 16:52:23 +01:00 |
|
Kashif Rasul
|
a0d61ccece
|
use ruff (#137)
* use ruff
* reformat
* re-run
* update deps
* undo
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/open_r1/configs.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* fix help strings
* fix ruff version
* fix formatting
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-01-31 13:36:08 +01:00 |
|
Edward Beeching
|
972e47eff0
|
Adds auto eval callbacks (#115)
* adds auto eval callbacks
* updates training scripts with callbacks
* style
* date
* update gitignore with logs, eval results, etc
* remove unused imports
* nits
|
2025-01-30 09:39:47 +01:00 |
|
lewtun
|
ca8f35c143
|
REFACTOR TO THE MAX (#7)
|
2025-01-25 00:12:25 +01:00 |
|
lewtun
|
26184f71ae
|
Refactor evaluation (#6)
|
2025-01-24 23:46:34 +01:00 |
|
elie
|
c421bc893b
|
Improve sft (#5)
* first commit
* working training
* change model_id
* Update scripts/training/sft.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
---------
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
|
2025-01-24 22:23:49 +01:00 |
|
lewtun
|
6acc9a0aa0
|
Add configs and stuff (#2)
|
2025-01-24 20:05:18 +01:00 |
|
Lewis Tunstall
|
697c119dd8
|
Add data
|
2025-01-24 16:51:03 +00:00 |
|
Lewis Tunstall
|
2ff66e6cde
|
Add skeleton
|
2025-01-24 16:50:13 +00:00 |
|