Commit graph

18 commits

Author SHA1 Message Date
Edward Beeching
c1eadaa097
E2B Router bug fixes (#592)
* fix eval system prompt

* style

* fix a rare issue where the execution is None

* fixes a bug in the e2b router
2025-04-11 14:04:59 +02:00
Edward Beeching
1b3bf043dc
Adds an E2B router server that executes batches of scripts (#561)
* adds a dedicated e2b server to handle batches of requests

* fix reward tests

* update slow reward

* style

* updates e2b router to be more generic

* refactor

* refactoring

* licence, cleanup

* update tests

* style

* fix import when e2b not present

* style

* rename sandbox file

* rename to RoutedSandbox

* update readme

* nits

* nits2

* unlimited max time

* update logs path
2025-04-07 21:01:06 +02:00
Edward Beeching
9915e06f1e
Async code reward fixes (#546)
* expose num parallel code executions

* add e2b benchmarking script

* adds new parallel code execution with better exception handling

* style

* update default

* increase sandbox timeout

* Add pretty table and Sandbox IDs

* Add Sandbox ID

* fix merge

---------

Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
2025-03-28 14:08:15 +01:00
lewtun
8000dd2384
[WIP] RL goes brrr (#533)
* Fix vLLM recipes

* Add vllm server to Slurm

* Add overlap across srun

* Fix NUM_NODES

* Refactor TP to script

* fix train script to work with new GRPO

* lewis nits

* bump trl, transformers

---------

Co-authored-by: edbeeching <edbeeching@gmail.com>
2025-03-24 15:15:02 +01:00
lewtun
299446902d
Enable decontamination on dataset configs (#460)
2025-03-04 09:22:01 +01:00
Agus
7188001281
Add script to decontaminate datasets against benchmark datasets (#416)
* Add script to decontaminate datasets against benchmark datasets

* Add docs for the decontamination script

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update scripts/decontaminate.py

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Add license header and attribution to the authors

---------

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2025-02-24 19:54:44 +01:00
Edward Beeching
80e7e7b23c
move details script and fix wandb logging (#314)
2025-02-13 11:13:00 +01:00
Anton Lozhkov
fa9b621cc9
Fix uuid in the data generator (#284)
* fix uuid issues
2025-02-11 14:08:46 +01:00
Anton Lozhkov
3f630aaabb
Rename to generate_reasoning.py (#275)
2025-02-10 16:53:53 +01:00
Anton Lozhkov
440ae0b24e
Add the actual async generation script (#273)
* sglang inference server

* add vllm

* readme

* add a generation script

* ruff
2025-02-10 16:52:23 +01:00
Kashif Rasul
a0d61ccece
use ruff (#137)
* use ruff

* reformat

* re-run

* update deps

* undo

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* fix help strings

* fix ruff version

* fix formatting

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-31 13:36:08 +01:00
Edward Beeching
972e47eff0
Adds auto eval callbacks (#115)
* adds auto eval callbacks

* updates training scripts with callbacks

* style

* date

* update gitignore with logs, eval results, etc

* remove unused imports

* nits
2025-01-30 09:39:47 +01:00
lewtun
ca8f35c143
REFACTOR TO THE MAX (#7)
2025-01-25 00:12:25 +01:00
lewtun
26184f71ae
Refactor evaluation (#6)
2025-01-24 23:46:34 +01:00
elie
c421bc893b
Improve sft (#5)
* first commit

* working training

* change model_id

* Update scripts/training/sft.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-24 22:23:49 +01:00
lewtun
6acc9a0aa0
Add configs and stuff (#2)
2025-01-24 20:05:18 +01:00
Lewis Tunstall
697c119dd8
Add data
2025-01-24 16:51:03 +00:00
Lewis Tunstall
2ff66e6cde
Add skeleton
2025-01-24 16:50:13 +00:00