mirrors/open-r1

mirror of https://github.com/huggingface/open-r1.git synced 2026-06-24 01:54:06 +00:00

Author	SHA1	Message	Date
Quentin Gallouédec	5e0c210f9c	use hf papers (#646 )	2025-05-19 13:48:14 +02:00
Edward Beeching	ea5b7edf22	Add dataset filtering script (#637 ) * add dataset filtering script * remove subset selection * save wip * save wip * update filter script * refactor to run on chunks * rename script * cleanup * update dapo filtering * fixes * dapo filt config * update compute pass rate * clean * update readme and config * add merging snippet	2025-05-16 10:26:49 +02:00
Andrei	af81114044	Code Execution using Morph Cloud (#614 ) * initial commit for morphcloud sandbox support * initial * fixed prints in morph client for ioi * updated import * context manager * removed unnecessary comments * more intelligent instance/snapshot management * update * Add documentation for Morph integration * Delete MORPH_INTEGRATION.md * added retry and modularity to morph client * updates to kwargs and setup.py * Update setup.py * added languages codepath + fixed slurm + added morph tests * make quality formatting fixes * conditional imports for morph --------- Co-authored-by: arb8020 <arbeightytwenty@gmail.com>	2025-05-08 08:59:54 +02:00
Edward Beeching	c1eadaa097	E2B Router bug fixes (#592 ) * fix eval system prompt * style * fix a rare issue where the execution is None * fixes a bug in the e2b router	2025-04-11 14:04:59 +02:00
Edward Beeching	1b3bf043dc	Adds a E2B router server that executes batches of scripts (#561 ) * adds a dedicated e2b server to handle batches of requests * fix reward tests * update slow reward * style * updates e2b router to be more generic * refactor * refactoring * licence, cleanup * update tests * style * fix import when e2b not present * style * rename sandbox file * rename to RoutedSandbox * update readme * nits * nits2 * unlimited max time * update logs path	2025-04-07 21:01:06 +02:00
Edward Beeching	9915e06f1e	Async code reward fixes (#546 ) * expose num parallel code executions * add e2b benchmarking script * adds new parallel code execution with better exception handling * style * update default * increase sandbox timeout * Add pretty table and Sandbox IDs * Add Sandbox ID * fix merge --------- Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>	2025-03-28 14:08:15 +01:00
lewtun	8000dd2384	[WIP] RL goes brrr (#533 ) * Fix vLLM recipes * Add vllm server to Slurm * Add overlap across srun * Fix NUM_NODES * Refactor TP to script * fix train script to work with new GRPO * lewis nits * bump trl, transformers --------- Co-authored-by: edbeeching <edbeeching@gmail.com>	2025-03-24 15:15:02 +01:00
lewtun	299446902d	Enable decontamination on dataset configs (#460 )	2025-03-04 09:22:01 +01:00
Agus	7188001281	Add script to decontaminate datasets against benchmark datasets (#416 ) * Add script to decontaminate datasets against benchmark datasets * Add docs for the decontamination script * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update scripts/decontaminate.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Add license header and attribution to the authors --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-02-24 19:54:44 +01:00
Edward Beeching	80e7e7b23c	move details script and fix wandb logging (#314 )	2025-02-13 11:13:00 +01:00
Anton Lozhkov	fa9b621cc9	Fix uuid in the data generator (#284 ) * fix uuid issues	2025-02-11 14:08:46 +01:00
Anton Lozhkov	3f630aaabb	Rename to generate_reasoning.py (#275 )	2025-02-10 16:53:53 +01:00
Anton Lozhkov	440ae0b24e	Add the actual async generation script (#273 ) * sglang inference server * add vllm * readme * add a generation script * ruff	2025-02-10 16:52:23 +01:00
Kashif Rasul	a0d61ccece	use ruff (#137 ) * use ruff * reformat * re-run * update deps * undo * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/configs.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * fix help strings * fix ruff version * fix formatting --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-31 13:36:08 +01:00
Edward Beeching	972e47eff0	Adds auto eval callbacks (#115 ) * adds auto eval callbacks * updates training scripts with callbacks * style * date * update gitignore with logs, eval results, etc * remove unused imports * nits	2025-01-30 09:39:47 +01:00
lewtun	ca8f35c143	REFACTOR TO THE MAX (#7 )	2025-01-25 00:12:25 +01:00
lewtun	26184f71ae	Refactor evaluation (#6 )	2025-01-24 23:46:34 +01:00
elie	c421bc893b	Improve sft (#5 ) * first commit * working training * change model_id * Update scripts/training/sft.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> --------- Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>	2025-01-24 22:23:49 +01:00
lewtun	6acc9a0aa0	Add configs and stuff (#2 )	2025-01-24 20:05:18 +01:00
Lewis Tunstall	697c119dd8	Add data	2025-01-24 16:51:03 +00:00
Lewis Tunstall	2ff66e6cde	Add skeleton	2025-01-24 16:50:13 +00:00

21 commits