Commit graph

19 commits

Author SHA1 Message Date
lewtun
b806e1092a
Bump vLLM and TRL (#665)
* Bump vLLM and TRL

* Fix Makefile
2025-05-28 13:47:25 +02:00
lewtun
75c3999180
Bump LightEval to enable DP>1 (#629)
* Bump LightEval to enable DP>1

* Remove redundant arg

* Update eval scores

* Fix slurm
2025-04-30 22:02:20 +02:00
lewtun
50590a41b9
Enable data and tensor parallelism for GRPO (#626)
* Bump deps

* Fix Slurm

* Fix
2025-04-26 11:50:08 +02:00
lewtun
8cf42663fd
Clean up recipes (#596)
2025-04-11 20:09:15 +02:00
lewtun
04dbf21989
Bump TRL and vLLM (#595)
* Bump TRL and vLLM

* Fix style

* Bump liger

* Add liger
2025-04-11 16:32:33 +02:00
Edward Beeching
5dcfae8979
Fixes bug with async code reward (#504)
* adds slow test for code reward

* fixes bug in setting language and the output parsing

* style

* removed redundant comment

* removed exception as e

* remove rewards

* removed whitespace

* more whitespace

* remove need for loop with asyncio.run

* nits

* fix type error with e2b AsyncSandbox
2025-03-13 22:54:15 +01:00
lewtun
a465641ec7
Fix make evaluate (#470)
2025-03-04 14:25:58 +01:00
lewtun
44cb13d4ba
Fix vLLM (#464)
2025-03-03 17:25:30 +01:00
Marco Z
c7733d3fa4
update makefile and readme (#449)
Co-authored-by: Marco Zocca <marco.zocca@unfoldml.com>
2025-03-01 15:08:30 +01:00
lewtun
9fb45bede6
Fix LightEval commands and dependencies (#386)
* Fix lighteval cmd

* Fix typo

* Pin lighteval

* Hacks to the max

* Fix slurm

* Fix

* Pin lighteval

* Pin l

---------

Co-authored-by: lewis@huggingface.co <lewis@ip-26-0-160-242.ec2.internal>
2025-02-21 14:52:45 +01:00
Almaz Zinollayev
517adddae3
[Testing Github workflow] Updating workflows and makefile (#214)
* [Testing Github workflow] Updating workflows and makefile

* [Testing Github workflow] - Refactoring workflow, fixing tests error, easier debugging

* [Testing Github workflow] Converting docstring into raw string

* [Testing Github workflow] - Fixing test_zero_max_penalty_returns_zero() test

* [Testing Github workflow] Removing redundant test
2025-02-10 18:28:35 +01:00
Kashif Rasul
250ab46ea1
[GRPO] add cosine reward (#206)
* add cosine reward

* fix merge

* fix typo

* fix check
2025-02-07 08:10:48 +01:00
lewtun
cec57f3a55
Add GPQA Diamond and fix evaluation deps (#196)
* Add GPQA Diamond

* Add table

* Fix README

* Up

* Fixes

* Ignore logs

* Fix

* Pin deps

* Fix GRPO

* Add Llama 70B tables

* Restore dp

* Pin lighteval

* Use bfloat16

* Tune table

* Add note
2025-02-06 15:24:52 +01:00
Kashif Rasul
a0d61ccece
use ruff (#137)
* use ruff

* reformat

* re-run

* update deps

* undo

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/open_r1/configs.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* fix help strings

* fix ruff version

* fix formatting

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-01-31 13:36:08 +01:00
María Grandury
d6f1a179a5
Implement make evaluate command (#41)
* implement evaluate make command

* add example usage of make evaluate to readme
2025-01-27 10:45:56 +01:00
lewtun
ca8f35c143
REFACTOR TO THE MAX (#7)
2025-01-25 00:12:25 +01:00
lewtun
26184f71ae
Refactor evaluation (#6)
2025-01-24 23:46:34 +01:00
lewtun
6acc9a0aa0
Add configs and stuff (#2)
2025-01-24 20:05:18 +01:00
Lewis Tunstall
2ff66e6cde
Add skeleton
2025-01-24 16:50:13 +00:00