mirrors/open-r1

mirror of https://github.com/huggingface/open-r1.git synced 2026-06-24 01:54:06 +00:00

Author	SHA1	Message	Date
Lewis Tunstall	3bcc4fc86e	Add codeforces	2025-05-28 19:21:15 +00:00
Lewis Tunstall	43375fa7b9	Merge branch 'main' into zero-math-code	2025-05-28 13:52:20 +02:00
lewtun	b806e1092a	Bump vLLM and TRL (#665) * Bump vLLM and TRL * Fix Makefile	2025-05-28 13:47:25 +02:00
lewtun	a6b4f668fb	Fix Weka refresh (#666) * Fix Weka refresh * Update evaluate.slurm	2025-05-28 13:45:48 +02:00
Lewis Tunstall	97b1c22e55	Merge branch 'bump-deps-0' into zero-math-code	2025-05-28 10:11:06 +02:00
Lewis Tunstall	cada407cd6	Merge branch 'main' into zero-math-code	2025-05-28 09:24:12 +02:00
lewtun	01b4351c45	Set DP=2 for smol model evals (#664) * Set DP=2 for smol model evals Temporary hack while the HF cluster is at max capacity :) * Style	2025-05-28 09:23:12 +02:00
Lewis Tunstall	b369e428f8	Merge branch 'main' into zero-math-code	2025-05-28 09:22:22 +02:00
Lewis Tunstall	f6a07648e2	Bump vLLM and TRL	2025-05-28 06:48:01 +00:00
Lewis Tunstall	898406d85f	Fix DP=2 for evals	2025-05-27 21:20:52 +00:00
Lewis Tunstall	b6b1643c2d	Fix benchmarks!	2025-05-27 20:44:35 +00:00
lewtun	722f144d21	Refresh Weka on Slurm (#662) * Refresh Weka on Slurm * Include current working dir	2025-05-27 19:21:15 +02:00
lewtun	33f84def0d	Align EOS token ID between tokenizer and generation config (#663) * Align EOS token ID between tokenizer and generation config * Fix	2025-05-27 17:20:13 +02:00
Lewis Tunstall	82fb385fa5	Refine tests	2025-05-27 13:39:00 +00:00
lewtun	9eef995b4d	Bump deps (#656)	2025-05-27 15:38:21 +02:00
Lewis Tunstall	296aa66e1e	Tweak format reward	2025-05-27 08:16:49 +00:00
lewtun	5ac5971ea5	Add OpenR1-Distill recipe (#661)	2025-05-26 17:57:44 +02:00
Lewis Tunstall	9f6abc8ed1	Relax format reward	2025-05-26 11:15:56 +00:00
Lewis Tunstall	bc06504df5	Add better baseline defaults	2025-05-26 09:06:09 +00:00
Lewis Tunstall	9862bfec41	Relax reward	2025-05-26 08:09:03 +00:00
Lewis Tunstall	1f56bab96c	Tune baseline	2025-05-25 17:22:06 +00:00
Lewis Tunstall	965d451d61	Restore baseline	2025-05-25 17:00:33 +00:00
Lewis Tunstall	31eacc4b9a	Use GAS instead of generation	2025-05-25 16:57:33 +00:00
Lewis Tunstall	0b933a2aa4	Restore gas	2025-05-25 16:54:18 +00:00
Lewis Tunstall	cf765df201	Tune baseline	2025-05-25 13:21:01 +00:00
Lewis Tunstall	da0e9ae28d	Add overlong punishment	2025-05-25 12:46:45 +00:00
Lewis Tunstall	7f777c0583	Add new DAPO recipe	2025-05-25 12:40:32 +00:00
Lewis Tunstall	b575444fe8	Add think format and accuracy rewards	2025-05-25 12:24:43 +00:00
Lewis Tunstall	6c7c102755	Merge remote-tracking branch 'origin/bump-deps-0' into zero-math-code	2025-05-25 14:05:42 +02:00
lewtun	57e85b522f	Add better logging defaults for GRPO (#657)	2025-05-25 13:24:52 +02:00
lewtun	5374bc2bef	Merge branch 'main' into bump-deps-0	2025-05-25 12:02:52 +02:00
Lewis Tunstall	3258282733	Bump deps	2025-05-25 09:59:57 +00:00
Guilherme Penedo	c1e1192294	GRPO with codeforces problems (#627) * add * update * updates * updates #2 * weighted_sum and python fixes * bugfix * merging ioi/cf setups * integrating the morph changes * move morph_client * run style * small changes for mixed languages training * revert grpo.py changes * piston readme * local test fetching * bug fixes * updated readme * style fixes * style fixes 2 * deps changes * import sorting * fix tests * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update README.md Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-05-25 11:55:27 +02:00
lewtun	db2d9b011a	Bump lower bound on liger-kernel (#654) Related to https://github.com/huggingface/open-r1/pull/653 (I forgot to include this in that PR)	2025-05-22 08:44:13 +02:00
lewtun	8067149e90	Bump DeepSpeed to 0.16.8 to fix OOM on Qwen3 (#653)	2025-05-21 22:25:57 +02:00
lewtun	9366aa2df3	Add dataset mixer (#647) * Prototype * Clean up * Refactor * Add tests * Add doc and make scripts work * Tune doc * Up * Tune * Add column verification * Fix types * Fix YAML * Fix types * Fix doc * f * f	2025-05-20 11:40:42 +02:00
Quentin Gallouédec	5e0c210f9c	use hf papers (#646)	2025-05-19 13:48:14 +02:00
lewtun	ebd5913a85	Bump LightEval (#643)	2025-05-16 10:52:05 +02:00
Edward Beeching	ea5b7edf22	Add dataset filtering script (#637) * add dataset filtering script * remove subset selection * save wip * save wip * update filter script * refactor to run on chunks * rename script * cleanup * update dapo filtering * fixes * dapo filt config * udpate compute pass rate * clean * update readme and config * add merging snippet	2025-05-16 10:26:49 +02:00
lewtun	4fc2a3ff82	Add time to Slurm (#639)	2025-05-09 19:19:51 +02:00
lewtun	c802f00512	Use pass@1 for all evals (#633) * Use pass@1 for all evals * Update scores	2025-05-09 17:42:36 +02:00
Edward Beeching	21b48fbe46	soft_overlong_punishment from DAPO paper (#638) * soft_overlong_punishment_reward * tests * doc string updated * style * non-sensical import removed * Update src/open_r1/rewards.py Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * max_completion_length set to 3.6 * style * quality * test case added for <max_com_len * style * max_len +cache len updated based on num chars * max_len_completion docstring added in cofig * Update configs.py * refactor soft overlong penalty to use completion ids * change decription to be tokens --------- Co-authored-by: shirinyamani <yamani.shirin@ucalgary.ca> Co-authored-by: Shirin Yamani <75791599+shirinyamani@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-05-09 17:26:34 +02:00
lewtun	6a0cd5c8ad	Fix style again :) (#636)	2025-05-08 16:29:01 +02:00
Andrei	af81114044	Code Execution using Morph Cloud (#614) * initial commit for morphcloud sandbox support * initial * fixed prints in morph client for ioi * updated import * context manager * removed unnecessary comments * more intelligent instance/snapshot management * update * Add documentation for Morph integration * Delete MORPH_INTEGRATION.md * added retry and modularity to morph client * updates to kwargs and setup.py * Update setup.py * added languages codepath + fixed slurm + added m orph tests * make quality formatting fixes * conditional imports for morph --------- Co-authored-by: arb8020 <arbeightytwenty@gmail.com>	2025-05-08 08:59:54 +02:00
lewtun	52520a6713	Fix style (#631) * Fix style * Fix * Add jieba	2025-05-05 15:49:10 +02:00
Lewis Tunstall	c8b989109d	Fix style	2025-05-02 14:45:17 +00:00
lewtun	9373ad3055	Update README.md	2025-04-30 22:16:18 +02:00
binary-husky	65211f4824	🦜Enhance repetition penalty reward for language that cannot be split by whitespace (#516) * Update rewards.py * add test for repetition reward with language * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> * Update src/open_r1/rewards.py Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> --------- Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>	2025-04-30 22:02:59 +02:00
lewtun	75c3999180	Bump LightEval to enable DP>1 (#629) * Bump LightEval to enable DP>1 * Remove redundant arg * Update eval scores * Fix slurm	2025-04-30 22:02:20 +02:00
lewtun	50590a41b9	Enable data and tensor parallelism for GRPO (#626) * Bump deps * Fix SLurm * Fix	2025-04-26 11:50:08 +02:00

220 commits