Commit graph

12 commits

Author          SHA1        Message                        Date
Lewis Tunstall  b6b1643c2d  Fix benchmarks!                2025-05-27 20:44:35 +00:00
Lewis Tunstall  82fb385fa5  Refine tests                   2025-05-27 13:39:00 +00:00
Lewis Tunstall  296aa66e1e  Tweak format reward            2025-05-27 08:16:49 +00:00
Lewis Tunstall  bc06504df5  Add better baseline defaults   2025-05-26 09:06:09 +00:00
Lewis Tunstall  9862bfec41  Relax reward                   2025-05-26 08:09:03 +00:00
Lewis Tunstall  1f56bab96c  Tune baseline                  2025-05-25 17:22:06 +00:00
Lewis Tunstall  965d451d61  Restore baseline               2025-05-25 17:00:33 +00:00
Lewis Tunstall  31eacc4b9a  Use GAS instead of generation  2025-05-25 16:57:33 +00:00
Lewis Tunstall  0b933a2aa4  Restore gas                    2025-05-25 16:54:18 +00:00
Lewis Tunstall  cf765df201  Tune baseline                  2025-05-25 13:21:01 +00:00
Lewis Tunstall  da0e9ae28d  Add overlong punishment        2025-05-25 12:46:45 +00:00
Lewis Tunstall  7f777c0583  Add new DAPO recipe            2025-05-25 12:40:32 +00:00