open-r1

mirrors/open-r1

mirror of https://github.com/huggingface/open-r1.git synced 2026-06-24 01:54:06 +00:00

Commit graph

f28c7c4fee Merge 9f9ebc3aea into 1416fa0cf2 Hemanth Kumar M 2026-05-28 11:54:08 +00:00
9f9ebc3aea feat: implement robust code output validator with numeric tolerance Claude Code 2026-05-28 17:23:00 +05:30
40fa8aea53 Merge 003f19ba45 into 1416fa0cf2 Hemanth Kumar M 2026-05-28 11:51:51 +00:00
003f19ba45 feat: make CF test cache size configurable via environment variable Claude Code 2026-05-28 17:20:19 +05:30
fd20b7d03e Merge c9dfb5f9d2 into 1416fa0cf2 Hemanth Kumar M 2026-05-28 11:46:04 +00:00
c9dfb5f9d2 refactor: extract shared config options into mixin to reduce duplication Claude Code 2026-05-28 17:15:07 +05:30
7219f61daf Merge 3f196f3cd3 into 1416fa0cf2 ALI AL MARJANI 2026-05-12 16:34:41 +02:00
3f196f3cd3 Add grounded information extraction rewards for GRPO Ali322O 2026-05-12 16:30:33 +02:00
da3ae985cf Merge 005fc7ccb5 into 1416fa0cf2 Sambhav Dixit 2026-04-07 15:13:48 -07:00
87d19a78a7 Merge ebadec2136 into 1416fa0cf2 Ahmet 2026-04-07 15:13:33 -07:00
de5e8a4e8f Merge 62edc88f41 into 1416fa0cf2 August Moharrami 2026-04-07 15:13:24 -07:00
f11afb0ff4 Merge b221f66f84 into 1416fa0cf2 Yuhang Yao 2026-04-07 15:11:35 -07:00
26162d9b32 Merge 18fea48144 into 1416fa0cf2 Aymeric Roucher 2026-04-07 15:11:02 -07:00
73dfe6ef0d Merge 9b35c35bbb into 1416fa0cf2 JillChen525 2026-04-07 15:10:47 -07:00
353599c152 Merge 1bab913b88 into 1416fa0cf2 Edward Beeching 2026-04-07 15:09:38 -07:00
733e45f96f Merge a9ff8c975f into 1416fa0cf2 dignfei 2026-04-07 15:09:26 -07:00
9c640f85f2 Merge 9f05e95bbd into 1416fa0cf2 Katrina Drozdov (Evtimova) 2026-04-07 15:08:55 -07:00
8ff2d0da2e Merge fb71fde087 into 1416fa0cf2 Jung-Yup Lee 2026-04-07 15:08:55 -07:00
c8e90322be Merge 31a99af5bb into 1416fa0cf2 A-Mahla 2026-04-07 15:08:47 -07:00
4d9dd867ec Merge 91e49092d3 into 1416fa0cf2 ShashankOS 2026-04-07 15:08:47 -07:00
0949bbd3f9 Merge d264a6912a into 1416fa0cf2 Kamal Raj Rajaratnam 2026-04-07 15:08:38 -07:00
014576bb52 Merge cf9ed9169c into 1416fa0cf2 qmzp93 2026-04-07 15:08:37 -07:00
d011160887 Merge cdc9aa541e into 1416fa0cf2 Andy Twigg 2026-04-07 02:43:10 +00:00
02fdbb3ecf Merge ef2bdc3bd7 into 1416fa0cf2 Abishek 2026-04-05 02:36:11 +00:00
1326e5db42 Merge a853173b9e into 1416fa0cf2 astephenson 2026-04-02 12:19:02 -04:00
413d9207a9 Merge f386e7fec9 into 1416fa0cf2 Hayato Hongo 2026-04-02 12:19:02 -04:00
6aa3bd3232 Merge f84325f5b3 into 1416fa0cf2 wenzhaoabc 2026-04-02 12:19:02 -04:00
1416fa0cf2 🔒 pin tests.yml actions to commit SHAs (#721) main Pauline Bailly-Masson 2026-04-02 16:03:12 +02:00
2cbf9bc875 🔒 pin tests.yml actions to commit SHAs Pauline Bailly-Masson 2026-04-02 11:50:05 +02:00
c68739e361 configurable reward zb1439 2026-03-07 22:35:35 -08:00
b3a9bc1021 add failed_correct&passed_wrong into swanlab Zityuen 2026-03-07 18:30:16 -08:00
603495cebc make our reward registerable Zityuen 2026-03-07 11:22:54 -08:00
0800259b0b reward v2 Zityuen 2026-03-07 11:06:58 -08:00
a34e0cea59 modify config Zityuen 2026-03-05 16:00:22 -08:00
826644e008 fix ast parse in reward Zityuen 2026-03-02 22:52:22 -08:00
2c2b036500 fix1 Zityuen 2026-03-02 22:32:34 -08:00
2763cbe179 add entropy logging Zityuen 2026-03-02 18:56:53 -08:00
7bdf996c2f modify reward function Zityuen 2026-03-02 18:32:29 -08:00
4b55cd7012 filter->map Zityuen 2026-03-01 23:45:34 -08:00
7911e1a160 fix Zhe Yuan 2026-02-19 03:15:48 +00:00
3e7dcda316 fix Zhe Yuan 2026-02-19 03:03:05 +00:00
82f1f3c253 group sol 2 Zhe Yuan 2026-02-19 02:27:53 +00:00
cdc9aa541e Simplify checkpoint assignment logic in training loop Andy Twigg 2026-02-13 18:34:15 -08:00
ef2bdc3bd7 Refactor import statements in e2b_router.py Abishek 2026-02-09 19:45:26 +05:30
b6c9eade0b coderm Zhe Yuan 2026-02-06 01:18:05 +00:00
4a6af641db move trl into src folder Zhe Yuan 2026-02-05 07:09:37 +00:00
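e23a959135 idea4: truncation and resampling, full version ColdJeans21 2026-01-23 22:47:22 +08:00
3009913138 Exclude the inserted prompt from the advantage computation (full version) root 2026-01-19 21:25:46 +08:00
a4028cda60 When the sampled acc_reward is all zeros, truncate the sample at the 15% mark, insert a prompt, resample, and recompute acc_reward, so that an all-zero group does not leave training without a useful gradient or advantage signal. Added the hint_template, hint_regeneration_count, and hint_truncate_ratio parameters to adjust the truncation ratio, the number of regenerations, and the format of the inserted prompt ColdJeans21 2026-01-14 21:16:02 +08:00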
fbb7ed795c Merge edaff18fda into 0e06249d1c dependabot[bot] 2025-11-24 12:37:26 +00:00
edaff18fda Bump actions/checkout from 4 to 6 dependabot/github_actions/actions/checkout-6 dependabot[bot] 2025-11-24 12:37:24 +00:00
35d4e4bc3e Original GRPO; modified the reward function for gsm8k training, adjusted the config_demo1 configuration file for training, and added two shell scripts ColdJeans21 2025-11-23 19:27:33 +08:00
cb0502dc95 Change GRPO to I(H>threshold)*min[\frac{\pi_{\theta}}{\pi_{\theta_{old}}}(A_{i}+(1-r_{i})H_{i,t})] ColdJeans21 2025-11-21 16:16:56 +08:00
0f38ea2a83 Revert to the original GRPO and switch to training Qwen3-0.6B-base ColdJeans21 2025-11-19 15:34:39 +08:00
aa269c4912 Add the trl library; apply the entropy mask I(H>threshold) only to the "learning term", while KL still applies to all tokens ColdJeans21 2025-11-17 17:08:53 +08:00
36bbeda02c Update .gitignore rules ColdJeans21 2025-11-17 15:06:58 +08:00
ff9bc1c621 Add the environment and configure the corresponding parameters ColdJeans21 2025-11-15 00:45:13 +08:00
cf9ed9169c Fix typo in GRPO dataset filtering section qmzp93 2025-11-08 13:41:47 +08:00
d264a6912a feat(code): add LocalProvider; fix E2B detection; fix Optional import kamalraj23 2025-11-02 18:09:38 +01:00
f84325f5b3 adapt for local machine wenzhaoabc 2025-10-20 20:30:54 +08:00
c229c2f110 Merge branch 'main' of github.com:wenzhaoabc/open-r1 solve wenzhaoabc 2025-10-20 19:36:48 +08:00
542877759a add traffic reward wenzhaoabc 2025-10-20 19:35:30 +08:00
77ae93321c Fix dataset loading by unpacking dataset_config wenzhaoabc 2025-10-15 16:45:53 +08:00
da521fc786 Merge eb380f5728 into 0e06249d1c dependabot[bot] 2025-09-08 12:43:41 +00:00
eb380f5728 Bump actions/setup-python from 5 to 6 dependabot/github_actions/actions/setup-python-6 dependabot[bot] 2025-09-08 12:43:38 +00:00
f386e7fec9 [MoE] ZeRO-3 leaf module setup for Qwen MoE model completed. HayatoHongo 2025-08-26 13:36:45 +09:00
91e49092d3 Add comprehensive PR description for enhanced utilities ExtMac 2025-08-18 17:02:53 -04:00
75331a365f Add comprehensive utility modules for Open-R1 ExtMac 2025-08-18 16:58:00 -04:00
62a65132dc Harden routed sandbox error handling to prevent TypeError crashes ExtMac 2025-08-18 16:27:29 -04:00
56c93f81ce rewards: make format_reward tolerant to whitespace/inline variants; accept inline and multi-line <think>/<answer> layouts; use whitespace-robust regex; keep tests passing; verified on unit tests and sample inputs ExtMac 2025-08-18 16:18:58 -04:00
5c91805453 Setup Open-R1 project environment and resolve dependencies ExtMac 2025-08-18 16:05:44 -04:00
53bdb16ec0 feat: llama 3.1 Lisandra Moura 2025-08-14 17:50:46 +00:00
12ad492496 Bump actions/checkout from 4 to 5 dependabot[bot] 2025-08-11 17:28:51 +00:00
31a99af5bb CHG collator gui-training amir.mahla@huggingface.co 2025-08-10 12:03:15 +00:00
76c56724ee CHG collator amir.mahla@huggingface.co 2025-08-10 10:23:19 +00:00
b5a27167f1 CHG transform_messages amir.mahla@huggingface.co 2025-08-08 16:38:05 +00:00
dca3a06ada PUSH last config amir.mahla@huggingface.co 2025-08-08 15:35:50 +00:00
803c468507 ADD aguvis-stage-1 dataprocessing amir.mahla@huggingface.co 2025-08-04 16:17:40 +00:00
afbd97b1ec ADD new config dataset amir.mahla@huggingface.co 2025-07-30 12:46:11 +00:00
55c49d66c3 CHG recipe - add eval step amir.mahla@huggingface.co 2025-07-24 16:29:53 +00:00
4c89c85fff CHG recipe amir.mahla@huggingface.co 2025-07-24 16:02:14 +00:00
2ef6b50ccd FIX collator amir.mahla@huggingface.co 2025-07-24 15:54:58 +00:00
7852ddefc8 CHG qwenvl collator amir.mahla@huggingface.co 2025-07-24 13:58:04 +00:00
648a523325 Training script for qwenvl amir.mahla@huggingface.co 2025-07-24 13:55:13 +00:00
342f8f7856 Training script for qwenvl amir.mahla@huggingface.co 2025-07-24 13:38:17 +00:00
02819cf0ab Deleted action amir.mahla@huggingface.co 2025-07-21 12:43:26 +00:00
4a55c49641 Aguvis dataset transform amir.mahla@huggingface.co 2025-07-21 12:41:02 +00:00
dfcaecc92c Aguvis Data Pipeline Done Amir Mahla 2025-07-21 10:41:59 +02:00
c13574e28a ADD function_parser amir.mahla@huggingface.co 2025-07-17 22:43:57 +00:00
0e06249d1c Update README.md Quentin Gallouédec 2025-07-17 13:20:00 -07:00
18fea48144 Fix script agent-traces aymeric@huggingface.co 2025-07-10 09:23:53 +00:00
e8a4c2bd08 Update imports aymeric@huggingface.co 2025-07-09 15:36:08 +00:00
6a63f2fed6 Unify conversion in only one script aymeric@huggingface.co 2025-07-09 13:58:01 +00:00
4c83688dba Modify aguvis conversion script aymeric@huggingface.co 2025-07-09 11:51:59 +00:00
868d4a4011 Small fixes in recipe aymeric@huggingface.co 2025-07-09 10:03:56 +00:00
5ca5d0f4ed Add Modal as provider for training with code interpreter Andrew Hinh 2025-07-07 19:27:19 +00:00
7e700c6218 Update citation (#688) Quentin Gallouédec 2025-07-07 10:23:08 -07:00
a5098b0c50 Update README.md Quentin Gallouédec 2025-07-07 08:40:00 -07:00
1b50860c3e Remove merge artifact aymeric-roucher 2025-07-03 08:24:01 +00:00
880a5853fb Merge branch 'agent-traces' of github.com:huggingface/open-r1 into agent-traces aymeric-roucher 2025-07-03 08:22:32 +00:00