Commit graph

  • f28c7c4fee
    Merge 9f9ebc3aea into 1416fa0cf2 Hemanth Kumar M 2026-05-28 11:54:08 +00:00
  • 9f9ebc3aea feat: implement robust code output validator with numeric tolerance Claude Code 2026-05-28 17:23:00 +05:30
  • 40fa8aea53
    Merge 003f19ba45 into 1416fa0cf2 Hemanth Kumar M 2026-05-28 11:51:51 +00:00
  • 003f19ba45 feat: make CF test cache size configurable via environment variable Claude Code 2026-05-28 17:20:19 +05:30
  • fd20b7d03e
    Merge c9dfb5f9d2 into 1416fa0cf2 Hemanth Kumar M 2026-05-28 11:46:04 +00:00
  • c9dfb5f9d2 refactor: extract shared config options into mixin to reduce duplication Claude Code 2026-05-28 17:15:07 +05:30
  • 7219f61daf
    Merge 3f196f3cd3 into 1416fa0cf2 ALI AL MARJANI 2026-05-12 16:34:41 +02:00
  • 3f196f3cd3 Add grounded information extraction rewards for GRPO Ali322O 2026-05-12 16:30:33 +02:00
  • da3ae985cf
    Merge 005fc7ccb5 into 1416fa0cf2 Sambhav Dixit 2026-04-07 15:13:48 -07:00
  • 87d19a78a7
    Merge ebadec2136 into 1416fa0cf2 Ahmet 2026-04-07 15:13:33 -07:00
  • de5e8a4e8f
    Merge 62edc88f41 into 1416fa0cf2 August Moharrami 2026-04-07 15:13:24 -07:00
  • f11afb0ff4
    Merge b221f66f84 into 1416fa0cf2 Yuhang Yao 2026-04-07 15:11:35 -07:00
  • 26162d9b32
    Merge 18fea48144 into 1416fa0cf2 Aymeric Roucher 2026-04-07 15:11:02 -07:00
  • 73dfe6ef0d
    Merge 9b35c35bbb into 1416fa0cf2 JillChen525 2026-04-07 15:10:47 -07:00
  • 353599c152
    Merge 1bab913b88 into 1416fa0cf2 Edward Beeching 2026-04-07 15:09:38 -07:00
  • 733e45f96f
    Merge a9ff8c975f into 1416fa0cf2 dignfei 2026-04-07 15:09:26 -07:00
  • 9c640f85f2
    Merge 9f05e95bbd into 1416fa0cf2 Katrina Drozdov (Evtimova) 2026-04-07 15:08:55 -07:00
  • 8ff2d0da2e
    Merge fb71fde087 into 1416fa0cf2 Jung-Yup Lee 2026-04-07 15:08:55 -07:00
  • c8e90322be
    Merge 31a99af5bb into 1416fa0cf2 A-Mahla 2026-04-07 15:08:47 -07:00
  • 4d9dd867ec
    Merge 91e49092d3 into 1416fa0cf2 ShashankOS 2026-04-07 15:08:47 -07:00
  • 0949bbd3f9
    Merge d264a6912a into 1416fa0cf2 Kamal Raj Rajaratnam 2026-04-07 15:08:38 -07:00
  • 014576bb52
    Merge cf9ed9169c into 1416fa0cf2 qmzp93 2026-04-07 15:08:37 -07:00
  • d011160887
    Merge cdc9aa541e into 1416fa0cf2 Andy Twigg 2026-04-07 02:43:10 +00:00
  • 02fdbb3ecf
    Merge ef2bdc3bd7 into 1416fa0cf2 Abishek 2026-04-05 02:36:11 +00:00
  • 1326e5db42
    Merge a853173b9e into 1416fa0cf2 astephenson 2026-04-02 12:19:02 -04:00
  • 413d9207a9
    Merge f386e7fec9 into 1416fa0cf2 Hayato Hongo 2026-04-02 12:19:02 -04:00
  • 6aa3bd3232
    Merge f84325f5b3 into 1416fa0cf2 wenzhaoabc 2026-04-02 12:19:02 -04:00
  • 1416fa0cf2
    🔒 pin tests.yml actions to commit SHAs (#721) main Pauline Bailly-Masson 2026-04-02 16:03:12 +02:00
  • 2cbf9bc875 🔒 pin tests.yml actions to commit SHAs Pauline Bailly-Masson 2026-04-02 11:50:05 +02:00
  • c68739e361 configurable reward zb1439 2026-03-07 22:35:35 -08:00
  • b3a9bc1021 add failed_correct&passed_wrong into swanlab Zityuen 2026-03-07 18:30:16 -08:00
  • 603495cebc make our reward registerable Zityuen 2026-03-07 11:22:54 -08:00
  • 0800259b0b reward v2 Zityuen 2026-03-07 11:06:58 -08:00
  • a34e0cea59 modify config Zityuen 2026-03-05 16:00:22 -08:00
  • 826644e008 fix ast parse in reward Zityuen 2026-03-02 22:52:22 -08:00
  • 2c2b036500 fix1 Zityuen 2026-03-02 22:32:34 -08:00
  • 2763cbe179 add entropy logging Zityuen 2026-03-02 18:56:53 -08:00
  • 7bdf996c2f modify reward function Zityuen 2026-03-02 18:32:29 -08:00
  • 4b55cd7012 filter->map Zityuen 2026-03-01 23:45:34 -08:00
  • 7911e1a160 fix Zhe Yuan 2026-02-19 03:15:48 +00:00
  • 3e7dcda316 fix Zhe Yuan 2026-02-19 03:03:05 +00:00
  • 82f1f3c253 group sol 2 Zhe Yuan 2026-02-19 02:27:53 +00:00
  • cdc9aa541e
    Simplify checkpoint assignment logic in training loop Andy Twigg 2026-02-13 18:34:15 -08:00
  • ef2bdc3bd7
    Refactor import statements in e2b_router.py Abishek 2026-02-09 19:45:26 +05:30
  • b6c9eade0b coderm Zhe Yuan 2026-02-06 01:18:05 +00:00
  • 4a6af641db move trl into src folder Zhe Yuan 2026-02-05 07:09:37 +00:00
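  • e23a959135 idea4: truncation and resampling, full version ColdJeans21 2026-01-23 22:47:22 +08:00
  • 3009913138 Inserted prompt is excluded from the advantage computation (full version) root 2026-01-19 21:25:46 +08:00
  • a4028cda60 When the sampled acc_reward values are all 0, truncate the sample at the 15% mark, insert a prompt, resample, and recompute acc_reward, so that the all-zero case does not leave training without a useful gradient or advantage signal. Adds the hint_template, hint_regeneration_count, and hint_truncate_ratio parameters to adjust the truncation ratio, the number of regenerations, and the inserted prompt format ColdJeans21 2026-01-14 21:16:02 +08:00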
  • fbb7ed795c
    Merge edaff18fda into 0e06249d1c dependabot[bot] 2025-11-24 12:37:26 +00:00
  • edaff18fda
    Bump actions/checkout from 4 to 6 dependabot/github_actions/actions/checkout-6 dependabot[bot] 2025-11-24 12:37:24 +00:00
  • 35d4e4bc3e Original GRPO; modified the reward function for gsm8k training, adjusted the config_demo1 config file for training, and added two shell scripts ColdJeans21 2025-11-23 19:27:33 +08:00
  • cb0502dc95 Change GRPO to I(H>threshold)*min[\frac{\pi_{\theta}}{\pi_{\theta_{old}}}(A_{i}+(1-r_{i})H_{i,t})] ColdJeans21 2025-11-21 16:16:56 +08:00
  • 0f38ea2a83 Revert to the original GRPO and switch to training Qwen3-0.6B-base ColdJeans21 2025-11-19 15:34:39 +08:00
  • aa269c4912 Add the trl library; apply the entropy mask I(H>threshold) only to the "learning term", while KL still applies to all tokens ColdJeans21 2025-11-17 17:08:53 +08:00
  • 36bbeda02c Update .gitignore rules ColdJeans21 2025-11-17 15:06:58 +08:00
  • ff9bc1c621 Add the environment and configure the corresponding parameters ColdJeans21 2025-11-15 00:45:13 +08:00
  • cf9ed9169c
    Fix typo in GRPO dataset filtering section qmzp93 2025-11-08 13:41:47 +08:00
  • d264a6912a feat(code): add LocalProvider; fix E2B detection; fix Optional import kamalraj23 2025-11-02 18:09:38 +01:00
  • f84325f5b3 adapt for local machine wenzhaoabc 2025-10-20 20:30:54 +08:00
  • c229c2f110 Merge branch 'main' of github.com:wenzhaoabc/open-r1 solve wenzhaoabc 2025-10-20 19:36:48 +08:00
  • 542877759a add traffic reward wenzhaoabc 2025-10-20 19:35:30 +08:00
  • 77ae93321c
    Fix dataset loading by unpacking dataset_config wenzhaoabc 2025-10-15 16:45:53 +08:00
  • da521fc786
    Merge eb380f5728 into 0e06249d1c dependabot[bot] 2025-09-08 12:43:41 +00:00
  • eb380f5728
    Bump actions/setup-python from 5 to 6 dependabot/github_actions/actions/setup-python-6 dependabot[bot] 2025-09-08 12:43:38 +00:00
  • f386e7fec9 [MoE] ZeRO-3 leaf module setup for Qwen MoE model completed. HayatoHongo 2025-08-26 13:36:45 +09:00
  • 91e49092d3 Add comprehensive PR description for enhanced utilities ExtMac 2025-08-18 17:02:53 -04:00
  • 75331a365f Add comprehensive utility modules for Open-R1 ExtMac 2025-08-18 16:58:00 -04:00
  • 62a65132dc Harden routed sandbox error handling to prevent TypeError crashes ExtMac 2025-08-18 16:27:29 -04:00
  • 56c93f81ce rewards: make format_reward tolerant to whitespace/inline variants. Accept inline and multi-line <think>/<answer> layouts; use whitespace-robust regex, keep tests passing; verified on unit tests and sample inputs ExtMac 2025-08-18 16:18:58 -04:00
  • 5c91805453 Setup Open-R1 project environment and resolve dependencies ExtMac 2025-08-18 16:05:44 -04:00
  • 53bdb16ec0 feat: llama 3.1 Lisandra Moura 2025-08-14 17:50:46 +00:00
  • 12ad492496
    Bump actions/checkout from 4 to 5 dependabot[bot] 2025-08-11 17:28:51 +00:00
  • 31a99af5bb CHG collator gui-training amir.mahla@huggingface.co 2025-08-10 12:03:15 +00:00
  • 76c56724ee CHG collator amir.mahla@huggingface.co 2025-08-10 10:23:19 +00:00
  • b5a27167f1 CHG transform_messages amir.mahla@huggingface.co 2025-08-08 16:38:05 +00:00
  • dca3a06ada PUSH last config amir.mahla@huggingface.co 2025-08-08 15:35:50 +00:00
  • 803c468507 ADD aguvis-stage-1 dataprocessing amir.mahla@huggingface.co 2025-08-04 16:17:40 +00:00
  • afbd97b1ec ADD new config dataset amir.mahla@huggingface.co 2025-07-30 12:46:11 +00:00
  • 55c49d66c3 CHG recipe - add eval step amir.mahla@huggingface.co 2025-07-24 16:29:53 +00:00
  • 4c89c85fff CHG recipe amir.mahla@huggingface.co 2025-07-24 16:02:14 +00:00
  • 2ef6b50ccd FIX collator amir.mahla@huggingface.co 2025-07-24 15:54:58 +00:00
  • 7852ddefc8 CHG qwenvl collator amir.mahla@huggingface.co 2025-07-24 13:58:04 +00:00
  • 648a523325 Training script for qwenvl amir.mahla@huggingface.co 2025-07-24 13:55:13 +00:00
  • 342f8f7856 Training script for qwenvl amir.mahla@huggingface.co 2025-07-24 13:38:17 +00:00
  • 02819cf0ab Deleted action amir.mahla@huggingface.co 2025-07-21 12:43:26 +00:00
  • 4a55c49641 Aguvis dataset transform amir.mahla@huggingface.co 2025-07-21 12:41:02 +00:00
  • dfcaecc92c Aguvis Data Pipeline Done Amir Mahla 2025-07-21 10:41:59 +02:00
  • c13574e28a ADD function_parser amir.mahla@huggingface.co 2025-07-17 22:43:57 +00:00
  • 0e06249d1c
    Update README.md Quentin Gallouédec 2025-07-17 13:20:00 -07:00
  • 18fea48144 Fix script agent-traces aymeric@huggingface.co 2025-07-10 09:23:53 +00:00
  • e8a4c2bd08 Update imports aymeric@huggingface.co 2025-07-09 15:36:08 +00:00
  • 6a63f2fed6 Unify conversion in only one script aymeric@huggingface.co 2025-07-09 13:58:01 +00:00
  • 4c83688dba Modify aguvis conversion script aymeric@huggingface.co 2025-07-09 11:51:59 +00:00
  • 868d4a4011 Small fixes in recipe aymeric@huggingface.co 2025-07-09 10:03:56 +00:00
  • 5ca5d0f4ed Add Modal as provider for training with code interpreter Andrew Hinh 2025-07-07 19:27:19 +00:00
  • 7e700c6218
    Update citation (#688) Quentin Gallouédec 2025-07-07 10:23:08 -07:00
  • a5098b0c50
    Update README.md Quentin Gallouédec 2025-07-07 08:40:00 -07:00
  • 1b50860c3e Remove merge artifact aymeric-roucher 2025-07-03 08:24:01 +00:00
  • 880a5853fb Merge branch 'agent-traces' of github.com:huggingface/open-r1 into agent-traces aymeric-roucher 2025-07-03 08:22:32 +00:00