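OpenClaw-RL Self-Evolution Engine

Source: arXiv:2603.10165 (HuggingFace #1, 5500+ stars)

Architecture
Four fully asynchronous components: Serving / Rollout / PRM / Trainer. Server-client design with an OpenAI-compatible API.

Three paradigms
Binary RL (GRPO): binary reward signal
On-Policy Distillation (OPD): distillation on on-policy samples
Hybrid (best): mixed mode combining the two above

Key innovations
Overlap-Guided Hint Selection
Log-prob clip

Adoption path
Phase 1: Tinker LoRA trial run ($5/month, zero GPU)
Phase 2: rented GPU
Phase 3: local GPU

Current blockers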
No GPU, API-only models (not self-hosted), network connectivity to Tinker, data privacy.
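
As a rough illustration of the four-component asynchronous layout (Serving / Rollout / PRM / Trainer), the sketch below wires four coroutines together with queues so that no stage waits for a full pass of another. Everything here is an assumption made for illustration: the names `PolicyState`, `Sample`, `serve`, `rollout`, `prm`, `trainer`, and `run_pipeline` are hypothetical, the reward rule is a toy placeholder for a real reward model, and the "weight update" is only a version counter. It is not OpenClaw-RL's actual code.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class PolicyState:
    """Hypothetical stand-in for model weights: only a version counter."""
    version: int = 0


@dataclass
class Sample:
    prompt: str
    response: str = ""
    policy_version: int = 0
    reward: float = 0.0


async def serve(state: PolicyState, prompt: str) -> Sample:
    """Serving: placeholder for a call to an OpenAI-compatible endpoint."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return Sample(prompt, f"answer to {prompt}", state.version)


async def rollout(state: PolicyState, prompts: list[str], to_prm: asyncio.Queue) -> None:
    """Rollout: collects responses and hands them to the PRM queue."""
    for prompt in prompts:
        await to_prm.put(await serve(state, prompt))
    await to_prm.put(None)  # sentinel: no more samples


async def prm(to_prm: asyncio.Queue, to_trainer: asyncio.Queue) -> None:
    """PRM: attaches a reward to each sample (toy rule, not a real model)."""
    while (sample := await to_prm.get()) is not None:
        sample.reward = 1.0 if len(sample.prompt) % 2 == 0 else 0.0
        await to_trainer.put(sample)
    await to_trainer.put(None)


async def trainer(state: PolicyState, to_trainer: asyncio.Queue, batch_size: int) -> list[list[Sample]]:
    """Trainer: consumes scored samples in batches and bumps the version."""
    batches, batch = [], []
    while (sample := await to_trainer.get()) is not None:
        batch.append(sample)
        if len(batch) == batch_size:
            batches.append(batch)
            state.version += 1  # stand-in for a weight update
            batch = []
    if batch:  # flush the final partial batch
        batches.append(batch)
        state.version += 1
    return batches


async def run_pipeline(prompts: list[str], batch_size: int = 2):
    """Runs all stages concurrently; bounded queues provide backpressure."""
    state = PolicyState()
    to_prm, to_trainer = asyncio.Queue(maxsize=4), asyncio.Queue(maxsize=4)
    _, _, batches = await asyncio.gather(
        rollout(state, prompts, to_prm),
        prm(to_prm, to_trainer),
        trainer(state, to_trainer, batch_size),
    )
    return state, batches
```

Calling `asyncio.run(run_pipeline(["a", "bb", "ccc"]))` returns the final `PolicyState` and the batches the trainer consumed. Each `Sample` records the `policy_version` it was generated under, which is the bookkeeping an asynchronous setup needs in order to tell how stale a sample is by the time it reaches the trainer.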
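
For the Binary RL (GRPO) paradigm, a minimal sketch of how a binary reward signal can become a training signal: GRPO, as commonly described, normalizes each response's reward against the other responses sampled for the same prompt. The function name `grpo_advantages` and the `eps` default are hypothetical, and the choice of population standard deviation is an assumption (implementations differ on population vs. sample standard deviation). Whether OpenClaw-RL computes advantages exactly this way is not stated in these notes.

```python
from statistics import fmean, pstdev


def grpo_advantages(rewards: list[int], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages for one prompt's group of sampled responses.

    rewards: binary outcomes (1 = success, 0 = failure), one per response.
    """
    if not rewards:
        return []
    if any(r not in (0, 1) for r in rewards):
        raise ValueError("binary rewards must be 0 or 1")
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption of this sketch
    # Successes get a positive advantage, failures a negative one.
    return [(r - mean) / (std + eps) for r in rewards]
```

One consequence worth noting: when every response in a group receives the same reward (all 0 or all 1), every advantage is zero and that group contributes no gradient, so binary rewards are only informative on prompts where the policy sometimes succeeds and sometimes fails.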