grpo

Feb 13, 2026 · 38 min read · Intelligence Cartography

Inside the Agentic RL Training Loop

A Step-by-Step Walkthrough using Slime and SWE-Bench as an Example

Jan 15, 2026 · 4 min read · Intelligence Cartography

JustTinker: Minimal RLVR for Building Reasoning Models Under $150

Low-Resource RLVR for Transforming Instruct Models into Reasoning Models