X · @miramurati March 6, 2026 · X / Twitter

RT Tinker: Contextual AI used Tinker to post-train the planning behavior for a search agent. They land on a two-stage training recipe: On-Policy Disti…

RT TinkerContextual AI used Tinker to post-train the planning behavior for a search agent. They land on a two-stage training recipe: On-Policy Distillation and GRPO with a CLP reward. Read more 👇Abdallah Bashir: Search agents, whether they're powering deep research, or multi-step QA over a private corpus, spend most of

Read original