Fable in Shackles
This post was originally posted on my Substack. I can be reached on LinkedIn and X. Two weeks ago, Anthropic released Fable 5, the public-facing version of Mythos…
This is the fifth post in the sequence Implications of Continual Learning for LLM Agents. Summary: While writing our continual learning sequence, we sent a survey to a…
Abstract: We make the case for training AIs to be risk-averse in resources, specifically to treat resources as having diminishing marginal utility. These AIs would (for…
The more capabilities new frontier models gain, the more sharply the question arises: how will we know when a model is doing something it shouldn't? Today,…
This work was done as part of SPAR, under the mentorship of Mirko Bronzi and Damiano Fornasiere. TL;DR: We test models' ability to recover information about…
Methods note: The code used for the experiments and the related open-source repo were built with Claude. The experimental design and writeup are my own, with minimal…
Epistemic status: I think the core idea could actually be built. My real doubt is whether anyone with the compute will ever bother to try it.…
TLDR: NLAs are a recent black box mech interp method for verbalizing model internals. I will be focusing on one of two components, the Activation Verbalizer…
After a billion architectures and a trillion variations, I finally found a transformer architecture that intrigued me. And this essay is step one towards the theory…
Research on agentic frameworks has produced some very interesting results. From different topologies to different ways of using tool calls, it has been one of the most…