News from LessWrong AI

LessWrong AI Communities 2 hr ago

Fable in Shackles

This post was originally posted on my Substack. I can be reached on LinkedIn and X. Two weeks ago, Anthropic released Fable 5, the public-facing version of Mythos…

LessWrong AI Communities 2 hr ago

Expert Views on Continual Learning: Survey Results and Forecasts

This is the fifth post in the sequence Implications of Continual Learning for LLM Agents. Summary: While writing our continual learning sequence, we sent a survey to a…

LessWrong AI Communities 7 hr ago

Risk-Averse AIs

Abstract: We make the case for training AIs to be risk-averse in resources, specifically to treat resources as having diminishing marginal utility. These AIs would (for…

LessWrong AI Communities 14 hr ago

Can weak AI watch strong AI?

The more capabilities new frontier models gain, the more sharply the question arises: how will we know when the model is doing something it shouldn't? Today,…

LessWrong AI Communities 14 hr ago

Reasoning and learning about injected concepts in language models

This work was done as a part of SPAR, under the mentorship of Mirko Bronzi and Damiano Fornasiere. TL;DR: We test models' ability to recover information about…

LessWrong AI Communities 14 hr ago

Toy transformers may represent belief-state geometry optimally but not minimally

Methods note: The code used for the experiments and related open-source repo were built with Claude. The experimental design and writeup is my own, with minimal…

LessWrong AI Communities 15 hr ago

We Should Train Frontier AIs on a Synthetic World, Not Ours

Epistemic status: I think the core idea could actually be built. My real doubt is whether anyone with the compute will ever bother to try it.…

LessWrong AI Communities 16 hr ago

Can You Hide From a Natural Language Autoencoder?

TLDR: NLAs are a recent black box mech interp method for verbalizing model internals. I will be focusing on one of two components, the Activation Verbalizer…

LessWrong AI Communities 16 hr ago

Tree Transformers: A step towards generalizing the transformer architecture

After a billion architectures and a trillion variations, I finally found a transformer architecture that intrigued me. And this essay is step one towards the theory…

LessWrong AI Communities 17 hr ago

Agentic Frameworks: Or different ways to make LLM API calls

The agentic framework research has produced some very interesting results: from different topologies to different ways of using tool calls, it has been one of the most…
