News from EleutherAI

EleutherAI Open Source April 15, 2026

Early Indicators of Reward Hacking via Reasoning Interpolation

Using importance sampling with fine-tuned donor prefills to predict reward hacking emergence during training

EleutherAI Open Source October 7, 2025

Reward Hacking Research Update

Interim report on ongoing work on reward hacking

EleutherAI Open Source August 12, 2025

Pretraining Data Filtering for Open-Weight AI Safety

Announcing Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs

EleutherAI Open Source August 1, 2025

Attention Probes

Adding attention to linear probes

EleutherAI Open Source June 23, 2025

Research Update: Applications of Local Volume Measurement

Research update on applying local volume measurement to downstream tasks

EleutherAI Open Source June 12, 2025

Studying inductive biases of random networks via local volumes

In this post, we will study inductive biases of the parameter-function map of random neural networks using star domain volume estimates. This builds on the ideas…

EleutherAI Open Source June 5, 2025

The Common Pile v0.1

Announcing the Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

EleutherAI Open Source May 30, 2025

Product Key Memory Sparse Coders

Using Product Key Memories to encode sparse coder features

EleutherAI Open Source December 12, 2024

SAEs trained on the same data don’t learn the same features

In this post, we show that when two TopK SAEs are trained on the same data, with the same batch order but with different random initializations,…

EleutherAI Open Source November 10, 2024

Partially rewriting an LLM in natural language

Using interpretations of SAE latents to simulate activations
