Mechanistic Anomaly Detection Research Update 2
Interim report on ongoing work on mechanistic anomaly detection
GPT-NeoX now supports post-training thanks to a collaboration with SynthLabs.
Exploring the implementation details of muTransfer
Interim report on ongoing work on mechanistic anomaly detection
Building and evaluating an open-source pipeline for auto-interpretability
Writing up results from a recent project
Achieving even more surgical edits than LEACE without concept labels at inference time.
Writing up results from a project from Spring 2023
Setting the record straight regarding Yi-34B and Llama 2.
Announcing a new resource, the FM Dev Cheatsheet.