Alignment Forum June 16, 2026 · Communities

Synthetic document finetuning for instilling positive traits

This is the fifth in a series of informal research updates from the Google DeepMind Language Model Interpretability team, covering interpretability and adjacent areas. The fourth post can be found here. TLDR: By adapting the methods of Marks et al. and Li et al., we train Gemini 3 Flash to have certain traits/values by midtrai
