Least-Squares Concept Erasure with Oracle Concept Labels
Achieving even more surgical edits than LEACE when we have concept labels at inference time.
Explaining a result by Sam Marks and Max Tegmark