r/LocalLLaMA
Is it possible to have a malicious LLM with a backdoor?
I was just brainstorming about the possibility that an LLM could be trained to recognize a specific secret sentence and, once it sees that sentence, stop behaving normally and unlock a backdoor of malicious behavior. At first glance this sounds very possible to me. Don't get me wrong, the risk is relevant for ALL LLMs (closed & open ones), as lo