arXiv cs.NE · Papers

Distributed Quality-Diversity Search for Toxicity in Large Language Models

arXiv:2606.24166v1 Announce Type: new Abstract: Large Language Models remain vulnerable to adversarial prompts that elicit harmful responses, and scaling red-teaming to cover a broad range of failure modes is constrained by the cost of text generation and evaluation. We present ToxSearch-S, a speciated extension