r/LocalLLaMA

How can I get better response time by caching my system prompt?

Hi, I've spent some time trying to find a way to make my local AI cache the system prompt (unless it is already caching, and hitting a wall at the start of every new session is normal)... I'm using Ornith 35b with llama.cpp on a Strix Halo (Windows 10). It works great so far with my PI agent. I have around 7.1k tokens system prom