r/LocalLLaMA
How can I get better response time by caching my system prompt?
Hi, I've spent some time trying to find a way to make my local AI cache the system prompt (unless it is already caching, and hitting a wall at the start of every new session is just how it works). I'm using Ornith 35b with llama.cpp on a Strix Halo (Windows 10). It works great so far with my PI agent. I have around 7.1k tokens system prom