r/LocalLLaMA

My config for the daily beta llama.cpp Vulkan build on a 7900 XTX / Ubuntu. 262k context, Qwen3.6-35B-A3B IQ4_XS. Sits at about 22k MiB. Crazy fast token generation and compacting. Twice as fast as optimized ROCm 7.14, with a lower memory footprint.
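Before launching, it can help to sanity-check the two things this setup depends on: a working Vulkan driver (`vulkaninfo` on PATH) and a built `llama-server` binary. A minimal sketch follows. The helper names `have_cmd`, `is_executable`, `headroom_mib`, and `preflight` are hypothetical and not part of the original script. The headroom example assumes the 7900 XTX's 24 GiB (24576 MiB) of VRAM.

```bash
#!/usr/bin/env bash
# Preflight sketch: hypothetical helpers, not part of the original script.

# Return 0 if the named command is on PATH, non-zero otherwise.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Return 0 if the path is an existing, executable regular file.
is_executable() {
  [ -f "$1" ] && [ -x "$1" ]
}

# Print memory headroom in MiB: total MiB ($1) minus used MiB ($2).
headroom_mib() {
  echo $(( $1 - $2 ))
}

# Check prerequisites for the given llama-server path.
# Prints one line per problem to stderr; returns non-zero if any check failed.
preflight() {
  local server_path="$1"
  local status=0
  if ! have_cmd vulkaninfo; then
    echo "vulkaninfo not found: is a Vulkan driver/tools package installed?" >&2
    status=1
  fi
  if ! is_executable "$server_path"; then
    echo "llama-server not found or not executable: $server_path" >&2
    status=1
  fi
  return "$status"
}
```

Usage would look like `preflight "$HOME/llama.cpp/build/bin/llama-server" && echo "ready"`. For the numbers in this post, `headroom_mib 24576 22000` prints `2576`, i.e. roughly 2.5 GiB left on the card at about 22k MiB used.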

#!/usr/bin/env bash
# llama-server (Vulkan) — Qwen3.6-35B-A3B IQ4_XS. Set paths in CONFIG, then run.
# Needs a Vulkan-enabled llama.cpp build and a Vulkan driver (vulkaninfo --summary).
set -euo pipefail

# ---- CONFIG ----
SERVER_PATH="${LLAMA_SERVER:-$HOME/llama.cpp/build/bin/llama-server}"
MODEL="${LLAMA_MODEL:-$HOME