News from r/LocalLLaMA

r/LocalLLaMA Communities 6 hr ago

I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system.

G'day. This is part 3 of my local LLM adventures. I have a crazy hacked server-to-desktop system. Specs: GPUs: 2x Hopper H100, 96 GB…

r/LocalLLaMA Communities 7 hr ago

The Bank of Korea just released a report about AI productivity

I am sorry for sharing an article from a Korean website that you might not be familiar with. But South Korea is the only country currently…

r/LocalLLaMA Communities 7 hr ago

Nex-N2-Mini-Ultra-Uncensored-Heretic Is Out Now, an Agentic Model With Agentic Thinking Now Uncensored With 5/100 Refusals and 0.0020 KLD, Available in Safetensors and GGUF Formats!

Safetensors: https://huggingface.co/llmfan46/Nex-N2-mini-ultra-uncensored-heretic
GGUFs: https://huggingface.co/llmfan46/Nex-N2-mini-ultra-uncensored-heretic-GGUF
Find all my models here: HuggingFace-LLMFan46
If you like my work and find my models useful, then I would really appreciate if…

r/LocalLLaMA Communities 7 hr ago

Qwen-AgentWorld-35B-A3B for Coding?

Benchmark from its model card. I removed the online models and Qwen-AgentWorld-397B-A17B from the table, leaving just the open models. Table header: Model MCP Search Term. SWE Android Web OS Overall. First row: DeepSeek-V4-Pro…

r/LocalLLaMA Communities 7 hr ago

Dual gpu sanity check: is this a smart buy?

Hi, I did a lot of reading online and was hoping you guys could help me out a bit more. It's technical stuff and I'm still…

r/LocalLLaMA Communities 8 hr ago

llama.cpp’s web UI now supports executing model-generated JavaScript in the browser, through Web Workers (opt-in)

A pull request adding a new run_javascript tool was merged into mainline a couple of weeks ago. I could not find any discussion about it here,…

r/LocalLLaMA Communities 8 hr ago

What’s everyone using to estimate VRAM/RAM (weights + KV cache) before spinning up a local model?

Hi All, I typically check the model size to estimate if it will fit but I was thinking there should be some better way. There is…

r/LocalLLaMA Communities 8 hr ago

Gemma 4 26BA4B Surprisingly Usable at IQ3_S – Are small quants really this usable?

I've been experimenting with lower quants of Gemma 4 26B on my M3 MacBook Air with 16 GB. The quant runs at a solid 25 tokens per…

r/LocalLLaMA Communities 8 hr ago

How Baidu’s newly released Unlimited-OCR transcribes dozens of pages in one forward pass

https://i.redd.it/zjduf8zns79h1.gif Baidu released Unlimited-OCR 2 days ago, and they claim it can transcribe dozens of pages in one forward pass. I read the research paper, and…

r/LocalLLaMA Communities 9 hr ago

Qwen3.6 27B more dumb in vLLM compared to llama.cpp

Hello, I recently bought a new RTX 5060Ti to pair with the RTX 5060Ti I already own, so I now have 32GB of VRAM. Up until now…
