News from r/LocalLLaMA

r/LocalLLaMA Communities 1 hr ago

I reverse engineered Windows Copilot into a free OpenAI compatible API (GPT-4, no API key, no billing)

So Microsoft gives you GPT-4 for free in Copilot. They just don't give you an API for it. So I made one. It logs into your…

r/LocalLLaMA Communities 1 hr ago

Has anyone else found vLLM outputs noticeably worse than llama.cpp for the same model?

I'm wondering if anyone else has come across this. I've tested the same model on llama.cpp and vLLM with similar settings and quantizations. The performance and…

r/LocalLLaMA Communities 2 hr ago

Sipp – an open-source library for in-browser inference built on llama.cpp

GitHub: https://github.com/noumena-labs/Sipp submitted by /u/lordhiggsboson

r/LocalLLaMA Communities 3 hr ago

Big News for AMD / Strix Halo+ Owners

Admittedly this is news for me, but I'm hoping it could be of some use to others here as well! So, THE NPU IS USABLE!! I've…

r/LocalLLaMA Communities 3 hr ago

Build an LLM from Scratch using MLX

You probably have a burning desire to grasp the inner workings of LLMs. By now, terms like Attention, Transformers, and Tokenizers are likely ringing in your…

r/LocalLLaMA Communities 4 hr ago

My micro-benchmark: how good are LLMs at simulating wetting behaviour?

Example surfaces that LLMs are asked to simulate, showing simulated liquid (green) shaped by solid constraints (orange). Overall score, pass count, and recorded token/cost totals for…

r/LocalLLaMA Communities 4 hr ago

OpenAI and Broadcom unveil LLM-optimized inference chip

https://openai.com/index/openai-broadcom-jalapeno-inference-chip/ Quoted from the start of the blog post: Early testing shows that the first-generation accelerator will deliver performance per watt substantially better than current state-of-the-art…

r/LocalLLaMA Communities 4 hr ago

The Swiss Federal Supreme Court is evaluating Heretic

“Oh no, are they banning abliterated models now?!?” If that was your first thought when you read the title I can’t blame you. But that’s actually…

r/LocalLLaMA Communities 5 hr ago

I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system.

G'day. This is part 3 of my Local LLM adventures. I have a crazy hacked server-to-desktop system. Component / Spec: GPUs: 2x Hopper H100, 96 GB…

r/LocalLLaMA Communities 5 hr ago

The Bank of Korea just released a report about AI productivity

I am sorry for sharing an article from a Korean website that you might not be familiar with. But South Korea is the only country currently…
