I did some model hacks, and got GLM5.2 from about 2.5 tok/s to >50 tok/s on my GH200 system.
G'day. This is part 3 of my local LLM adventures. I have a crazy hacked server-to-desktop system: Component Spec GPUs 2x Hopper H100, 96 GB…
I am sorry for sharing an article from a Korean website that you might not be familiar with. But South Korea is the only country currently…
Safetensors: https://huggingface.co/llmfan46/Nex-N2-mini-ultra-uncensored-heretic GGUFs: https://huggingface.co/llmfan46/Nex-N2-mini-ultra-uncensored-heretic-GGUF Find all my models here: HuggingFace-LLMFan46 If you like my work and find my models useful, then I would really appreciate if…
Benchmark from its model card. Removed online models & Qwen-AgentWorld-397B-A17B from the table. Just Open models. Model MCP Search Term. SWE Android Web OS Overall DeepSeek-V4-Pro…
Hi, I did a lot of reading online and was hoping you guys could help me out a bit more. It's technical stuff and I'm still…
A pull request adding a new run_javascript tool was merged into mainline a couple of weeks ago. I could not find any discussion about it here,…
Hi All, I typically check the model size to estimate if it will fit but I was thinking there should be some better way. There is…
I've been experimenting with using lower quants of Gemma 4 26B on my M3 16 GB MacBook Air. The quant runs at a solid 25 tokens per…
https://i.redd.it/zjduf8zns79h1.gif Baidu released Unlimited-OCR 2 days ago, and they claim it can transcribe dozens of pages in one forward pass. I read the research paper, and…
Hello, I recently bought a new RTX 5060 Ti to pair with the RTX 5060 Ti I already own, so I now have 32 GB of VRAM. Up until now…