Qwen3.6 27B is dumber in vLLM than in llama.cpp
Hello, I recently bought a new RTX 5060Ti to pair with the RTX 5060Ti I already own, so I now have 32GB of VRAM. Up until now…
As retrieval systems scale, high-quality reranking becomes increasingly important. However, most existing rerankers, whether encoder-based or decoder-based, jointly encode the query and passage, tightly coupling their…
Supported Models: granite-speech-4.1-2b-plus by 24818; LFM2.5-ColBERT-350M & LFM2.5-Embedding-350M by 24913. Vulkan: vulkan: link ggml-cpu when GGML_VULKAN_CHECK_RESULTS / RUN_TESTS are enabled #24444; vulkan: make mul_mm ALIGNED a…
As part of my ablations, I want to generate text with a medical-oriented LLM, and I was surprised to find no exposed APIs for this kind…
Article URL: https://bunny.net/blog/were-making-bunny-dns-free/ Comments URL: https://news.ycombinator.com/item?id=48657030 Points: 277 # Comments: 89
The source is in Italian, but it is a well-respected newspaper (comparable to the Financial Times): https://www.ilsole24ore.com/art/frontier-grand-challenge-domyn-guidera-progetto-dell-ai-sovrana-AIgNTNoD?refresh_ce=1 They are a startup that has already created a closed 260b model…
Article URL: https://www.ashbyhq.com/careers?ashby_jid=87b96eef-edc1-4de4-adb6-d460126d02f8&utm_source=hn Comments URL: https://news.ycombinator.com/item?id=48656219 Points: 0 # Comments: 0
Hey guys, thanks in advance for your help and knowledge! My setup was born out of the parts I had at hand. Wanting to maximise VRAM…
It looks like a new model, mentioned on https://huggingface.co/Qwen/Qwen-AgentWorld-35B-A3B and on https://qwen.ai/blog?id=qwen-agentworld
Full-document parsing instead of cropped-region OCR; 32K output length for long OCR sequences; Base and gundam image modes for different document layouts; Transformers inference + SGLang…
Qwen just released Qwen-AgentWorld-35B-A3B — a 35B-parameter MoE with only ~3B active parameters per token. The interesting part: this is not positioned as a standard chat/instruction…
The more capabilities new frontier models gain, the more sharply the question arises: how will we know when the model is doing something it shouldn't? Today,…
This work was done as a part of SPAR, under the mentorship of Mirko Bronzi and Damiano Fornasiere. TL;DR: We test models' ability to recover information about…
Methods note: The code used for the experiments and the related open-source repo were built with Claude. The experimental design and writeup are my own, with minimal…
Epistemic status: I think the core idea could actually be built. My real doubt is whether anyone with the compute will ever bother to try it.…
A web and Reddit search did not turn up this being posted in this sub, even though the news is several days old, so I am posting it. Related links:…
Article URL: https://gitlab.com/baiyibai/pico-usb-wifi Comments URL: https://news.ycombinator.com/item?id=48654676 Points: 164 # Comments: 72
TLDR: NLAs are a recent black box mech interp method for verbalizing model internals. I will be focusing on one of two components, the Activation Verbalizer…
After a billion architectures and a trillion variations, I finally found a transformer architecture that intrigued me. And this essay is step one towards the theory…
Article URL: https://gist.github.com/retroplasma/ec21767d0a8380c7ea9c2fbee1c7d6bf Comments URL: https://news.ycombinator.com/item?id=48654465 Points: 127 # Comments: 52
Article URL: https://www.muppetlabs.com/~breadbox/software/tiny/revisit.html Comments URL: https://news.ycombinator.com/item?id=48654411 Points: 45 # Comments: 2
Article URL: https://arxiv.org/abs/2606.24597 Comments URL: https://news.ycombinator.com/item?id=48654351 Points: 134 # Comments: 42