r/LocalLLaMA
I benchmarked 8 LLMs for medical scribing. Hallucinations were rare; omissions need attention.
I ran a small benchmark on LLMs for medical scribing. Reason: most discussion around AI scribe safety focuses on hallucinations. That matters, but in the generated notes I kept seeing a different problem: models often leave out clinically relevant details from the conversation. So I evaluated 8 frontier models on 300 synthetic doctor-pa