r/LocalLLaMA
We open-sourced a harness for evaluating VLMs on your own video, with traced runs
The framing that finally made VLM evaluation tractable for us is simple: decide what setup is right for your task, on your videos, at the quality, latency, and cost you can actually support. Once we framed it that way, the work changed. We stopped reading leaderboards and started building small eval sets from productio