arXiv cs.AI June 24, 2026 · Papers

On the Stability of Prompt Ranking in Large Language Model Evaluation

arXiv:2606.24381v1 Announce Type: cross Abstract: Prompt-based interaction has become a dominant paradigm for using large language models (LLMs), where multiple candidate prompts are evaluated and the top-ranked one is selected for downstream use. This workflow implicitly assumes that prompt rankings are stable under m

Read original