News from arXiv cs.AI

arXiv cs.AI Papers 16 hr ago

On the Stability of Prompt Ranking in Large Language Model Evaluation

arXiv:2606.24381v1 Announce Type: cross Abstract: Prompt-based interaction has become a dominant paradigm for using large language models (LLMs), where multiple candidate prompts are evaluated and the…

arXiv cs.AI Papers 16 hr ago

VeryTrace: Verifying Reasoning Traces through Compilable Formalism and Structured Verification

arXiv:2606.24124v1 Announce Type: new Abstract: Multi-step reasoning with Chain-of-Thought (CoT) prompting remains fragile: logical errors or hallucinations in early steps silently propagate, producing confident but incorrect…

arXiv cs.AI Papers 16 hr ago

Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams

arXiv:2606.24523v1 Announce Type: cross Abstract: Scam phone calls exploit vulnerable communities worldwide, yet research on detection has focused almost exclusively on English and other high-resource languages.…

arXiv cs.AI Papers 16 hr ago

Evaluating the Interpretability of Sparse Autoencoders with Concept Annotations

arXiv:2606.24716v1 Announce Type: cross Abstract: Sparse autoencoders (SAEs) are increasingly used to extract interpretable concepts from vision and vision language models, yet existing evaluation methods largely…

arXiv cs.AI Papers 16 hr ago

Ensemble Feature Selection and Harris Hawks Optimization for Explainable Mental Health Risk Prediction in Female Sex Workers

arXiv:2606.24047v1 Announce Type: new Abstract: One of the significant mental health issues affecting female sex workers (FSWs) is mental disorders, especially depression. Exposure to violence, stigma,…

arXiv cs.AI Papers 16 hr ago

OmniPath: A Multi-Modal Agentic Framework for Auditing Wheelchair Accessibility

arXiv:2606.24129v1 Announce Type: new Abstract: For a wheelchair user, a standard blue line on a map is often a broken promise. While platforms like OpenStreetMap (OSM)…

arXiv cs.AI Papers 16 hr ago

Cross-Dataset, Age, and Gender Generalization: A Comprehensive Analysis of Fine-Tuning Strategies for Low-Resource Children's ASR

arXiv:2606.19791v2 Announce Type: replace-cross Abstract: The challenge associated with recognizing dysarthric speech primarily arises from pronounced acoustic variability attributed to impaired articulatory precision. Past research has…

arXiv cs.AI Papers 16 hr ago

TIP-Search: Time-Predictable Inference Scheduling for Market Prediction under Uncertain Load

arXiv:2506.08026v4 Announce Type: replace Abstract: Real-time market prediction services need correct predictions before a decision deadline; a correct prediction delivered late is not usable. TIP-Search studies…

arXiv cs.AI Papers 16 hr ago

Can Language Model Agents be Helpful Circuit Explainers in Mechanistic Interpretability?

arXiv:2606.24026v1 Announce Type: new Abstract: Mechanistic interpretability has made substantial progress in automatically localizing circuits, but explaining what localized components do remains labor-intensive and difficult to…

arXiv cs.AI Papers 16 hr ago

T2D-Bench: Evidence-Gated Evaluation of LLM Outputs for Type 2 Diabetes Using a Multi-Layer Clinical-Lifestyle Knowledge Graph

arXiv:2606.24145v1 Announce Type: new Abstract: Large language models (LLMs) can produce clinically fluent recommendations for type 2 diabetes while failing to satisfy guideline constraints or explicitly…
